The NEURAL BASES of MULTISENSORY PROCESSES
Edited by Micah M. Murray and Mark T. Wallace
FRONTIERS IN NEUROSCIENCE
FRONTIERS IN NEUROSCIENCE
Series Editors
Sidney A. Simon, Ph.D.
Miguel A.L. Nicolelis, M.D., Ph.D.
Published Titles Apoptosis in Neurobiology Yusuf A. Hannun, M.D., Professor of Biomedical Research and Chairman, Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, South Carolina Rose-Mary Boustany, M.D., tenured Associate Professor of Pediatrics and Neurobiology, Duke University Medical Center, Durham, North Carolina Neural Prostheses for Restoration of Sensory and Motor Function John K. Chapin, Ph.D., Professor of Physiology and Pharmacology, State University of New York Health Science Center, Brooklyn, New York Karen A. Moxon, Ph.D., Assistant Professor, School of Biomedical Engineering, Science, and Health Systems, Drexel University, Philadelphia, Pennsylvania Computational Neuroscience: Realistic Modeling for Experimentalists Eric DeSchutter, M.D., Ph.D., Professor, Department of Medicine, University of Antwerp, Antwerp, Belgium Methods in Pain Research Lawrence Kruger, Ph.D., Professor of Neurobiology (Emeritus), UCLA School of Medicine and Brain Research Institute, Los Angeles, California Motor Neurobiology of the Spinal Cord Timothy C. Cope, Ph.D., Professor of Physiology, Wright State University, Dayton, Ohio Nicotinic Receptors in the Nervous System Edward D. Levin, Ph.D., Associate Professor, Department of Psychiatry and Pharmacology and Molecular Cancer Biology and Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, North Carolina Methods in Genomic Neuroscience Helmin R. Chin, Ph.D., Genetics Research Branch, NIMH, NIH, Bethesda, Maryland Steven O. Moldin, Ph.D., University of Southern California, Washington, D.C. Methods in Chemosensory Research Sidney A. Simon, Ph.D., Professor of Neurobiology, Biomedical Engineering, and Anesthesiology, Duke University, Durham, North Carolina Miguel A.L. Nicolelis, M.D., Ph.D., Professor of Neurobiology and Biomedical Engineering, Duke University, Durham, North Carolina The Somatosensory System: Deciphering the Brain’s Own Body Image Randall J. Nelson, Ph.D., Professor of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, Tennessee The Superior Colliculus: New Approaches for Studying Sensorimotor Integration William C. Hall, Ph.D., Department of Neuroscience, Duke University, Durham, North Carolina Adonis Moschovakis, Ph.D., Department of Basic Sciences, University of Crete, Heraklion, Greece
New Concepts in Cerebral Ischemia Rick C. S. Lin, Ph.D., Professor of Anatomy, University of Mississippi Medical Center, Jackson, Mississippi DNA Arrays: Technologies and Experimental Strategies Elena Grigorenko, Ph.D., Technology Development Group, Millennium Pharmaceuticals, Cambridge, Massachusetts Methods for Alcohol-Related Neuroscience Research Yuan Liu, Ph.D., National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland David M. Lovinger, Ph.D., Laboratory of Integrative Neuroscience, NIAAA, Nashville, Tennessee Primate Audition: Behavior and Neurobiology Asif A. Ghazanfar, Ph.D., Princeton University, Princeton, New Jersey Methods in Drug Abuse Research: Cellular and Circuit Level Analyses Barry D. Waterhouse, Ph.D., MCP-Hahnemann University, Philadelphia, Pennsylvania Functional and Neural Mechanisms of Interval Timing Warren H. Meck, Ph.D., Professor of Psychology, Duke University, Durham, North Carolina Biomedical Imaging in Experimental Neuroscience Nick Van Bruggen, Ph.D., Department of Neuroscience Genentech, Inc. Timothy P.L. Roberts, Ph.D., Associate Professor, University of Toronto, Canada The Primate Visual System John H. Kaas, Department of Psychology, Vanderbilt University, Nashville, Tennessee Christine Collins, Department of Psychology, Vanderbilt University, Nashville, Tennessee Neurosteroid Effects in the Central Nervous System Sheryl S. Smith, Ph.D., Department of Physiology, SUNY Health Science Center, Brooklyn, New York Modern Neurosurgery: Clinical Translation of Neuroscience Advances Dennis A. Turner, Department of Surgery, Division of Neurosurgery, Duke University Medical Center, Durham, North Carolina Sleep: Circuits and Functions Pierre-Hervé Luppi, Université Claude Bernard, Lyon, France Methods in Insect Sensory Neuroscience Thomas A. Christensen, Arizona Research Laboratories, Division of Neurobiology, University of Arizona, Tuscon, Arizona Motor Cortex in Voluntary Movements Alexa Riehle, INCM-CNRS, Marseille, France Eilon Vaadia, The Hebrew University, Jerusalem, Israel Neural Plasticity in Adult Somatic Sensory-Motor Systems Ford F. Ebner, Vanderbilt University, Nashville, Tennessee Advances in Vagal Afferent Neurobiology Bradley J. Undem, Johns Hopkins Asthma Center, Baltimore, Maryland Daniel Weinreich, University of Maryland, Baltimore, Maryland The Dynamic Synapse: Molecular Methods in Ionotropic Receptor Biology Josef T. Kittler, University College, London, England Stephen J. Moss, University College, London, England
Animal Models of Cognitive Impairment Edward D. Levin, Duke University Medical Center, Durham, North Carolina Jerry J. Buccafusco, Medical College of Georgia, Augusta, Georgia The Role of the Nucleus of the Solitary Tract in Gustatory Processing Robert M. Bradley, University of Michigan, Ann Arbor, Michigan Brain Aging: Models, Methods, and Mechanisms David R. Riddle, Wake Forest University, Winston-Salem, North Carolina Neural Plasticity and Memory: From Genes to Brain Imaging Frederico Bermudez-Rattoni, National University of Mexico, Mexico City, Mexico Serotonin Receptors in Neurobiology Amitabha Chattopadhyay, Center for Cellular and Molecular Biology, Hyderabad, India TRP Ion Channel Function in Sensory Transduction and Cellular Signaling Cascades Wolfgang B. Liedtke, M.D., Ph.D., Duke University Medical Center, Durham, North Carolina Stefan Heller, Ph.D., Stanford University School of Medicine, Stanford, California Methods for Neural Ensemble Recordings, Second Edition Miguel A.L. Nicolelis, M.D., Ph.D., Professor of Neurobiology and Biomedical Engineering, Duke University Medical Center, Durham, North Carolina Biology of the NMDA Receptor Antonius M. VanDongen, Duke University Medical Center, Durham, North Carolina Methods of Behavioral Analysis in Neuroscience Jerry J. Buccafusco, Ph.D., Alzheimer’s Research Center, Professor of Pharmacology and Toxicology, Professor of Psychiatry and Health Behavior, Medical College of Georgia, Augusta, Georgia In Vivo Optical Imaging of Brain Function, Second Edition Ron Frostig, Ph.D., Professor, Department of Neurobiology, University of California, Irvine, California Fat Detection: Taste, Texture, and Post Ingestive Effects Jean-Pierre Montmayeur, Ph.D., Centre National de la Recherche Scientifique, Dijon, France Johannes le Coutre, Ph.D., Nestlé Research Center, Lausanne, Switzerland The Neurobiology of Olfaction Anna Menini, Ph.D., Neurobiology Sector International School for Advanced Studies, (S.I.S.S.A.), Trieste, Italy Neuroproteomics Oscar Alzate, Ph.D., Department of Cell and Developmental Biology, University of North Carolina, Chapel Hill, North Carolina Translational Pain Research: From Mouse to Man Lawrence Kruger, Ph.D., Department of Neurobiology, UCLA School of Medicine, Los Angeles, California Alan R. Light, Ph.D., Department of Anesthesiology, University of Utah, Salt Lake City, Utah Advances in the Neuroscience of Addiction Cynthia M. Kuhn, Duke University Medical Center, Durham, North Carolina George F. Koob, The Scripps Research Institute, La Jolla, California
Neurobiology of Huntington’s Disease: Applications to Drug Discovery Donald C. Lo, Duke University Medical Center, Durham, North Carolina Robert E. Hughes, Buck Institute for Age Research, Novato, California Neurobiology of Sensation and Reward Jay A. Gottfried, Northwestern University, Chicago, Illinois The Neural Bases of Multisensory Processes Micah M. Murray, CIBM, Lausanne, Switzerland Mark T. Wallace, Vanderbilt Brain Institute, Nashville, Tennessee
The NEURAL BASES of MULTISENSORY PROCESSES Edited by
Micah M. Murray Center for Biomedical Imaging Lausanne, Switzerland
Mark T. Wallace Vanderbilt University Nashville, Tennessee
Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2012 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works
International Standard Book Number-13: 978-1-4398-1219-8 (eBook - PDF) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http:// www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
Contents

Series Preface
Introduction
Editors
Contributors

Section I Anatomy
Chapter 1 Structural Basis of Multisensory Processing: Convergence
H. Ruth Clemo, Leslie P. Keniston, and M. Alex Meredith
Chapter 2 Cortical and Thalamic Pathways for Multisensory and Sensorimotor Interplay
Céline Cappe, Eric M. Rouiller, and Pascal Barone
Chapter 3 What Can Multisensory Processing Tell Us about the Functional Organization of Auditory Cortex?
Jennifer K. Bizley and Andrew J. King

Section II Neurophysiological Bases
Chapter 4 Are Bimodal Neurons the Same throughout the Brain?
M. Alex Meredith, Brian L. Allman, Leslie P. Keniston, and H. Ruth Clemo
Chapter 5 Audiovisual Integration in Nonhuman Primates: A Window into the Anatomy and Physiology of Cognition
Yoshinao Kajikawa, Arnaud Falchier, Gabriella Musacchia, Peter Lakatos, and Charles E. Schroeder
Chapter 6 Multisensory Influences on Auditory Processing: Perspectives from fMRI and Electrophysiology
Christoph Kayser, Christopher I. Petkov, Ryan Remedios, and Nikos K. Logothetis
Chapter 7 Multisensory Integration through Neural Coherence
Andreas K. Engel, Daniel Senkowski, and Till R. Schneider
Chapter 8 The Use of fMRI to Assess Multisensory Integration
Thomas W. James and Ryan A. Stevenson
Chapter 9 Perception of Synchrony between the Senses
Mirjam Keetels and Jean Vroomen
Chapter 10 Representation of Object Form in Vision and Touch
Simon Lacey and Krish Sathian

Section III Combinatorial Principles and Modeling
Chapter 11 Spatial and Temporal Features of Multisensory Processes: Bridging Animal and Human Studies
Diana K. Sarko, Aaron R. Nidiffer, Albert R. Powers III, Dipanwita Ghose, Andrea Hillock-Dunn, Matthew C. Fister, Juliane Krueger, and Mark T. Wallace
Chapter 12 Early Integration and Bayesian Causal Inference in Multisensory Perception
Ladan Shams
Chapter 13 Characterization of Multisensory Integration with fMRI: Experimental Design, Statistical Analysis, and Interpretation
Uta Noppeney
Chapter 14 Modeling Multisensory Processes in Saccadic Responses: Time-Window-of-Integration Model
Adele Diederich and Hans Colonius

Section IV Development and Plasticity
Chapter 15 The Organization and Plasticity of Multisensory Integration in the Midbrain
Thomas J. Perrault Jr., Benjamin A. Rowland, and Barry E. Stein
Chapter 16 Effects of Prolonged Exposure to Audiovisual Stimuli with Fixed Stimulus Onset Asynchrony on Interaction Dynamics between Primary Auditory and Primary Visual Cortex
Antje Fillbrandt and Frank W. Ohl
Chapter 17 Development of Multisensory Temporal Perception
David J. Lewkowicz
Chapter 18 Multisensory Integration Develops Late in Humans
David Burr and Monica Gori
Chapter 19 Phonetic Recalibration in Audiovisual Speech
Jean Vroomen and Martijn Baart
Chapter 20 Multisensory Integration and Aging
Jennifer L. Mozolic, Christina E. Hugenschmidt, Ann M. Peiffer, and Paul J. Laurienti

Section V Clinical Manifestations
Chapter 21 Neurophysiological Mechanisms Underlying Plastic Changes and Rehabilitation following Sensory Loss in Blindness and Deafness
Ella Striem-Amit, Andreja Bubic, and Amir Amedi
Chapter 22 Visual Abilities in Individuals with Profound Deafness: A Critical Review
Francesco Pavani and Davide Bottari
Chapter 23 Peripersonal Space: A Multisensory Interface for Body–Object Interactions
Claudio Brozzoli, Tamar R. Makin, Lucilla Cardinali, Nicholas P. Holmes, and Alessandro Farnè
Chapter 24 Multisensory Perception and Bodily Self-Consciousness: From Out-of-Body to Inside-Body Experience
Jane E. Aspell, Bigna Lenggenhager, and Olaf Blanke

Section VI Attention and Spatial Representations
Chapter 25 Spatial Constraints in Multisensory Attention
Emiliano Macaluso
Chapter 26 Cross-Modal Spatial Cueing of Attention Influences Visual Perception
John J. McDonald, Jessica J. Green, Viola S. Störmer, and Steven A. Hillyard
Chapter 27 The Colavita Visual Dominance Effect
Charles Spence, Cesare Parise, and Yi-Chuan Chen
Chapter 28 The Body in a Multisensory World
Tobias Heed and Brigitte Röder

Section VII Naturalistic Multisensory Processes: Motion Signals
Chapter 29 Multisensory Interactions during Motion Perception: From Basic Principles to Media Applications
Salvador Soto-Faraco and Aleksander Väljamäe
Chapter 30 Multimodal Integration during Self-Motion in Virtual Reality
Jennifer L. Campos and Heinrich H. Bülthoff
Chapter 31 Visual–Vestibular Integration for Self-Motion Perception
Gregory C. DeAngelis and Dora E. Angelaki

Section VIII Naturalistic Multisensory Processes: Communication Signals
Chapter 32 Unity of the Senses for Primate Vocal Communication
Asif A. Ghazanfar
Chapter 33 Convergence of Auditory, Visual, and Somatosensory Information in Ventral Prefrontal Cortex
Lizabeth M. Romanski
Chapter 34 A Multisensory Perspective on Human Auditory Communication
Katharina von Kriegstein

Section IX Naturalistic Multisensory Processes: Flavor
Chapter 35 Multimodal Chemosensory Interactions and Perception of Flavor
John Prescott
Chapter 36 A Proposed Model of a Flavor Modality
Dana M. Small and Barry G. Green
Chapter 37 Assessing the Role of Visual and Auditory Cues in Multisensory Perception of Flavor
Massimiliano Zampini and Charles Spence
Series Preface

FRONTIERS IN NEUROSCIENCE

The Frontiers in Neuroscience Series presents the insights of experts on emerging experimental technologies and theoretical concepts that are or will be at the vanguard of neuroscience. The books cover new and exciting multidisciplinary areas of brain research and describe breakthroughs in fields such as insect sensory neuroscience, primate audition, and biomedical imaging. The most recent books cover the rapidly evolving fields of multisensory processing and reward. Each book is edited by experts and consists of chapters written by leaders in a particular field. Books are richly illustrated and contain comprehensive bibliographies. Chapters provide substantial background material relevant to the particular subject. The goal is for these books to be the references neuroscientists use in order to acquaint themselves with new methodologies in brain research. We view our task as series editors to produce outstanding products and to contribute to the field of neuroscience. We hope that, as the volumes become available, the effort put in by us, the publisher, the book editors, and individual authors will contribute to further development of brain research. To the extent that you learn from these books, we will have succeeded.

Sidney A. Simon, PhD
Miguel A.L. Nicolelis, MD, PhD
Introduction

The field of multisensory research continues to grow at a dizzying rate. Although for those of us working in the field this is extraordinarily gratifying, it is also a bit challenging to keep up with all of the exciting new developments in such a multidisciplinary topic at such a burgeoning stage. For those a bit peripheral to the field, but with an inherent interest in the magic of multisensory interactions to shape our view of the world, the task is even more daunting. Our objectives for this book are straightforward—to provide those working within the area a strong overview of the current state of the field, while at the same time providing those a bit outside of the field with a solid introduction to multisensory processes. We feel that the current volume meets these objectives, largely through a choice of topics that span the single cell to the clinic and through the expertise of our authors, each of whom has done an exceptional job explaining their research to an interdisciplinary audience. The book is organized thematically, with the themes generally building from the more basic to the more applied. Hence, a reader interested in the progression of ideas and approaches can start at the beginning and see how the basic science informs the clinical and more applied sciences by reading each chapter in sequence. Alternatively, one can choose to learn more about a specific theme and delve directly into that section. Regardless of your approach, we hope that this book will serve as an important reference related to your interests in multisensory processes. The following narrative provides a bit of an overview to each of the sections and the chapters contained within them. Section I (Anatomy) focuses on the essential building blocks for any understanding of the neural substrates of multisensory processing. In Chapter 1, Clemo and colleagues describe how neural convergence and synaptology in multisensory domains might account for the diversity of physiological response properties, and provide elegant examples of structure/function relationships. Chapter 2, from Cappe and colleagues, details the anatomical substrates supporting the growing functional evidence for multisensory interactions in classical areas of unisensory cortex, and highlights the possible thalamic contributions to these processes. In Chapter 3, Bizley and King focus on the unisensory cortical domain that has been best studied for these multisensory influences—auditory cortex. They highlight how visual inputs into the auditory cortex are organized, and detail the possible functional role(s) of these inputs. Section II, organized around Neurophysiological Bases, provides an overview of how multisensory stimuli can dramatically change the encoding processes for sensory information. Chapter 4, by Meredith and colleagues, addresses whether bimodal neurons throughout the brain share the same integrative characteristics, and shows marked differences in these properties between subcortex and cortex. Chapter 5, from Kajikawa and colleagues, focuses on the nonhuman primate model and bridges what is known about the neural integration of auditory–visual information in monkey cortex with the evidence for changes in multisensory-mediated behavior and perception.
In Chapter 6, Kayser and colleagues also focus on the monkey model, with an emphasis now on auditory cortex and the merging of classical neurophysiological analyses with neuroimaging methods used in human subjects (i.e., functional magnetic resonance imaging (fMRI)). This chapter emphasizes not only early multisensory interactions, but also the transformations that take place as one ascends the processing hierarchy as well as the distributed nature of multisensory encoding. The final four chapters in this section then examine evidence from humans. In Chapter 7, Engel and colleagues present compelling evidence for a role of coherent oscillatory activity in linking unisensory and multisensory brain regions and improving multisensory encoding processes. This is followed by a contribution from James and Stevenson (Chapter 8), which focuses on fMRI measures of multisensory integration and which proposes a new criterion based on inverse effectiveness in evaluating and
interpreting the BOLD signal. Chapter 9, by Keetels and Vroomen, reviews the psychophysical and neuroimaging evidence associated with the perception of the temporal relationships (i.e., synchrony and asynchrony) between multisensory cues. Finally, this section closes with a chapter from Lacey and Sathian (Chapter 10), which reviews our current neuroimaging knowledge concerning the mental representations of objects across vision and touch. Section III, Combinatorial Principles and Modeling, focuses on efforts to gain a better mechanistic handle on multisensory operations and their network dynamics. In Chapter 11, Sarko and colleagues focus on spatiotemporal analyses of multisensory neurons and networks as well as commonalities across both animal and human model studies. This is followed by a contribution from Shams, who reviews the psychophysical evidence for multisensory interactions and who argues that these processes can be well described by causal inference and Bayesian modeling approaches. In Chapter 13, Noppeney returns to fMRI and illustrates the multiple methods of analyses of fMRI datasets, the interpretational caveats associated with these approaches, and how the combined use of methods can greatly strengthen the conclusions that can be drawn. The final contribution (Chapter 14), from Diederich and Colonius, returns to modeling and describes the time-window-of-integration (TWIN) model, which provides an excellent framework within which to interpret the speeding of saccadic reaction times seen under multisensory conditions. Section IV encompasses the area of Development and Plasticity. Chapter 15, from Perrault and colleagues, describes the classic model for multisensory neural studies, the superior colliculus, and highlights the developmental events leading up to the mature state. In Chapter 16, Fillbrandt and Ohl explore temporal plasticity in multisensory networks and show changes in the dynamics of interactions between auditory and visual cortices following prolonged exposure to fixed auditory–visual delays. The next two contributions focus on human multisensory development. In Chapter 17, Lewkowicz details the development of multisensory temporal processes, highlighting the increasing sophistication in these processes as infants grow and gain experience with the world. Chapter 18, by Burr and Gori, reviews the neurophysiological, behavioral and imaging evidence that illustrates the surprisingly late development of human multisensory capabilities, a finding that they posit is a result of the continual need for cross-modal recalibration during development. In Chapter 19, Vroomen and Baart also discuss recalibration, this time in the context of language acquisition. They argue that in the process of phonetic recalibration, the visual system instructs the auditory system to build phonetic boundaries in the presence of ambiguous sound sources. Finally, Chapter 20 focuses on what can be considered the far end of the developmental process—normal aging. Here, Mozolic and colleagues review the intriguing literature suggesting enhanced multisensory processing in aging adults, and highlight a number of possible reasons for these apparent improvements in sensory function. Section V, Clinical Manifestations, addresses how perception and action are affected by altered sensory experience.
In Chapter 21, Striem-Amit and colleagues focus on sensory loss, placing particular emphasis on plasticity following blindness and on efforts to introduce low-cost sensory substitution devices as rehabilitation tools. The functional imaging evidence they review provides a striking example of training-induced plasticity. In Chapter 22, Pavani and Bottari likewise consider sensory loss, focusing on visual abilities in profoundly deaf individuals. One contention in their chapter is that deafness results in enhanced speed of reactivity to visual stimuli, rather than enhanced visual perceptual abilities. In Chapter 23, Brozzoli and colleagues use the case of visuotactile interactions as an example of how multisensory brain mechanisms can be rendered plastic both in terms of sensory as well as motor processes. This plasticity is supported by the continuous and active monitoring of peripersonal space, including both one’s own body and the objects in its vicinity. In Chapter 24, Aspell and colleagues address the topic of bodily self-consciousness both in neurological patients and healthy participants, showing how the perception of one’s “self” can be distorted by multisensory conflicts. Section VI encompasses the topic of Attention and Spatial Representations. A contribution from Macaluso opens this section by reviewing putative neural mechanisms for multisensory links in the
control of spatial attention as revealed by functional neuroimaging in humans. He puts particular emphasis on there likely being multiple functional–anatomic routes for these links, which in turn can provide a degree of flexibility in the manner by which sensory information at a given location is selected and processed. In Chapter 26, McDonald and colleagues follow this with a review of studies showing how nonvisual cues impact the subsequent processing (i.e., sensitivity, perceptual awareness, and subjective experiences) of visual stimuli, demonstrating how such effects can manifest within the first 200 ms of visual processing. Chapter 27, by Spence and colleagues, provides a review of the Colavita visual dominance effect, including the proposition of an account for this effect based on biased competition. Finally, in Chapter 28 Heed and Röder conclude this section with a consideration of how the body schema is established and how an established body schema in turn impacts the manner in which multisensory stimuli are treated. Section VII focuses on Naturalistic Multisensory Processes in the context of motion signals. In Chapter 29, Soto-Faraco and Väljamäe open this section with a consideration of how motion information conveyed by audition and vision is integrated. First, they address the basic phenomenology and behavioral principles. They then review studies examining the neurophysiologic bases for the integration of multisensory motion signals. Finally, they discuss how laboratory findings can be extended to media applications. In Chapter 30, Campos and Bülthoff address the topic of self-motion perception. They describe and evaluate experimental settings and technologies for studying self-motion, including the empirical findings that these methods and paradigms have produced. The section concludes with a contribution from DeAngelis and Angelaki (Chapter 31), who review their studies of visual–vestibular interactions in the dorsal medial superior temporal area (MSTd) of macaque monkeys. Their review progresses from the characterization of heading-sensitive multisensory neurons, to a mathematical description of the visual–vestibular integration within MSTd neurons, and finally to describing the links between neuronal and behavioral processes. Section VIII continues the focus on Naturalistic Multisensory Processes, now with a particular concentration on multisensory contributions to the perception and generation of communication signals. In Chapter 32, Ghazanfar challenges Geschwind’s proposition that speech functions in humans are intrinsically linked to the unique ability of humans to form multisensory associations. He reviews the multisensory contributions to communication signals in nonhuman primates as well as the role of auditory cortex in processing such signals. In Chapter 33, Romanski details the auditory, visual, and somatosensory anatomical projections to the ventrolateral prefrontal cortex (VLPFC) as well as neuronal responsiveness within this region with respect to communication signals and object processing. The section closes with Chapter 34 by von Kriegstein that considers how unisensory auditory communication is impacted by previous multisensory auditory–visual encoding as well as by auditory-driven activity within nominally visual brain regions. One implication is that the processing of auditory communication signals is achieved using not only auditory but also visual brain areas.
The final section, Section IX, Naturalistic Multisensory Processes, concentrates on how the perception of flavor is generated. In a pair of complementary chapters, psychophysical and neural models of flavor perception are reviewed. In Chapter 35, Prescott focuses on psychophysical findings and covers processes ranging from basic sensation through learned olfactory–taste associations, as well as the roles of synthetic versus fused perceptions, attention, and hedonics. Chapter 36, by Small and Green, focuses largely on evidence from functional brain imaging. They propose that a distributed network of regions is responsible for generating the perceived flavors of objects. Finally, in Chapter 37, Zampini and Spence conclude with a review of evidence for the impact of visual and acoustic features on the perception of flavor. They distinguish between preingestive effects of vision, which are more likely linked to expectancy, and effects of audition that coincide with ingestion. In parallel, they discuss how auditory and visual influences can occur without awareness, highlighting the necessity for increased neuroscientific investigation of these processes. We hope that the reader enjoys this book as much as we have enjoyed assembling it. We have both learned much during this endeavor, and have gained an even deeper fascination and appreciation for
our chosen field of inquiry. We are delighted by the diversity of experimental models, methodological approaches, and conceptual frameworks that are used in the study of multisensory processes, and that are reflected in the current volume. Indeed, in our opinion, the success of our field and its rapid growth are attributable to this highly multidisciplinary philosophy, and bode well for the future of multisensory science.

Micah M. Murray
Lausanne, Switzerland

Mark T. Wallace
Nashville, Tennessee
Editors

Micah M. Murray earned a double BA in psychology and English from The Johns Hopkins University. In 2001, he received his PhD with honors from the Neuroscience Department, Albert Einstein College of Medicine of Yeshiva University. He worked as a postdoctoral scientist in the Neurology Clinic and Rehabilitation Department, University Hospital of Geneva, Switzerland. Since 2003 he has held a position within the Department of Clinical Neurosciences and Department of Radiology at the University Hospital of Lausanne, Switzerland. Currently, he is an associate professor within these departments, adjunct associate professor at Vanderbilt University, as well as associate director of the EEG Brain Mapping Core of the Center for Biomedical Imaging in Lausanne, Switzerland. Dr. Murray has a continuous record of grant support from the Swiss National Science Foundation. He has received awards for his research from the Leenaards Foundation (2005 Prize for the Promotion of Scientific Research), the faculty of Biology and Medicine at the University of Lausanne (2008 Young Investigator Prize), and from the Swiss National Science Foundation (bonus of excellence in research). His research has been widely covered by the national and international media. He currently holds editorial board positions at Brain Topography (editor-in-chief), Journal of Neuroscience (associate editor), Frontiers in Integrative Neuroscience (associate editor), Frontiers in Auditory Cognitive Neuroscience (associate editor), and the Scientific World Journal. Dr. Murray has authored more than 80 articles and book chapters. His group’s research primarily focuses on multisensory interactions, object recognition, learning and plasticity, electroencephalogram-correlated functional MRI (EEG/fMRI) methodological developments, and systems/cognitive neuroscience in general. Research in his group combines psychophysics, EEG, fMRI, and transcranial magnetic stimulation in healthy and clinical populations.

Mark T. Wallace received his BS in biology from Temple University in 1985, and his PhD in neuroscience from Temple University in 1990, where he was the recipient of the Russell Conwell Presidential Fellowship. He did a postdoctoral fellowship with Dr. Barry Stein at the Medical College of Virginia, where he began his research looking at the neural mechanisms of multisensory integration. Dr. Wallace moved to the Wake Forest University School of Medicine in 1995. In 2006, Dr. Wallace came to Vanderbilt University, and was named the director of the Vanderbilt Brain Institute in 2008. He is professor of hearing and speech sciences, psychology, and psychiatry, and the associate director of the Vanderbilt Silvio O. Conte Center for Basic Neuroscience Research. He is a member of the Center for Integrative and Cognitive Neuroscience, the Center for Molecular Neuroscience, the Vanderbilt Kennedy Center, and the Vanderbilt Vision Research Center. Dr. Wallace has received a number of awards for both research and teaching, including the Faculty Excellence Award of Wake Forest University and being named the Outstanding Young Investigator in the Basic Sciences. Dr. Wallace has an established record of research funding from the National Institutes of Health, and is the author of more than 125 research presentations and publications. He currently serves on the editorial board of several journals including Brain Topography, Cognitive Processes, and Frontiers in Integrative Neuroscience.
His work has employed a multidisciplinary approach to examining multisensory processing, and focuses upon the neural architecture of multisensory integration, its development, and its role in guiding human perception and performance.
Contributors Brian L. Allman Department of Anatomy and Neurobiology Virginia Commonwealth University School of Medicine Richmond, Virginia Amir Amedi Department of Medical Neurobiology, Institute for Medical Research Israel–Canada Hebrew University–Hadassah Medical School Jerusalem, Israel Dora E. Angelaki Department of Anatomy and Neurobiology Washington University School of Medicine St. Louis, Missouri Jane E. Aspell Laboratory of Cognitive Neuroscience Ecole Polytechnique Fédérale de Lausanne Lausanne, Switzerland Martijn Baart Department of Medical Psychology and Neuropsychology Tilburg University Tilburg, the Netherlands Pascal Barone Centre de Recherche Cerveau et Cognition (UMR 5549) CNRS, Faculté de Médecine de Rangueil Université Paul Sabatier Toulouse 3 Toulouse, France Jennifer K. Bizley Department of Physiology, Anatomy, and Genetics University of Oxford Oxford, United Kingdom
Olaf Blanke Laboratory of Cognitive Neuroscience Ecole Polytechnique Fédérale de Lausanne Lausanne, Switzerland Davide Bottari Center for Mind/Brain Sciences University of Trento Rovereto, Italy Claudio Brozzoli Institut National de la Santé et de la Recherche Médicale Bron, France Andreja Bubic Department of Medical Neurobiology, Institute for Medical Research Israel–Canada Hebrew University–Hadassah Medical School Jerusalem, Israel Heinrich H. Bülthoff Department of Human Perception, Cognition, and Action Max Planck Institute for Biological Cybernetics Tübingen, Germany David Burr Dipartimento di Psicologia Università Degli Studi di Firenze Florence, Italy Jennifer L. Campos Department of Psychology Toronto Rehabilitation Institute University of Toronto Toronto, Ontario, Canada Céline Cappe Laboratory of Psychophysics Ecole Polytechnique Fédérale de Lausanne Lausanne, Switzerland
Lucilla Cardinali Institut National de la Santé et de la Recherche Médicale Bron, France Yi-Chuan Chen Crossmodal Research Laboratory Department of Experimental Psychology University of Oxford Oxford, United Kingdom H. Ruth Clemo Department of Anatomy and Neurobiology Virginia Commonwealth University School of Medicine Richmond, Virginia Hans Colonius Department of Psychology Oldenburg University Oldenburg, Germany Gregory C. DeAngelis Department of Brain and Cognitive Sciences Center for Visual Science University of Rochester Rochester, New York Adele Diederich School of Humanities and Social Sciences Jacobs University Bremen, Germany Andreas K. Engel Department of Neurophysiology and Pathophysiology University Medical Center Hamburg–Eppendorf Hamburg, Germany Arnaud Falchier Nathan S. Kline Institute for Psychiatric Research Orangeburg, New York Alessandro Farnè Institut National de la Santé et de la Recherche Médicale Bron, France
Antje Fillbrandt Leibniz Institute for Neurobiology Magdeburg, Germany Matthew C. Fister Vanderbilt Kennedy Center Vanderbilt University Nashville, Tennessee Asif A. Ghazanfar Departments of Psychology and Ecology and Evolutionary Biology Neuroscience Institute Princeton University Princeton, New Jersey Dipanwita Ghose Department of Psychology Vanderbilt University Nashville, Tennessee Monica Gori Department of Robotics Brain and Cognitive Science Italian Institute of Technology Genoa, Italy Barry G. Green The John B. Pierce Laboratory and Yale University New Haven, Connecticut Jessica J. Green Duke University Durham, North Carolina Tobias Heed Biological Psychology and Neuropsychology University of Hamburg Hamburg, Germany Andrea Hillock-Dunn Department of Hearing and Speech Sciences Vanderbilt University Nashville, Tennessee Steven A. Hillyard University of California San Diego San Diego, California
Nicholas P. Holmes Institut National de la Santé et de la Recherche Médicale Bron, France
Simon Lacey Department of Neurology Emory University Atlanta, Georgia
Christina E. Hugenschmidt Center for Diabetes Research Wake Forest University School of Medicine Winston-Salem, North Carolina
Peter Lakatos Nathan S. Kline Institute for Psychiatric Research Orangeburg, New York
Thomas W. James Department of Psychological and Brain Sciences Indiana University Bloomington, Indiana
Paul J. Laurienti Department of Radiology Wake Forest University School of Medicine Winston-Salem, North Carolina
Yoshinao Kajikawa Nathan S. Kline Institute for Psychiatric Research Orangeburg, New York Christoph Kayser Max Planck Institute for Biological Cybernetics Tübingen, Germany Mirjam Keetels Department of Medical Psychology and Neuropsychology Tilburg University Tilburg, The Netherlands Leslie P. Keniston Department of Anatomy and Neurobiology Virginia Commonwealth University School of Medicine Richmond, Virginia Andrew J. King Department of Physiology, Anatomy and Genetics University of Oxford Oxford, United Kingdom Katharina von Kriegstein Max Planck Institute for Human Cognitive and Brain Sciences Leipzig, Germany Juliane Krueger Neuroscience Graduate Program Vanderbilt University Nashville, Tennessee
Bigna Lenggenhager Laboratory of Cognitive Neuroscience Ecole Polytechnique Fédérale de Lausanne Lausanne, Switzerland David J. Lewkowicz Department of Psychology Florida Atlantic University Boca Raton, Florida Nikos K. Logothetis Max Planck Institute for Biological Cybernetics Tübingen, Germany Emiliano Macaluso Neuroimaging Laboratory Santa Lucia Foundation Rome, Italy Tamar R. Makin Institut National de la Santé et de la Recherche Médicale Bron, France John J. McDonald Simon Fraser University Burnaby, British Columbia, Canada M. Alex Meredith Department of Anatomy and Neurobiology Virginia Commonwealth University School of Medicine Richmond, Virginia
Jennifer L. Mozolic Department of Psychology Warren Wilson College Asheville, North Carolina
Albert R. Powers III Neuroscience Graduate Program Vanderbilt University Nashville, Tennessee
Gabriella Musacchia Nathan S. Kline Institute for Psychiatric Research Orangeburg, New York
John Prescott School of Psychology University of Newcastle Ourimbah, Australia
Aaron R. Nidiffer Department of Hearing and Speech Sciences Vanderbilt University Nashville, Tennessee Uta Noppeney Max Planck Institute for Biological Cybernetics Tübingen, Germany Frank W. Ohl Leibniz Institute for Neurobiology Magdeburg, Germany Cesare Parise Department of Experimental Psychology Crossmodal Research Laboratory University of Oxford Oxford, United Kingdom Francesco Pavani Department of Cognitive Sciences and Education Center for Mind/Brain Sciences University of Trento Rovereto, Italy Ann M. Peiffer Department of Radiology Wake Forest University School of Medicine Winston-Salem, North Carolina Thomas J. Perrault Jr. Department of Neurobiology and Anatomy Wake Forest School of Medicine Winston-Salem, North Carolina Christopher I. Petkov Institute of Neuroscience University of Newcastle Newcastle upon Tyne, United Kingdom
Ryan Remedios Max Planck Institute for Biological Cybernetics Tübingen, Germany Brigitte Röder Biological Psychology and Neuropsychology University of Hamburg Hamburg, Germany Lizabeth M. Romanski Department of Neurobiology and Anatomy University of Rochester Rochester, New York Eric M. Rouiller Unit of Physiology and Program in Neurosciences Department of Medicine, Faculty of Sciences University of Fribourg Fribourg, Switzerland Benjamin A. Rowland Department of Neurobiology and Anatomy Wake Forest School of Medicine Winston-Salem, North Carolina Diana K. Sarko Department of Hearing and Speech Sciences Vanderbilt University Nashville, Tennessee Krish Sathian Department of Neurology Emory University Atlanta, Georgia Till R. Schneider Department of Neurophysiology and Pathophysiology University Medical Center Hamburg–Eppendorf Hamburg, Germany
Charles E. Schroeder Nathan S. Kline Institute for Psychiatric Research Orangeburg, New York Daniel Senkowski Department of Neurophysiology and Pathophysiology University Medical Center Hamburg–Eppendorf Hamburg, Germany Ladan Shams Department of Psychology University of California, Los Angeles Los Angeles, California Dana M. Small The John B. Pierce Laboratory and Department of Psychiatry Yale University School of Medicine New Haven, Connecticut Salvador Soto-Faraco Departament de Tecnologies de la Informació i les Comunicacions Institució Catalana de Reserca i Estudis Avançats Universitat Pompeu Fabra Barcelona, Spain Charles Spence Department of Experimental Psychology Crossmodal Research Laboratory University of Oxford Oxford, United Kingdom Barry E. Stein Department of Neurobiology and Anatomy Wake Forest School of Medicine Winston-Salem, North Carolina
Ryan A. Stevenson Department of Psychological and Brain Sciences Indiana University Bloomington, Indiana Viola S. Störmer Max Planck Institute of Human Development Berlin, Germany Ella Striem-Amit Department of Medical Neurobiology, Institute for Medical Research Israel–Canada Hebrew University–Hadassah Medical School Jerusalem, Israel Aleksander Väljamäe Institute of Audiovisual Studies Universitat Pompeu Fabra Barcelona, Spain Jean Vroomen Department of Medical Psychology and Neuropsychology Tilburg University Tilburg, The Netherlands Mark T. Wallace Vanderbilt Brain Institute Vanderbilt University Nashville, Tennessee Massimiliano Zampini Centre for Mind/Brain Sciences University of Trento Rovereto, Italy
Section I Anatomy
1 Structural Basis of Multisensory Processing: Convergence

H. Ruth Clemo, Leslie P. Keniston, and M. Alex Meredith

CONTENTS

1.1 Introduction
1.2 Multiple Sensory Projections: Sources
1.2.1 Multiple Sensory Projections: Termination Patterns
1.2.2 Supragranular Termination of Cross-Modal Projections
1.3 Do All Cross-Modal Projections Generate Multisensory Integration?
1.4 Synaptic Architecture of Multisensory Convergence
1.5 Summary and Conclusions
Acknowledgments
References
1.1 INTRODUCTION

For multisensory processing, the requisite, defining step is the convergence of inputs from different sensory modalities onto individual neurons. This arrangement allows postsynaptic currents evoked by different modalities access to the same membrane, to collide and integrate there on the common ground of an excitable bilayer. Naturally, one would expect a host of biophysical and architectural features to play a role in shaping those postsynaptic events as they spread across the membrane, but much more can be written about what is unknown of the structural basis for multisensory integration than of what is known. Historically, however, anatomical investigations of multisensory processing have focused primarily on identifying the sources of inputs that converge in multisensory regions. Although a few recent studies have begun to assess the features of convergence (see below), most of what is known about the structural basis of multisensory processing lies in the sources and pathways essentially before convergence.
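To make the idea of postsynaptic currents "colliding" on a shared membrane concrete, the situation can be sketched with a standard single-compartment membrane equation. This is only an illustrative textbook caricature under simplifying assumptions (a passive membrane and one lumped synaptic conductance per modality; the symbols g_A, g_V, E_A, and E_V are generic placeholders), not a model proposed in this chapter:

\[
C_m \frac{dV(t)}{dt} = -g_L\big[V(t)-E_L\big] - g_A(t)\big[V(t)-E_A\big] - g_V(t)\big[V(t)-E_V\big]
\]

Here \(g_A(t)\) and \(g_V(t)\) are the synaptic conductances opened by, for example, auditory and visual afferents terminating on the same neuron, \(E_A\) and \(E_V\) are their reversal potentials, and \(g_L\) and \(E_L\) describe the resting leak. Because both synaptic terms act on the same membrane potential \(V(t)\), responses evoked by the two modalities necessarily interact through the shared driving force, which is the biophysical sense in which convergence onto a single membrane is the prerequisite for any multisensory integration.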
1.2 MULTIPLE SENSORY PROJECTIONS: SOURCES

Multisensory processing is defined as the influence of one sensory modality on activity generated by another modality. However, for most of its history, the term “multisensory” had been synonymous with the term “bimodal” (describing a neuron that can be activated by the independent presentation of stimuli from more than one modality). Hence, studies of multisensory connections first identified areas that were bimodal, either as individual neurons (Horn and Hill 1966) or as areal responses to different sensory stimuli (e.g., Toldi et al. 1984). Not surprisingly, the bimodal (and trimodal) areas of the superior temporal sulcus (STS) in monkeys (e.g., Benevento et al. 1977; Bruce et al. 1981; Hikosaka et al. 1988) were readily identified. Among the first comprehensive
assessments of multisensory pathways were those that injected tracers into the STS and identified the different cortical sources of inputs to that region. With tracer injections into the upper “polysensory” STS bank, retrogradely labeled neurons were identified in adjoining auditory areas of the STS, superior temporal gyrus, and supratemporal plane, and in visual areas of the inferior parietal lobule and the lateral intraparietal sulcus, with a somewhat more restricted projection from the parahippocampal gyrus and the inferotemporal visual area, as illustrated in Figure 1.1 (Seltzer and Pandya 1994; Saleem et al. 2000). Although inconclusive about potential somatosensory inputs to the STS, this study did mention the presence of retrogradely labeled neurons in the inferior parietal lobule, an area that processes both visual and somatosensory information (e.g., Seltzer and Pandya 1980). Like the STS, the feline anterior ectosylvian sulcus (AES) is located at the intersection of the temporal, parietal, and frontal lobes, contains multisensory neurons (e.g., Rauschecker and Korte 1993; Wallace et al. 1992; Jiang et al. 1994), and exhibits a higher-order visual area within its lower (ventral) bank (Mucke et al. 1982; Olson and Graybiel 1987). This has led to some speculation that these regions might be homologous. However, a fourth somatosensory area (SIV) representation (Clemo and Stein 1983) is found anterior along the AES, whereas somatosensory neurons are predominantly found in the posterior STS (Seltzer and Pandya 1994). The AES also contains distinct modality-specific regions (somatosensory SIV, visual AEV, and auditory FAES) with multisensory neurons found primarily at the intersection between these different representations (Meredith 2004; Wallace et al. 2004; Carriere et al. 2007; Meredith and Allman 2009), whereas the subdivisions of the upper STS bank are largely characterized by multisensory neurons (e.g., Benevento et al. 1977; Bruce et al. 1981; Hikosaka et al. 1988). Further distinctions between the STS and the AES reside in the cortical connectivity of the latter, as depicted in Figure 1.2. Robust somatosensory inputs reach the AES from somatosensory areas SI–SIII (Burton and Kopf 1984; Reinoso-Suarez and Roda 1985) and SV (Mori et al. 1996; Clemo and Meredith 2004); inputs to AEV arrive from the extrastriate visual area posterolateral lateral suprasylvian (PLLS), with smaller contributions from the anterolateral lateral suprasylvian (ALLS) and the posteromedial lateral suprasylvian (PMLS) visual areas (Olson and Graybiel 1987); auditory inputs to the FAES project from the rostral suprasylvian sulcus (RSS), second auditory area (AII), and posterior auditory field (PAF) (Clemo et al. 2007; Lee and Winer 2008). The laminar origin of these projections is provided in only a few of these reports.
FIGURE 1.1 Cortical afferents to monkey STS. On this lateral view of monkey brain, the entire extent of STS is opened (dashed lines) to reveal upper and lower banks. On upper bank, multisensory regions TP0–4 are located (not depicted). Auditory inputs (black arrows) from adjoining superior temporal gyrus, planum temporale, preferentially target anterior portions of upper bank. Visual inputs, primarily from parahippocampal gyrus (medium gray arrow) but also from inferior parietal lobule (light gray arrow), also target upper STS bank. Somatosensory inputs were comparatively sparse, limited to posterior aspects of STS, and may arise from part of inferior parietal lobule (light gray arrow). Note that inputs intermingle within their areas of termination.
FIGURE 1.2 Cortical afferents to cat AES. On this lateral view of cat cortex, the AES is opened (dashed lines) to reveal dorsal and ventral banks. The somatosensory representation SIV on the anterior dorsal bank receives inputs (light gray arrow) from somatosensory areas SI, SII, SIII, and SV. The auditory field of the AES (FAES) in the posterior end of the sulcus receives inputs (black arrows) primarily from the rostral suprasylvian auditory field, and sulcal portion of the anterior auditory field as well as portions of dorsal zone of the auditory cortex, AII, and PAF. The ectosylvian visual (AEV) area in the ventral bank receives visual inputs (dark gray arrow) primarily from PLLS and, to a lesser extent, from adjacent ALLS and PMLS visual areas. Note that the SIV, FAES, and AEV domains, as well as their inputs, are largely segregated from one another.
The AES is not alone as a cortical site of convergence of inputs from representations of different sensory modalities, as the posterior ectosylvian gyrus (an auditory–visual area; Bowman and Olson 1988), PLLS visual area (an auditory–visual area; Yaka et al. 2002; Allman and Meredith 2007), and the rostral suprasylvian sulcus (an auditory–somatosensory area; Clemo et al. 2007) have had their multiple sensory sources examined. Perhaps the most functionally and anatomically studied multisensory structure is not in the cortex, but in the midbrain: the superior colliculus (SC). This six-layered region contains spatiotopic representations of visual, auditory, and somatosensory modalities within its intermediate and deep layers (for review, see Stein and Meredith 1993). Although unisensory, bimodal, and trimodal neurons are intermingled with one another in this region, the multisensory neurons predominate (63%; Wallace and Stein 1997). Despite their numbers, structure–function relationships have been determined for only a few multisensory neurons. The largest, often most readily identifiable on cross section (or via recording) are the tectospinal and tectoreticulospinal neurons, with somata averaging 35 to 40 µm in diameter and dendritic arbors that can extend up to 1.4 mm (Moschovakis and Karabelas 1985; Behan et al. 1988). These large multipolar neurons have a high incidence of multisensory properties, usually as visual–auditory or visual–somatosensory bimodal neurons (Meredith and Stein 1986). Another form of morphologically distinct SC neuron also shows multisensory properties: the nitric oxide synthase (NOS)-positive interneuron. These excitatory local circuit neurons have been shown to receive bimodal inputs largely from the visual and auditory modalities (Fuentes-Santamaria et al. 2008). Thus, unlike most other structures identified as multisensory, the SC contains morphological classes of neurons that highly correlate with multisensory activity. Ultimately, this could contribute to understanding how multisensory circuits are formed and their relation to particular features of multisensory processing. Because the SC is a multisensory structure, anatomical tracers injected into it have identified numerous cortical and subcortical areas representing different sensory modalities that supply its inputs. However, identification of the sources of multiple sensory inputs to this, or any, area provides little more than anatomical confirmation that projections from different sensory modalities were involved. More pertinent is the information relating to the other end of the projection, the
axon terminals, whose influence is responsible for the generation of multisensory effects on the postsynaptic membrane. Despite the fact that axon terminals are at the physical point of multisensory convergence, few studies of multisensory regions outside of the SC have addressed this specific issue.
1.2.1 Multiple Sensory Projections: Termination Patterns
Unlike much of the multisensory cortex, the pattern of terminal projections to the SC is well described, largely through the efforts of Harting’s group (Harting and Van Lieshout 1991; Harting et al. 1992, 1997). Historically, this work represented a conceptual leap from the identification of multisensory sources to the convergent arrangement of those inputs that potentially generate multisensory effects. These and other orthograde studies (e.g., Illing and Graybiel 1986) identified a characteristic, patchy arrangement of input terminals that occupied specific domains within the SC. Somatosensory inputs, whether from the somatosensory cortex or the trigeminal nucleus, terminated in an interrupted series of puffs across the mediolateral extent of the middle portion of the intermediate layers (Harting and Van Lieshout 1991; Harting et al. 1992, 1997). On the other hand, visual inputs from, for example, the AEV avoided the central aspects of the intermediate layers while occupying patches above and below. These relationships among the distributions of axon terminals from different sensory modalities are depicted in Figure 1.3. This patchy, discontinuous pattern of termination characterized most projections to the deeper SC layers and was so consistent that some investigators came to regard the patches as a device by which the different inputs were compartmentalized within individually distinct functional domains (Illing and Graybiel 1986). Although this interpretation has some validity, it is also true (as mentioned above) that some multisensory neurons exhibit dendritic arbors of up to 1.4 mm. With such an extensive branching pattern (as illustrated in Figure 1.3), it would be difficult for a neuron to avoid contacting the domains of different sensory inputs to the SC. In fact, a multisensory tectoreticulospinal neuron would likely sample repeated input domains from several modalities, and it is difficult to imagine why there are not more SC trimodal neurons (9%; Wallace and Stein 1997).
FIGURE 1.3 Sensory segregation and multisensory convergence in the SC. This coronal section through the cat SC shows alternating cellular and fibrous layers (SO, stratum opticum; SGI, stratum griseum intermediale). Terminal boutons form a discontinuous, patchy distribution across the multisensory layers, with somatosensory (dark gray, from SIV) and visual (light gray, from AEV) inputs largely occupying distinct, nonoverlapping domains. (Redrawn from Harting, J.K. et al., J. Comp. Neurol., 324, 379–414, 1992.) A tectoreticulospinal neuron (redrawn from Behan, M. et al., J. Comp. Neurol., 270, 171–184, 1988) is shown, to scale, repeating across the intermediate layer, where its dendritic arbor virtually cannot avoid contacting multiple input domains from different modalities. Accordingly, tectoreticulospinal neurons are known for their multisensory properties.
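To make the geometric argument concrete, a minimal back-of-the-envelope sketch is given below (in Python). Only the 1.4 mm arbor extent comes from the text; the patch width and inter-patch gap are hypothetical values chosen for illustration, not measurements from Harting et al. (1992).

```python
# Back-of-the-envelope check of the geometric argument above: how many
# discrete terminal patches can a tectoreticulospinal dendritic arbor span?
# Only the 1.4 mm arbor extent comes from the text; the patch width and the
# gap between patches are hypothetical values chosen for illustration.

ARBOR_EXTENT_MM = 1.4   # dendritic arbor extent reported for these neurons
PATCH_WIDTH_MM = 0.3    # assumed width of one terminal patch
GAP_WIDTH_MM = 0.3      # assumed gap between neighboring patches

period = PATCH_WIDTH_MM + GAP_WIDTH_MM            # one patch plus one gap
patches_spanned = int(ARBOR_EXTENT_MM // period) + 1

print(f"An arbor {ARBOR_EXTENT_MM} mm across spans roughly "
      f"{patches_spanned} patches at a {period} mm repeat period.")
# With somatosensory and visual patches alternating, spanning even two or
# three patches already brings the arbor into contact with both modalities.
```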
FIGURE 1.4 Supragranular cross-modal projections from auditory FAES (black injection site) to somatosensory SIV. Coronal sections through SIV correlate with the levels shown on the lateral diagram of ferret cortex; the location of the AES is indicated by an arrow. On each coronal section, the SIV region is denoted by dashed lines roughly perpendicular to the pial surface, and the location of layer IV (granular layer) is indicated by a dashed line essentially parallel to the gray–white border. Each dot is equivalent to one bouton labeled from FAES; note that a preponderance of labeled axon terminals is found in the supragranular layers. (Redrawn from Dehner, L.R. et al., Cereb. Cortex, 14, 387–403, 2004.)
Ultimately, these different input patterns suggest a complex spatial relationship with the recipient neurons and may provide a useful testing ground on which to determine the synaptic architecture underlying multisensory processing.
With regard to cortical multisensory areas, only relatively recent studies have examined the termination patterns of multiple sensory projections (e.g., projections from auditory and visual sources to a target area) or cross-modal projections (e.g., projections from an auditory source to a visual target area). It had been observed that tracer injections into the anterior dorsal bank of the AES, where the somatosensory area SIV is located, produced retrograde labeling in the posterior aspects of the AES, where the auditory field of the AES (FAES) is found (Reinoso-Suarez and Roda 1985). This potential cross-modal projection was further examined by Dehner et al. (2004), who injected tracers in auditory FAES and identified orthograde projection terminals in SIV (see Figure 1.4). These experiments were repeated with the tracer systematically placed in different portions of the FAES, showing the constancy of the projection’s preference for terminating in the upper, supragranular layers of SIV (Dehner et al. 2004). Functionally, such a cross-modal projection between auditory and somatosensory areas would be expected to generate bimodal auditory–somatosensory neurons. However, such bimodal neurons have rarely been observed in SIV (Clemo and Stein 1983; Rauschecker and Korte 1993; Dehner et al. 2004), and stimulation of FAES (through indwelling electrodes) failed to elicit a single example of orthodromic activation via this cross-modal pathway (Dehner et al. 2004). Eventually, single- and combined-modality stimulation revealed that somatosensory SIV neurons received subthreshold influences from auditory inputs, which was described as a “new” form of multisensory convergence, distinct from the well-known bimodal patterns identified in the SC and elsewhere (Dehner et al. 2004). These functional distinctions are depicted in Figure 1.5, where hypothetical circuits that produce different multisensory effects are illustrated. Ultimately, these experiments (Dehner et al. 2004) indicate that bimodal neurons are not the only form of multisensory neuron.
FIGURE 1.5 Different patterns of sensory convergence result in different forms of processing. In each panel, a neuron (gray) receives inputs (black) from sensory modalities “A” and/or “B.” In the bimodal condition (left), the neuron receives multiple inputs from both modalities, such that it can be activated by stimulus “A” alone or by stimulus “B” alone. Furthermore, when both “A + B” are stimulated together, the inputs converge on the same neuron and their responses integrate. In the subthreshold condition (center), the neuron still receives inputs from both modalities, but the inputs from modality “B” are so reduced, and occur at such low-priority locations, that stimulation of “B” alone fails to activate the neuron. However, when “B” is combined with “A,” activity is modulated (facilitation or suppression). In contrast, unisensory neurons (right) receive inputs from only a single modality “A,” and stimulation of “B” has no effect either alone or in combination with “A.”
1.2.2 Supragranular Termination of Cross-Modal Projections
The possibility that cross-modal projections underlying subthreshold multisensory processing might be generalizable to brain regions other than SIV was examined in several subsequent investigations. Somatosensory area SIV was found to exhibit a reciprocal cross-modal projection to auditory FAES, where subthreshold somatosensory effects were observed in approximately 25%
of the samples (Meredith et al. 2006). These projections also showed a preference for supragranular termination, as illustrated in Figure 1.6. In another study (Clemo et al. 2008), several auditory corticocortical projections were demonstrated to terminate in the visual PLLS area, but only the projections from FAES were present within the entire extent of the PLLS, corresponding with the distribution of subthreshold multisensory neurons (Allman and Meredith 2007). These projections from FAES to PLLS showed an overwhelming preference for termination in the supragranular layers (see Figure 1.6). Thus, it might seem that cross-modal projections with supragranular terminations underlie a specific form of multisensory processing. However, in the auditory field of the rostral suprasylvian sulcus (which is part of the rostral suprasylvian sulcal cortex; Clemo et al. 2007), projections from somatosensory area SIV have a similar supragranular distribution, yet both subthreshold and bimodal forms of multisensory neurons are present there. Therefore, it is not conclusive that supragranular projections and subthreshold multisensory processing are correlated. It is clear, however, that cross-modal corticocortical projections are strongly characterized by supragranular patterns of termination.
FIGURE 1.6 Corticocortical projections to multisensory areas preferentially terminate in supragranular layers. Panel (a) shows boutons in RSS labeled from AI, AAF, AII, SIV, SV, PLLS, PAF, PMLS, FAES, and AEV; panel (b) shows boutons in PLLS labeled from A1, sAAF, PAF, and FAES; panel (c) shows boutons in FAES labeled from SIV and RSS. In (a), all panels represent coronal sections through RSS with layer IV approximated by a dashed line. For each area injected (e.g., AI, SIV, AEV, etc.), each dot represents one labeled axon terminal (bouton). (Redrawn from Clemo, H.R. et al., J. Comp. Neurol., 503, 110–127, 2007; Clemo, H.R. et al., Exp. Brain Res., 191, 37–47, 2008; Meredith, M.A. et al., Exp. Brain Res., 172, 472–484, 2006.)
1.3 DO ALL CROSS-MODAL PROJECTIONS GENERATE MULTISENSORY INTEGRATION?
Some of the cross-modal projections illustrated in the previous section would be described as modest, at best, in their density of termination in the target region. In fact, it has been suggested that this comparative reduction in projection strength may be one feature of convergence that underlies subthreshold multisensory effects (Allman et al. 2009). Other reports of cortical cross-modal projections, specifically those between the auditory and visual cortex in monkeys (Falchier et al. 2002; Rockland and Ojima 2003), have also been characterized by the same sparseness of projection. Nevertheless, in these cases, it seems to be broadly accepted that such sparse projections would not only underlie overt auditory activity in the visual cortex, but would lead to multisensory integration there as well (Falchier et al. 2002). Data from unanesthetized, paralyzed animals have been cited in support of such interpretations (Murata et al. 1965; Bental et al. 1968; Spinelli et al. 1968; Morrell 1972; Fishman and Michael 1973), but it has been argued that these results are inconsistent with auditory sensory activity (Allman et al. 2008). Thus, although the functional effects of such sparse cross-modal projections are under dispute, the presence of these projections among the repertoire of corticocortical connections now seems well established.
Therefore, a recent study (Allman et al. 2008) was initiated to examine the functional effects of a modest cross-modal projection from auditory to visual cortices in ferrets. Tracer injections centered on A1 of ferret cortex were shown to label terminal projections in the supragranular layers of visual area 21. However, single-unit recordings were unable to identify the result of that cross-modal convergence in area 21: no bimodal neurons were observed. Furthermore, tests to reveal subthreshold multisensory influences were also unsuccessful. Ultimately, only when local inhibition was pharmacologically blocked (via iontophoresis of bicuculline methiodide, an antagonist of the gamma-aminobutyric acid type A (GABA-A) receptor) was there a statistically significant indication of cross-modal influence on visual processing. These results support the notion that multisensory convergence does lead to multisensory processing effects, but those effects may be subtle and manifest themselves in nontraditional forms (e.g., nonbimodal; Allman et al. 2008). In fact, this interpretation is consistent with the results of a recent study of the effects of auditory stimulation on visual processing in V1 of awake, behaving monkeys (Wang et al. 2008): no bimodal neurons were observed, but responses to visual–auditory stimuli were significantly shorter in latency than those elicited by visual stimuli alone.
From another perspective, these data provide additional support to the notion that multisensory convergence is not restricted to bimodal neurons. The well-known pattern of convergence underlying bimodal neurons has already been modified, as shown in Figure 1.5, to include subthreshold multisensory neurons whose functional behavior might be defined by an imbalance of inputs from the two different modalities. When considering the result of multisensory convergence in area 21, it is not much of a design modification to reduce those subthreshold inputs even further, such that they might be effective only under specific contexts or conditions. Moreover, reducing the second set of inputs further toward zero essentially converts a multisensory circuit (albeit a weak one) into a unisensory circuit. Thus, it seems logical to propose that the patterns of connectivity that produce multisensory properties span a continuum from, at one end, the profuse levels of inputs from different modalities
that produce bimodal neurons to, at the other end, the complete lack of inputs from a second modality that defines unisensory neurons (see Figure 1.7).
FIGURE 1.7 Patterns of sensory convergence (black; from modality “A” or “B”) onto individual neurons (gray) result in different forms of processing (similar to Figure 1.5). The synaptic arrangement depicted in the middle panels is adjusted such that inputs from modality “B” are light (left center) or very sparse (right center), suggesting a slightly different effect of modality “B” on responses elicited by “A.” In addition, because each of these effects results from simple yet systematic changes in synaptic arrangement, these patterns suggest that multisensory convergence occurs over a continuum of synaptic arrangements that, at one end, produces bimodal multisensory properties, whereas at the other, it underlies only unisensory processing.
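The continuum illustrated in Figures 1.5 and 1.7 can be made concrete with a minimal rate-and-threshold sketch in Python. The weights and threshold below are arbitrary illustrative values, not parameters drawn from any of the studies cited; the sketch merely reproduces the qualitative behavior of the bimodal, subthreshold, and unisensory arrangements.

```python
# Minimal threshold model of the convergence patterns in Figures 1.5 and 1.7.
# Weights and threshold are arbitrary illustrative values.

def response(drive_a, drive_b, w_a, w_b, threshold=1.0):
    """Suprathreshold output of a neuron summing weighted modality inputs."""
    net = w_a * drive_a + w_b * drive_b
    return max(0.0, net - threshold)

patterns = {
    "bimodal":      dict(w_a=1.5, w_b=1.5),   # both inputs strong
    "subthreshold": dict(w_a=1.5, w_b=0.4),   # modality "B" weak or distal
    "unisensory":   dict(w_a=1.5, w_b=0.0),   # no input from modality "B"
}

for name, w in patterns.items():
    r_a = response(1, 0, **w)     # stimulus "A" alone
    r_b = response(0, 1, **w)     # stimulus "B" alone
    r_ab = response(1, 1, **w)    # "A" and "B" together
    print(f"{name:>12}:  A={r_a:.2f}  B={r_b:.2f}  A+B={r_ab:.2f}")

# bimodal:      "A" and "B" each drive the neuron, and "A+B" integrates
# subthreshold: "B" alone is silent but facilitates the response to "A"
# unisensory:   "B" has no effect either alone or combined with "A"
```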
1.4 SYNAPTIC ARCHITECTURE OF MULTISENSORY CONVERGENCE
Implicit in the conclusions derived from the studies cited above is the notion that heavy cross-modal projections underlie bimodal multisensory processing at the target site, whereas modest projections subserve subthreshold multisensory processing. Although this general notion correlating projection strength with specific forms of multisensory effects awaits quantification, it is consistent with the overarching neurophysiological principle that different patterns of connectivity underlie different circuits and behaviors. Another basic feature of neuronal connectivity is the priority of the location at which synapses occur. It is well accepted that synapses located on a neuron’s soma are more likely to influence its spiking activity than synapses occurring out on the dendrites, and that synapses on proximal dendrites have a higher probability of affecting activity than those at more distal sites. Therefore, the synaptic architecture of multisensory processing should also be considered when assessing the functional effects of cross-modal (and multisensory) projections. However, virtually nothing is known about the structure of multisensory convergence at the neuronal level. In fact, the only electron micrographic documentation of multisensory convergence comes not from the cortex, but from brainstem studies of somatosensory inputs to the dorsal cochlear nucleus (Shore et al. 2000). Although the significance of this observation of multisensory convergence at the first synapse in the auditory projection stream cannot be overstated, electron microscopy is poorly adapted for comparing multiple synaptic contacts along the same neuron. Confocal laser microscopy, coupled with multiple-fluorescent labeling techniques, can visualize entire neurons as well as magnify areas of synaptic contact to submicron resolution (e.g., see Vinkenoog et al. 2005). This technique was used in a recent study of auditory FAES cross-modal
projections to somatosensory area SIV (Keniston et al. 2010). First, a tracer (fluoroemerald, linked to biotinylated dextran amine) was injected into the auditory FAES and allowed to transport to SIV. Next, because inhibitory interneurons represent only about 20% of cortical neurons, immunofluorescent tags of specific subclasses of interneurons would make them stand out against the neuropil. Therefore, immunocytochemical techniques were used to rhodamine-label SIV interneurons containing a calcium-binding protein (e.g., parvalbumin, calbindin, calretinin). Double-labeled tissue sections were examined with a laser-scanning confocal microscope (TCS SP2 AOBS, Leica Microsystems), and high-magnification image stacks were collected, imported into Volocity (Improvision, Lexington, Massachusetts), and deconvolved (AutoQuant, Media Cybernetics). A synaptic contact was defined as an axon swelling that showed no gap between it and the immunopositive neuron. Of the 33 immunopositive neurons identified, a total of 59 contacts were observed with axon terminals labeled from the FAES, two of which are illustrated in Figure 1.8. Sixty-four percent (21 of 33) of the interneurons showed one or more contacts; the average was 2.81 (±1.4) contacts per contacted neuron, with a maximum of 5 found on one neuron. Thus, the anatomical techniques used here visualized cross-modal convergence at the neuronal level and provided some of the first insights into the synaptic architecture of multisensory connections.
FIGURE 1.8 (See color insert.) Confocal images of a somatosensory SIV neuron (red) contacted by boutons that originated in auditory FAES (green). A three-dimensional rendering of a trimmed confocal stack containing a calretinin-positive SIV neuron (red; scale bar, 10 μm) that was contacted by two axons (green) labeled from auditory area FAES. Each of the axo-dendritic points of contact is enlarged on the right (white arrows; scale bar, 1.0 μm) to reveal the putative bouton swelling. (From Keniston, L.P. et al., Exp. Brain Res., 202, 725–731, 2010. With permission.)
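As a check on the reported summary values, the small sketch below recomputes them. The per-neuron contact counts are hypothetical placeholders constructed to match the published totals (21 of 33 neurons contacted, 59 contacts, maximum of 5); they are not the actual data of Keniston et al. (2010).

```python
# Recompute the summary statistics reported for the FAES-to-SIV contacts.
# The per-neuron counts below are hypothetical placeholders constructed to
# match the published totals (21 contacted neurons, 59 contacts, max of 5);
# they are not the actual data of Keniston et al. (2010).
from statistics import mean, pstdev

n_immunopositive = 33
contacts = [1, 1, 1, 1,
            2, 2, 2, 2, 2, 2, 2,
            3, 3, 3,
            4, 4, 4,
            5, 5, 5, 5]          # one entry per contacted interneuron

assert len(contacts) == 21 and sum(contacts) == 59 and max(contacts) == 5

pct_contacted = 100 * len(contacts) / n_immunopositive
print(f"interneurons contacted: {pct_contacted:.0f}%")        # ~64%
print(f"contacts per contacted neuron: {mean(contacts):.2f} "
      f"(+/- {pstdev(contacts):.1f})")                        # ~2.81 (+/- 1.4)
```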
1.5 SUMMARY AND CONCLUSIONS
Historically, anatomical studies of multisensory processing focused primarily on the source of inputs to structures that showed responses to more than one sensory modality. However, because convergence is the defining step in multisensory processing, it would seem most important to understand
how the terminations of those inputs generate multisensory effects. Furthermore, because multisensory processing is not restricted to only bimodal (or trimodal) neurons, the synaptic architecture of multisensory convergence may be revealed to be as distinct and varied as the perceptions and behaviors these multisensory circuits subserve.
ACKNOWLEDGMENTS
This study was supported by NIH grant NS039460.
REFERENCES Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in “unimodal” neurons: Cross-modal subthreshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–549. Allman, B.L., R.E. Bittencourt-Navarrete, L.P. Keniston, A.E. Medina, M.Y. Wang, and M.A. Meredith. 2008. Do cross-modal projections always result in multisensory integration? Cerebral Cortex 18:2066–2076. Allman, B.L., L.P. Keniston, and M.A. Meredith. 2009. Not just for bimodal neurons anymore: The contribution of unimodal neurons to cortical multisensory processing. Brain Topography 21:157–167. Behan, M., P.P. Appell, and M.J. Graper. 1988. Ultrastructural study of large efferent neurons in the superior colliculus of the cat after retrograde labeling with horseradish peroxidase. Journal of Comparative Neurology 270:171–184. Benevento, L.A., J.H. Fallon, B. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental Neurology 57:849–872. Bental, E., N. Dafny, and S. Feldman. 1968. Convergence of auditory and visual stimuli on single cells in the primary visual cortex of unanesthetized unrestrained cats. Experimental Neurology 20:341–351. Bowman, E.M., and C.R. Olson. 1988. Visual and auditory association areas of the cat’s posterior ectosylvian gyrus: Cortical afferents. Journal of Comparative Neurology 272:30–42. Bruce, C., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology 46:369–384. Burton, H., and E.M. Kopf. 1984. Ipsilateral cortical connections from the second and fourth somatic sensory areas in the cat. Journal of Comparative Neurology 225:527–553. Carriere, B.N., D.W. Royal, T.J. Perrault, S.P. Morrison, J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2007. Visual deprivation alters the development of cortical multisensory integration. Journal of Neurophysiology 98:2858–2867. Clemo, H.R., and M.A. Meredith. 2004. Cortico-cortical relations of cat somatosensory areas SIV and SV. Somatosensory & Motor Research 21:199–209. Clemo, H.R., and B.E. Stein. 1983. Organization of a fourth somatosensory area of cortex in cat. Journal of Neurophysiology 50:910–925. Clemo, H.R., B.L. Allman, Donlan M.A., and M.A. Meredith. 2007. Sensory and multisensory representations within the cat rostral suprasylvian cortex. Journal of Comparative Neurology 503:110–127. Clemo, H.R., G.K. Sharma, B.L. Allman, and M.A. Meredith. 2008. Auditory projections to extrastriate visual cortex: Connectional basis for multisensory processing in ‘unimodal’ visual neurons. Experimental Brain Research 191:37–47. Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multisensory convergence. Cerebral Cortex 14:387–403. Falchier, A., C. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience 22:5749–5759. Fishman, M.C., and P. Michael. 1973. Integration of auditory information in the cat’s visual cortex. Vision Research 13:1415–1419. Fuentes-Santamaria, V., J.C. Alvarado, B.E. Stein, and J.G. McHaffie. 2008. Cortex contacts both output neurons and nitrergic interneurons in the superior colliculus: Direct and indirect routes for multisensory integration. 
Cerebral Cortex 18:1640–1652. Harting, J.K., and D.P. Van Lieshout. 1991. Spatial relationships of axons arising from the substantia nigra, spinal trigeminal nucleus, and the pedunculopontine tegmental nucleus within the intermediate gray of the cat superior colliculus. Journal of Comparative Neurology 305:543–558.
Harting, J.K., B.V. Updyke, and D.P. Van Lieshout. 1992. Corticotectal projections in the cat: Anterograde transport studies of twenty-five cortical areas. Journal of Comparative Neurology 324:379–414. Harting, J.K., S. Feig, and D.P. Van Lieshout. 1997. Cortical somatosensory and trigeminal inputs to the cat superior colliculus: Light and electron microscopic analyses. Journal of Comparative Neurology 388:313–326. Hikosaka, K., E. Iwai, H. Saito, and K. Tanaka. 1988. Polysensory properties of neurons in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. Journal of Neurophysiology 60:1615–1637. Horn, G., and R.M. Hill. 1966. Responsiveness to sensory stimulation of units in the superior colliculus and subjacent tectotegmental regions of the rabbit. Experimental Neurology 14:199–223. Illing, R.-B., and A.M. Graybiel. 1986. Complementary and non-matching afferent compartments in the cat’s superior colliculus: Innervation of the acetylcholinesterase-poor domain of the intermediate gray layer. Neuroscience 18:373–394. Jiang, H., F. Lepore, M. Ptito, and J.P. Guillemot. 1994, Sensory interactions in the anterior ectosylvian cortex of cats. Experimental Brain Research 101:385–396. Keniston, L.P., S.C. Henderson, and M.A. Meredith. 2010. Neuroanatomical identification of crossmodal auditory inputs to interneurons in somatosensory cortex. Experimental Brain Research 202:725–731. Lee, C.C., and J.A. Winer. 2008. Connections of cat auditory cortex: III. Corticocortical system. Journal of Comparative Neurology 507:1920–1943. Meredith, M.A. 2004. Cortico-cortical connectivity and the architecture of cross-modal circuits. In Handbook of Multisensory Processes. C. Spence, G. Calvert, and B. Stein, eds. 343–355. Cambridge, MA: MIT Press. Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex. Neuro report 20:126–131. Meredith, M.A., and B.E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology 56:640–662. Meredith, M.A, L.P. Keniston, L.R. Dehner, and H.R. Clemo. 2006. Crossmodal projections from somatosensory area SIV to the auditory field of the anterior ectosylvian sulcus (FAES) in cat: Further evidence for subthreshold forms of multisensory processing. Experimental Brain Research 172:472–484. Monteiro, G., H.R. Clemo, and M.A. Meredith. 2003. Auditory cortical projections to the rostral suprasylvian sulcal cortex in the cat: Implications for its sensory and multisensory organization. NeuroReport 14: 2139–2145. Mori, A., T. Fuwa, A. Kawai et al. 1996. The ipsilateral and contralateral connections of the fifth somatosensory area (SV) in the cat cerebral cortex. Neuroreport 7:2385–2387. Morrell, F. 1972. Visual system’s view of acoustic space. Nature 238:44–46. Moschovakis, A.K., and A.B. Karabelas. 1985. Observations on the somatodendritic morphology and axonal trajectory of intracellularly HRP-labeled efferent neurons located in the deeper layers of the superior colliculus of the cat. Journal of Comparative Neurology 239:276–308. Mucke, L., M. Norita, G. Benedek, and O. Creutzfeldt. 1982. Physiologic and anatomic investigation of a visual cortical area situated in the ventral bank of the anterior ectosylvian sulcus of the cat. Experimental Brain Research 46:1–11. Murata, K., H. Cramer, and P. Bach-y-Rita. 1965. Neuronal convergence of noxious, acoustic, and visual stimuli in the visual cortex of the cat. 
Journal of Neurophysiology 28:1223–1239. Olson, C.R., and A.M. Graybiel. 1987. Ectosylvian visual area of the cat: Location, retinotopic organization, and connections. Journal of Comparative Neurology 261:277–294. Rauschecker, J.P., and M. Korte. 1993. Auditory compensation for early blindness in cat cerebral cortex. Journal of Neuroscience 13:4538–4548. Reinoso-Suarez, F., and J.M. Roda. 1985. Topographical organization of the cortical afferent connections to the cortex of the anterior ectosylvian sulcus in the cat. Experimental Brain Research 59:313–324. Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology 50:19–26. Saleem, K.S, W. Suzuki, K. Tanaka, and T. Hashikawa. 2000. Connections between anterior inferotemporal cortex and superior temporal sulcus regions in the macaque monkey. Journal of Neuroscience 20:5083–5101. Seltzer, B., and D.N. Pandya. 1980. Converging visual and somatic sensory input to the intraparietal sulcus of the rhesus monkey. Brain Research 192:339–351. Seltzer, B., and D.N. Pandya. 1994. Parietal, temporal, and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology 343:445–463.
Shore, S.E., Z. Vass, N.L. Wys, and R.A. Altschuler. 2000. Trigeminal ganglion innervates the auditory brainstem. Journal of Comparative Neurology 419:271–285. Spinelli, D.N., A. Starr, and T.W. Barrett. 1968. Auditory specificity in unit recordings from cat’s visual cortex. Experimental Neurology 22:75–84. Stein, B.E., and M.A. Meredith. 1993. Merging of the Senses. Cambridge, MA: MIT Press. Toldi, J., O. Feher, and L. Feuer. 1984. Dynamic interactions of evoked potentials in a polysensory cortex of the cat. Neuroscience 13:945–952. Vinkenoog, M., M.C. van den Oever, H.B. Uylings, and F.G. Wouterlood. 2005. Random or selective neuroanatomical connectivity. Study of the distribution of fibers over two populations of identified interneurons in cerebral cortex. Brain Research. Brain Research Protocols 14:67–76. Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat superior colliculus. Journal of Neuroscience 17:2429–2444. Wallace, M.T., M.A. Meredith, and B.E. Stein. 1992. The integration of multiple sensory inputs in cat cortex. Experimental Brain Research 91:484–488. Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation. Proceedings of the National Academy of Sciences 101:2167–2172. Wang, Y., S. Celebrini, Y. Trotter, and P. Barone. 2008. Visuo–auditory interactions in the primary visual cortex of the behaving monkey: Electrophysiological evidence. BMC Neuroscience 9:79. Yaka, R., N. Notkin, U. Yinon, and Z. Wollberg. 2002. Visual, auditory and bimodal activity in the banks of the lateral suprasylvian sulcus in the cat. Neuroscience and Behavioral Physiology 32:103–108.
2 Cortical and Thalamic Pathways for Multisensory and Sensorimotor Interplay
Céline Cappe, Eric M. Rouiller, and Pascal Barone
CONTENTS
2.1 Introduction
2.2 Cortical Areas in Multisensory Processes
2.2.1 Multisensory Association Cortices
2.2.1.1 Superior Temporal Sulcus
2.2.1.2 Intraparietal Sulcus
2.2.1.3 Frontal and Prefrontal Cortex
2.2.2 Low-Level Sensory Cortical Areas
2.2.2.1 Auditory and Visual Connections and Interactions
2.2.2.2 Auditory and Somatosensory Connections and Interactions
2.2.2.3 Visual and Somatosensory Connections and Interactions
2.2.2.4 Heteromodal Projections and Sensory Representation
2.3 Thalamus in Multisensory Processes
2.3.1 Thalamocortical and Corticothalamic Connections
2.3.2 Role of Thalamus in Multisensory Integration
2.4 Higher-Order, Lower-Order Cortical Areas and/or Thalamus?
2.5 Conclusions
Acknowledgments
References
2.1 INTRODUCTION
Numerous studies in both monkeys and humans have provided evidence for multisensory integration in high-level and low-level cortical areas. This chapter focuses on the anatomical pathways contributing to multisensory integration. We first describe the anatomical connections existing between different sensory cortical areas: briefly, the well-known connections between associative cortical areas, and then the more recently described connections targeting low-level sensory cortical areas. We then focus on the connections of the thalamus with different sensory and motor areas and their potential role in multisensory and sensorimotor integration. Finally, we discuss the several routes by which the brain may integrate information about the environment arriving through the different senses.
2.2 CORTICAL AREAS IN MULTISENSORY PROCESSES
2.2.1 Multisensory Association Cortices
Parietal, temporal, and frontal cortical regions of primates have been reported to be polysensory cortical areas, i.e., related to more than a single sensory modality. We describe here several important
features about these regions, focusing on the superior temporal sulcus (STS), the intraparietal sulcus, and the frontal cortex. 2.2.1.1 Superior Temporal Sulcus Desimone and Gross (1979) found neurons responsive to visual, auditory, and somatosensory stimuli in a temporal region of the STS referred to as superior temporal plane (STP) (see also Bruce et al. 1981; Baylis et al. 1987; Hikosaka et al. 1988). The rostral part of the STS (Bruce et al. 1981; Benevento et al. 1977) appears to contain more neurons with multisensory properties than the caudal part (Hikosaka et al. 1988). The connections of the STP include higher-order visual cortical areas as posterior parietal visual areas (Seltzer and Pandya 1994; Cusick et al. 1995) and temporal lobe visual areas (Kaas and Morel 1993), auditory cortical areas (Pandya and Seltzer 1982), and posterior parietal cortex (Seltzer and Pandya 1994; Lewis and Van Essen 2000). The STS region also has various connections with the prefrontal cortex (Cusick et al. 1995). In humans, numerous neuroimaging studies have shown multisensory convergence in the STS region (see Barraclough et al. 2005 for a review). Recently, studies have focused on the role of the polysensory areas of the STS and their interactions with the auditory cortex in processing primate communications (Ghazanfar 2009). The STS is probably one of the origins of visual inputs to the auditory cortex (Kayser and Logothetis 2009; Budinger and Scheich 2009; Cappe et al. 2009a; Smiley and Falchier 2009) and thus participates in the multisensory integration of conspecific face and vocalizations (Ghazanfar et al. 2008) that occurs in the auditory belt areas (Ghazanfar et al. 2005; Poremba et al. 2003). These findings support the hypothesis of general roles for the STS region in synthesizing perception of speech and general biological motion (Calvert 2001). 2.2.1.2 Intraparietal Sulcus The posterior parietal cortex contains a number of different areas including the lateral intraparietal (LIP) and ventral intraparietal (VIP) areas, located in the intraparietal sulcus. These areas seem to be functionally related and appear to encode the location of objects of interest (Colby and Goldberg 1999). These areas are thought to transform sensory information into signals related to the control of hand and eye movements via projections to the prefrontal, premotor, and visuomotor areas of the frontal lobe (Rizzolatti et al. 1997). Neurons of the LIP area present multisensory properties (Cohen et al. 2005; Russ et al. 2006; Gottlieb 2007). Similarly, neurons recorded in the VIP area exhibit typical multisensory responses (Duhamel et al. 1998; Bremmer et al. 2002; Schlack et al. 2005; Avillac et al. 2007). Anatomically, LIP and VIP are connected with cortical areas of different sensory modalities (Lewis and Van Essen 2000). In particular, VIP receives inputs from posterior parietal areas 5 and 7 and insular cortex in the region of S2, and few inputs from visual regions such as PO and MST (Lewis and Van Essen 2000). Although it is uncertain whether neurons in VIP are responsive to auditory stimuli, auditory inputs may originate from the dorsolateral auditory belt and parabelt (Hackett et al. 1998). The connectivity pattern of LIP (Andersen et al. 1990; Blatt et al. 1990; Lewis and Van Essen 2000) is consistent with neuronal responses related to eye position and visual inputs. Auditory and somatosensory influences appear to be very indirect and visuomotor functions dominate, as the connection pattern suggests. 
In particular, the ventral part of the LIP is connected with areas dealing with spatial information (Andersen et al. 1997) as well as with the frontal eye field (Schall et al. 1995), whereas the dorsal part of the LIP is connected with areas responsible for the processing of visual information related to the form of objects in the inferotemporal cortex (ventral “what” visual pathway). Both LIP and VIP neurons exhibit task-dependent responses (Linden et al. 1999; Gifford and Cohen 2004), although the strength of this dependence and its rules remain to be determined. 2.2.1.3 Frontal and Prefrontal Cortex The premotor cortex, located in the frontal lobe, contains neurons with responses to somatosensory, auditory, and visual signals, especially its ventral part as shown in monkeys (Fogassi et al. 1996;
Graziano et al. 1994, 1999). Somatosensory responses may be mediated by connections with somatosensory area S2 and parietal ventral (PV) somatosensory area (Disbrow et al. 2003) and with the posterior parietal cortex, such as areas 5, 7a, 7b, anterior intraparietal area (AIP), and VIP (see Kaas and Collins 2004). Visual inputs could also come from the posterior parietal region. The belt and parabelt auditory areas project to regions rostral to the premotor cortex (Hackett et al. 1999; Romanski et al. 1999) and may contribute to auditory activation, as well as connections from the trimodal portion of area 7b to the premotor cortex (Graziano et al. 1999). Anterior to the premotor cortex, the prefrontal cortex plays a key role in temporal integration and is related to evaluative and cognitive functions (Milner et al. 1985; Fuster 2001). Much of this cortex has long been considered to be multisensory (Bignall 1970) but some regions are characterized by some predominance in one sensory modality, such as an auditory domain in the ventral prefrontal region (Suzuki 1985; Romanski and Goldman-Rakic 2002; Romanski 2004). This region receives projections from auditory, visual, and multisensory cortical regions (e.g., Gaffan and Harrison 1991; Barbas 1986; Romanski et al. 1999; Fuster et al. 2000), which are mediated through different functional streams ending separately in the dorsal and ventral prefrontal regions (Barbas and Pandya 1987; Kaas and Hackett 2000; Romanski et al. 1999). This cortical input arising from different modalities confer to the prefrontal cortex a role in cross-modal association (see Petrides and Iversen 1976; Joseph and Barone 1987; Barone and Joseph 1989; Ettlinger and Wilson 1990) as well as in merging sensory information especially in processing conspecific auditory and visual communication stimuli (Romanski 2007; Cohen et al. 2007).
2.2.2 Low-Level Sensory Cortical Areas Several studies provide evidence that anatomical pathways between low-level sensory cortical areas may represent the anatomical support for early multisensory integration. We will detail these patterns of connections in this part according to sensory interactions. 2.2.2.1 Auditory and Visual Connections and Interactions Recently, the use of anterograde and retrograde tracers in the monkey brain made it possible to highlight direct projections from the primary auditory cortex (A1), the caudal auditory belt and parabelt, and the polysensory area of the temporal lobe (STP) to the periphery of the primary visual cortex (V1, area 17 of Brodmann) (Falchier et al. 2002), as well as from the associative auditory cortex to the primary and secondary visual areas (Rockland and Ojima 2003). These direct projections of the auditory cortex toward the primary visual areas would bring into play connections of the feedback type and may play a role in the “foveation” of a peripheral auditory sound source (Heffner and Heffner 1992). The reciprocity of these connections from visual areas to auditory areas was also tested in a recent study (Falchier et al. 2010) that revealed the existence of projections from visual areas V2 and prostriata to auditory areas, including the caudal medial and lateral belt area, the caudal parabelt area, and the temporoparietal area. Furthermore, in the marmoset, a projection from the high-level visual areas to the auditory cortex was also reported (Cappe and Barone 2005). More precisely, an area anterior to the STS (corresponding to the STP) sends connections toward the auditory core with a pattern of feedback connections. Thus, multiple sources can provide visual input to the auditory cortex in monkeys (see also Smiley and Falchier 2009; Cappe et al. 2009a). Direct connections between the primary visual and auditory areas have been found in rodents, such as in the gerbil (Budinger et al. 2006) or the prairie vole (Campi et al. 2010) as well as in carnivores. For example, the primary auditory cortex of the ferret receives a sparse projection from the visual areas including the primary visual cortex (Bizley et al. 2007). Similarly, in the adult cat, visual and auditory cortices are interconnected but the primary sensory fields are not the main areas involved. Only a minor projection is observed from A1 toward the visual areas A17/18 (Innocenti et al. 1988), the main component arising from the posterior auditory field (Hall and Lomber 2008). It
is important to note that there is probably a tendency for a decrease in the density of these auditory– visual interconnections when going from rodents to carnivore to primates. This probably means a higher incidence of cross-modal responses in unisensory areas of the rodents (Wallace et al. 2004), whereas such responses are not present in the primary visual or auditory cortex of the monkey (Lakatos et al. 2007; Kayser et al. 2008; Wang et al. 2008). On the behavioral side, in experiments conducted in animals, multisensory integration dealt in most cases with spatial cues, for instance, the correspondence between the auditory space and the visual space. These experiments were mainly conducted in cats (Stein et al. 1989; Stein and Meredith 1993; Gingras et al. 2009). For example, Stein and collaborators (1989) trained cats to move toward visual or auditory targets with weak salience, resulting in poor performance that did not exceed 25% on average. When the same stimuli were presented in spatial and temporal congruence, the percentage of correct detections increased up to nearly 100%. In monkeys, only few experiments have been conducted on behavioral facilitation induced by multimodal stimulation (Frens and Van Opstal 1998; Bell et al. 2005). In line with human studies, simultaneous presentation in monkeys of a sound during a visually guided saccade induced a reduction of about 10% to 15% of saccade latency depending on the visual stimulus contrast level (Wang et al. 2008). Recently, we have shown behavioral evidence for multisensory facilitation between vision and hearing in macaque monkeys (Cappe et al. 2010). Monkeys were trained to perform a simple detection task to stimuli, which were auditory (noise), visual (flash), or auditory–visual (noise and flash) at different intensities. By varying the intensity of individual auditory and visual stimuli, we observed that, when the stimuli are of weak saliency, the multisensory condition had a significant facilitatory effect on reaction times, which disappeared at higher intensities (Cappe et al. 2010). We applied to the behavioral data the “race model” (Raab 1962) that supposes that the faster unimodal modality should be responsible for the shortening in reaction time (“the faster the winner”), which would correspond to a separate activation model (Miller 1982). It turns out that the multisensory benefit at low intensity derives from a coactivation mechanism (Miller 1982) that implies a convergence of hearing and vision to produce multisensory interactions and a reduction in reaction time. The anatomical studies previously described suggest that such a convergence may take place at the lower levels of cortical sensory processing. In humans, numerous behavioral studies, using a large panel of different paradigms and various types of stimuli, showed the benefits of auditory–visual combination stimuli compared to unisensory stimuli (see Calvert et al. 2004 for a review; Romei et al. 2007; Cappe et al. 2009b as recent examples). From a functional point of view, many studies have shown multisensory interactions early in time and in different sensory areas with neuroimaging and electrophysiological methods. Auditory– visual interactions have been revealed in the auditory cortex or visual cortex using electrophysiological or neuroimaging methods in cats and monkeys (Ghazanfar et al. 2005; Bizley et al. 2007; Bizley and King 2008; Cappe et al. 2007; Kayser et al. 2007, 2008; Lakatos et al. 2007; Wang et al. 2008). 
More specifically, electrophysiological studies in monkeys, revealing multisensory interactions in primary sensory areas such as V1 or A1, showed that cross-modal stimuli (i.e., auditory or visual stimuli, respectively) are rather modulatory on the non-“sensory-specific” response, and/ or acting on the oscillatory activity (Lakatos et al. 2007; Kayser et al. 2008) or on the latency of the neuronal responses (Wang et al. 2008). These mechanisms can enhance the speed of sensory processing and induce a reduction of the reaction times (RTs) during a multisensory stimulation. Neurons recorded in the primary visual cortex showed a significant reduction in visual response latencies, specifically in suboptimal conditions (Wang et al. 2008). It is important to mention that, in the primary sensory areas of the primate, authors have reported the absence of nonspecific sensory responses at the spiking level (Wang et al. 2008; Lakatos et al. 2007; Kayser et al. 2008). These kinds of interactions between hearing and vision were also reported in humans using neuroimaging techniques (Giard and Peronnet 1999; Molholm et al. 2002; Lovelace et al. 2003; Laurienti et al. 2004; Martuzzi et al. 2007).
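The race-model analysis mentioned above (Raab 1962; Miller 1982) can be sketched as follows. This is a generic illustration rather than the authors’ analysis code: the reaction-time samples are simulated, and the test shown is the standard Miller inequality, which compares the cumulative distribution of bimodal reaction times against the sum of the two unimodal distributions.

```python
# Sketch of the race-model (Miller) inequality test described above.
# Reaction times are simulated for illustration; this is not the authors'
# analysis code, and the distributions are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

def ecdf(samples, t_grid):
    """Empirical cumulative probability P(RT <= t) at each t in t_grid."""
    samples = np.asarray(samples)
    return np.array([(samples <= t).mean() for t in t_grid])

# Simulated reaction times (ms) for weak auditory, visual, and bimodal stimuli
rt_a = rng.normal(380, 60, 500)
rt_v = rng.normal(400, 60, 500)
rt_av = rng.normal(330, 50, 500)      # faster than either unimodal condition

t_grid = np.linspace(200, 500, 31)
race_bound = np.minimum(ecdf(rt_a, t_grid) + ecdf(rt_v, t_grid), 1.0)
violations = ecdf(rt_av, t_grid) > race_bound     # Miller (1982) inequality

print("race-model violation at some latencies:", bool(violations.any()))
# A violation (the bimodal CDF exceeding the summed unimodal CDFs) cannot be
# explained by a race between independent channels and points to coactivation.
```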
2.2.2.2 Auditory and Somatosensory Connections and Interactions The advantage of being able to use a number of distinct tracers allows us to identify connections between several cortical areas. Indeed, we made injections of retrograde tracers into early visual (V2 and MT), somatosensory (1/3b), and auditory (core) cortical areas in marmosets (Cappe and Barone 2005) allowing us to exhibit connections between cortical areas considered as unisensory areas. Projections from visual areas, such as the STP, to the core auditory cortex have been found (Cappe and Barone 2005), as described in Section 2.2.2. Other corticocortical projections, and in particular from somatosensory to auditory cortex, were found, supporting the view that inputs from different modalities are sent to cortical areas that are classically considered to be unimodal (Cappe and Barone 2005). More precisely, our study revealed projections from somatosensory areas S2/ PV to the primary auditory cortex. Another study conducted in gerbils also showed connections between the primary somatosensory cortex and the primary auditory cortex (Budinger et al. 2006). In marmosets and macaques, projections from the retroinsular area of the somatosensory cortex to the caudiomedial belt auditory area were also reported (de la Mothe et al. 2006a; Smiley et al. 2007). Intracranial recordings in the auditory cortex of monkeys have shown the modulation of auditory responses by somatosensory stimuli, consistent with early multisensory convergence (Schroeder et al. 2001; Schroeder and Foxe 2002; Fu et al. 2003). These findings have been extended by a functional magnetic resonance imaging (fMRI) study in anesthetized monkeys, which showed auditory– somatosensory interactions in the caudal lateral belt area (Kayser et al. 2005). In humans, there have been previous demonstrations of a redundant signal effect between auditory and tactile stimuli (Murray et al. 2005; Zampini et al. 2007; Hecht et al. 2008). Functional evidence was mainly found with EEG and fMRI techniques (Foxe et al. 2000, 2002; Murray et al. 2005). In particular, Murray and collaborators (2005) reported in humans that neural responses showed an initial auditory–somatosensory interaction in auditory association areas. 2.2.2.3 Visual and Somatosensory Connections and Interactions Limited research has been focused on interactions between vision and touch. In our experiments, using multiple tracing methods in marmoset monkeys (Cappe and Barone 2005), we found direct projections from visual cortical areas to somatosensory cortical areas. More precisely, after an injection of retrograde tracer in the primary somatosensory cortex (areas 1/3b), we observed projections originating from visual areas (the ventral and dorsal fundus of the superior temporal area, and the middle temporal crescent). On a functional point of view, electrophysiological recordings in the somatosensory cortex of macaque monkeys showed modulations of responses by auditory and visual stimuli (Schroeder and Foxe 2002). Behavioral results in humans demonstrated gain in performance when visual and tactile stimuli were combined (Forster et al. 2002; Hecht et al. 2008). Evidence of functional interactions between vision and touch was observed with neuroimaging techniques in humans (Amedi et al. 2002, 2007; James et al. 2002). In particular, it has been shown that the perception of motion could activate the MT complex in humans (Hagen et al. 2002). 
It has also been demonstrated that the extrastriate visual cortex area 19 is activated during tactile perception (see Sathian and Zangaladze 2002 for review). 2.2.2.4 Heteromodal Projections and Sensory Representation In somatosensory (Krubitzer and Kaas 1990; Huffman and Krubitzer 2001) and visual systems (Kaas and Morel 1993; Schall et al. 1995; Galletti et al. 2001; Palmer and Rosa 2006), there is evidence for the existence of different connectivity patterns according to sensory representation, especially in terms of the density of connections between areas. This observation also applies to heteromodal connections. We found that the visual projections to areas 1/3b are restricted to the representation of certain body parts (Cappe and Barone 2005). Some visual projections selectively target the face (middle temporal crescent) or the arm (dorsal fundus of the superior temporal area)
representations in areas 1/3b. Similarly, auditory and multimodal projections to area V1 are prominent toward the representation of the peripheral visual field (Falchier et al. 2002, 2010; Hall and Lomber 2008), and only scattered neurons in the auditory cortex send a projection to foveal V1. The fact that heteromodal connections are coupling specific sensory representations across modalities probably reflects an adaptive process for behavioral specialization. This is in agreement with human and monkey data showing that the neuronal network involved in multisensory integration, as well as its expression at the level of the neuronal activity, is highly dependent on the perceptual task in which the subject is engaged. In humans, the detection or discrimination of bimodal objects, as well as the perceptual expertise of subjects, differentially affect both the temporal aspects and the cortical areas at which multisensory interactions occur (Giard and Peronnet 1999; Fort et al. 2002). Similarly, we have shown that the visuo–auditory interactions observed at the level of V1 neurons are observed only in behavioral situations during which the monkey has to interact with the stimuli (Wang et al. 2008). Such an influence of the perceptual context on the neuronal expression of multisensory interaction is also present when analyzing the phenomena of cross-modal compensation after sensory deprivation in human. In blind subjects (Sadato et al. 1996), the efficiency of somatosensory stimulation on the activation of the visual cortex is at maximum during an active discrimination task (Braille reading). This suggests that the mechanisms of multisensory interaction, at early stages of sensory processing and the cross-modal compensatory mechanisms, are probably mediated through common neuronal pathways involving the heteromodal connections described previously.
2.3 THALAMUS IN MULTISENSORY PROCESSES 2.3.1 Thalamocortical and Corticothalamic Connections Although the cerebral cortex and the superior colliculus (Stein and Meredith 1993) have been shown to be key structures for multisensory interactions, the idea that the thalamus could play a relay role in multisensory processing has been frequently proposed (Ghazanfar and Schroeder 2006 for review; Hackett et al. 2007; Cappe et al. 2009c; see also Cappe et al. 2009a for review). By using anatomical multiple tracing methods in the macaque monkey, we were able to test this hypothesis recently and looked at the relationship and the distribution of the thalamocortical and the corticothalamic (CT) connections between different sensory and motor cortical areas and thalamic nuclei (Cappe et al. 2009c). In this study, we provided evidence for the convergence of different sensory modalities in the thalamus. Based on different injections in somatosensory [in the posterior parietal somatosensory cortex (PE/PEa in area 5)], auditory [in the rostral (RAC) and caudal auditory cortex (CAC)], and premotor cortical areas [dorsal and ventral premotor cortical areas (PMd and PMv)] in the same animal, we were able to assess how connections between the cortex and the different thalamic nuclei are organized. We demonstrated for the first time the existence of overlapping territories of thalamic projections to different sensory and motor areas. We focus our review on thalamic nuclei that are projecting into more than two areas of different attributes rather than on sensory-specific thalamocortical projections. Thalamocortical projections were found from the central lateral (CL) nucleus and the mediodorsal (MD) nucleus to RAC, CAC, PEa, PE, PMd, and PMv. Common territories of projection were observed from the nucleus LP to PMd, PMv, PEa, and PE. The ventroanterior nucleus (VA), known as a motor thalamic nucleus, sends projections to PE and to PEa. Interestingly, projections distinct from the ones arising from specific unimodal sensory nuclei were observed from auditory thalamic nuclei, such as projections from the medial geniculate nucleus to the parietal cortex (PE in particular) and the premotor cortex (PMd/PMv). Last but not least, the medial pulvinar nucleus (PuM) exhibits the most significant overlap across modalities, with projections from superimposed territories to all six cortical areas injected with tracers. Projections from PuM to the auditory cortex were also described by de la Mothe and colleagues (2006b). Hackett and collaborators (2007)
showed that somatosensory inputs may reach the auditory cortex (CM and CL) through connections coming from the medial part of the medial geniculate nucleus (MGm) or the multisensory nuclei [posterior, suprageniculate, limitans, and medial pulvinar (PuM)]. All these thalamocortical projections are consistent with the presence of thalamic territories possibly integrating different sensory modalities with motor attributes. We calculated the degree of overlap between thalamocortical and CT connections in the thalamus to determine the projections to areas of a same modality, as previously described (Tanné-Gariépy et al. 2002; Morel et al. 2005; Cappe et al. 2009c). The degree of overlap may range between 0% when two thalamic territories projecting to two distinct cortical areas are spatially completely segregated and 100% when the two thalamic territories fully overlap (considering a spatial resolution of 0.5 mm, further details in Cappe et al. 2009c). Thalamic nuclei with spatially intermixed thalamocortical cells projecting to auditory or premotor cortices were located mainly in the PuM, VA, and CL nuclei. The overlap between the projections to the auditory and parietal cortical areas concerned different thalamic nuclei such as PuM, CL, and to a lesser extent, LP and PuL. The projections to the premotor and posterior parietal cortex overlapped primarily in PuM, LP, MD, and also in VA, VLpd, and CL. Quantitatively, we found that projections from the thalamus to the auditory and motor cortical areas overlapped to an extent ranging from 4% to 12% through the rostral thalamus and increased up to 30% in the caudal part of the thalamus. In PuM, the degree of overlap between thalamocortical projections to auditory and premotor cortex ranged from 14% to 20%. PuM is the thalamic nucleus where the maximum of overlap between thalamocortical projections was found. Aside from the thalamocortical connections, CT connections were also investigated in the same study, concerning, in particular, the parietal areas PE and PEa injected with a tracer with anterograde properties (biotinylated dextran amine; Cappe et al. 2007). Indeed, areas PE and PEa send CT projections to the thalamic nuclei PuM, LP, and to a lesser extent, VPL, CM, CL, and MD (PEa only for MD). These thalamic nuclei contained both the small and giant CT endings. The existence of these two different types of CT endings reflect the possibility for CT connections to represent either feedback or feedforward projections (for review, see Rouiller and Welker 2000; Sherman and Guillery 2002, 2005; Sherman 2007). In contrast to the feedback CT projection originating from cortical layer VI, the feedforward CT projection originates from layer V and terminates in the thalamus in the form of giant endings, which can ensure highly secure and rapid synaptic transmission (Rouiller and Welker 2000). Considering the TC and CT projections, some thalamic nuclei (PuM, LP, VPL, CM, CL, and MD) could play a role in the integration of different sensory information with or without motor attributes (Cappe et al. 2007, 2009c). Moreover, parietal areas PE and PEa may send, via the giant endings, feedforward CT projection and transthalamic projections to remote cortical areas in the parietal, temporal, and frontal lobes contributing to polysensory and sensorimotor integration (Cappe et al. 2007, 2009c).
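The degree-of-overlap measure described above can be illustrated with a short sketch. The voxel size (0.5 mm) comes from the text, but the coordinates and the normalization (shared voxels as a percentage of the smaller territory) are simplifying assumptions; this is not the exact procedure of Cappe et al. (2009c).

```python
# Illustrative computation of the percent overlap between two thalamic
# territories, each reduced to a set of 0.5 mm voxels. The voxel size comes
# from the text; the coordinates and the normalization by the smaller
# territory are simplifying assumptions, not the published procedure.

VOXEL_MM = 0.5

def to_voxels(points_mm):
    """Bin labeled-cell coordinates (x, y, z in mm) into 0.5 mm voxels."""
    return {tuple(int(c // VOXEL_MM) for c in p) for p in points_mm}

def percent_overlap(territory_a, territory_b):
    """Shared voxels expressed as a percentage of the smaller territory."""
    shared = len(territory_a & territory_b)
    return 100.0 * shared / min(len(territory_a), len(territory_b))

# Hypothetical coordinates of thalamic cells labeled from two cortical targets
to_auditory = to_voxels([(1.0, 2.0, 0.2), (1.3, 2.1, 0.2), (2.0, 2.6, 0.7)])
to_premotor = to_voxels([(1.1, 2.2, 0.3), (2.4, 3.0, 0.8), (3.1, 3.3, 1.2)])

print(f"overlap: {percent_overlap(to_auditory, to_premotor):.0f}%")   # 50%
```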
2.3.2 Role of Thalamus in Multisensory Integration

The interconnections between the thalamus and the cortex described in the preceding section suggest that the thalamus could act as an early sensory integrator. An additional role for the thalamus in multisensory interplay may derive from the organization of its CT and TC connections and loops, as outlined in Section 2.3.1 (see also Crick and Koch 1998). Indeed, the thalamus could also serve as a relay between different sensory and/or premotor cortical areas. In particular, the pulvinar, mainly its medial part, contains neurons that project to the auditory cortex, the somatosensory cortex, the visual cortex, and the premotor cortex (Romanski et al. 1997; Hackett et al. 1998; Gutierrez et al. 2000; Cappe et al. 2009c; see also Cappe et al. 2009a for a review). The feedforward CT projection originating from different sensory or motor cortical areas, combined with a subsequent TC projection, may allow a transfer of information between remote cortical areas through a "cortico–thalamo–cortical" route (see, e.g., Guillery 1995; Rouiller and Welker 2000; Sherman and Guillery 2002, 2005; Sherman 2007; Cappe et al. 2009c). As described in
Section 2.3.1, the medial part of the pulvinar nucleus is the main candidate (although other thalamic nuclei such as LP, VPL, MD, or CL may also play a role) for providing an alternative to corticocortical loops, whereby information can be transferred between cortical areas belonging to different sensory and sensorimotor modalities (see also Shipp 2003). From a functional point of view, neurons in PuM respond to visual stimuli (Gattass et al. 1979) and to auditory stimuli (Yirmiya and Hocherman 1987), which is consistent with our hypothesis. Furthermore, because our injections in the different sensory and motor areas included cortical layer I (Cappe et al. 2009c), it is likely that some of the projections providing multimodal information to the cortex originate from the so-called "matrix" of calbindin-immunoreactive neurons, which are distributed in all thalamic nuclei and project diffusely and relatively widely to the cortex (Jones 1998).

Four different mechanisms of multisensory and sensorimotor interplay can be proposed based on the pattern of convergence and divergence of TC and CT connections (Cappe et al. 2009c). First, restricted thalamic territories sending divergent projections to several cortical areas can deliver different sensory and/or motor inputs to these areas simultaneously. Although such multimodal integration in the temporal domain cannot be excluded (if the inputs reach the cerebral cortex at exactly the same time), it is less likely to provide massive multimodal interplay than an actual spatial convergence of projections. More plausibly, this pattern could support a temporal coincidence mechanism acting as a synchronizer between remote cortical areas, increasing the perceptual saliency of multimodal stimuli (Fries et al. 2001). Second, thalamic nuclei could act as integrators of multisensory information, rapidly relaying the integrated signal to the cortex through their multiple TC connections. In PuM, the considerable intermixing of territories projecting to cortical areas of several modalities is in line with previously reported connections with several cortical domains, including visual, auditory, somatosensory, prefrontal, and motor areas, and electrophysiological recordings have shown visual and auditory responses in this thalamic nucleus (see Cappe et al. 2009c for an extensive description). According to our analysis, PuM, LP, MD, MGm, and MGd could play this integrative role (Cappe et al. 2009c). Third, the spatial convergence, at the cortical level, of different sensory and motor inputs arising from TC connections of distinct thalamic territories could support fast multisensory interplay. In our experiments (Cappe et al. 2009c), the widespread distribution of TC inputs to the different injected cortical areas suggests that this convergence mechanism plays an important role in multisensory and motor integration. Given their cortical connection patterns, thalamic nuclei such as PuM and LP could play this role for auditory–somatosensory interplay in area 5 (Cappe et al. 2009c). Fourth, the cortico–thalamo–cortical route can support rapid and secure transfer from area 5 (PE/PEa; Cappe et al. 2007) to the premotor cortex via the giant terminals of these CT connections (Guillery 1995; Rouiller and Welker 2000; Sherman and Guillery 2002, 2005; Sherman 2007). Giant CT endings, consistent with this principle of a transthalamic loop, have been shown to be present in several thalamic nuclei (e.g., Schwartz et al.
1991; Rockland 1996; Darian-Smith et al. 1999; Rouiller et al. 1998, 2003; Taktakishvili et al. 2002; Rouiller and Durif 2004); this arrangement may well also apply to PuM, as suggested by the overlap between the connections to the auditory cortex and to the premotor cortex, allowing auditory–motor integration (Cappe et al. 2009c). Thus, recent anatomical findings at the thalamic level (Komura et al. 2005; de la Mothe et al. 2006b; Hackett et al. 2007; Cappe et al. 2007, 2009c) may provide the anatomical substrate for multisensory behavioral phenomena as well as for multisensory integration at the functional level. Indeed, some nuclei in the thalamus, such as the medial pulvinar, receive either mixed sensory inputs or projections from different sensory cortical areas, and they project to sensory and premotor areas (Cappe et al. 2009c). Sensory modalities may thus already be fused at the thalamic level before being conveyed directly to the premotor cortex, thereby contributing to the redundant signal effect expressed as faster reaction times in response to auditory–visual stimulation (Cappe et al. 2010).
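The redundant signal effect mentioned here is conventionally evaluated against the race model inequality (Miller 1982), which is taken up again in Section 2.4. The sketch below is a generic illustration with hypothetical reaction times, not the analysis pipeline of the studies cited in this chapter: it checks, at each latency, whether the cumulative probability of responding to the bimodal stimulus exceeds the sum of the unimodal cumulative probabilities; a violation at any latency is incompatible with a race between independent channels and instead points to coactivation.

```python
# Generic illustration of Miller's (1982) race model inequality:
#   P(RT <= t | AV)  <=  P(RT <= t | A) + P(RT <= t | V)   for every t.
# A violation at any t argues against a race between independent channels
# and in favor of coactivation (convergence of the two sensory channels).
import numpy as np

def ecdf(rts, t_grid):
    """Empirical cumulative distribution of reaction times on a grid of latencies."""
    rts = np.sort(np.asarray(rts, dtype=float))
    return np.searchsorted(rts, t_grid, side="right") / rts.size

def race_model_violations(rt_av, rt_a, rt_v, t_grid):
    """Return the latencies at which the race model inequality is violated."""
    g_av = ecdf(rt_av, t_grid)
    bound = np.minimum(ecdf(rt_a, t_grid) + ecdf(rt_v, t_grid), 1.0)
    return t_grid[g_av > bound]

# Hypothetical reaction times (ms), for illustration only.
rng = np.random.default_rng(0)
rt_a = rng.normal(260, 30, 200)
rt_v = rng.normal(280, 30, 200)
rt_av = rng.normal(225, 25, 200)  # faster than either unimodal condition
t_grid = np.arange(150, 400, 5)
print(race_model_violations(rt_av=rt_av, rt_a=rt_a, rt_v=rt_v, t_grid=t_grid))
```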
2.4 HIGHER-ORDER, LOWER-ORDER CORTICAL AREAS AND/OR THALAMUS?

When the race model is applied to behavioral performance in multisensory tasks, the results show that it cannot account for the shorter reaction times obtained in auditory–visual conditions (see Cappe et al. 2010 for data in monkeys), a result that instead requires a "coactivation" model and implies a convergence of the sensory channels (Miller 1982). The anatomical level at which this coactivation occurs is still under debate (Miller et al. 2001): it has been suggested to occur early, at the sensory level (Miller et al. 2001; Gondan et al. 2005), or late, at the motor stage (Giray and Ulrich 1993). In humans, however, analyses of the relationships between behavioral and neuronal indices (Molholm et al. 2002; Sperdin et al. 2009; Jepma et al. 2009) suggest that this convergence of the sensory channels occurs early in sensory processing, before the decisional and motor stages (Mordkoff et al. 1996; Gondan et al. 2005), as also shown in monkeys (Lamarre et al. 1983; Miller et al. 2001; Wang et al. 2008). Determining the links between anatomical, neurophysiological, and behavioral indices of multisensory processes is necessary to understand the conditions under which a redundant signal effect is observable.

The existence of direct connections from a cortical area considered unisensory to another area of a different modality is paradoxical for hierarchical models of sensory processing (Maunsell and Van Essen 1983; Felleman and Van Essen 1991). The most recent findings provide evidence that multisensory interactions can occur shortly after response onset, at the lowest processing stages (see the previous sections), and these new elements have to be incorporated into our view of sensory system organization. It remains possible, of course, that some connections mediating early-stage multisensory interactions have not yet been identified by anatomical methods. Within a sensory system, the hierarchical relationships between cortical areas have been defined by the feedforward or feedback nature of their connections, although the role of these connections is only partially understood (Salin and Bullier 1995; Bullier 2006). Recent results suggest that multisensory convergence in unisensory areas can intervene at low-level stages of information processing, through both feedback and feedforward circuits (Schroeder et al. 2001; Schroeder and Foxe 2002; Fu et al. 2003; Cappe and Barone 2005). Accordingly, anatomical methods alone are not sufficient to determine definitively whether a given connection is feedforward or feedback in nature, and they cannot be used to establish a hierarchy between functional areas belonging to different systems.

This review highlights that both higher-order association areas and lower-order cortical areas are multisensory in nature, and that the thalamus could also play a role in multisensory processing. Figure 2.1 schematically summarizes the possible scenarios for multisensory integration through anatomical pathways. First, as traditionally proposed, information is processed hierarchically from the primary "unisensory" cortical areas to "multisensory" association cortical areas, and finally to the premotor and motor cortical areas (Figure 2.1a). In these multisensory association areas, the strength and latencies of neuronal responses are affected by the nature of the stimuli (e.g., Avillac et al. 2007; Romanski 2007; Bizley et al. 2007).
Second, recent evidence demonstrates the existence of multisensory interactions at the first levels of cortical processing (Figure 2.1b). Third, as described in this review, the thalamus, through its numerous connections, could also contribute to this processing (Figure 2.1c). Altogether, this model captures the different alternative pathways for multisensory integration. These multiple pathways, which coexist (Figure 2.1d), may allow different routes to be used according to the task and/or may mediate information of different natures (see Wang et al. 2008 for recent evidence of the influence of a perceptual task on neuronal responses).
[Figure 2.1 schematic, panels (a)–(d). Abbreviations: A, auditory cortex; V, visual cortex; S, somatosensory cortex; M, premotor and motor cortex; H, higher-order multisensory regions; T, "nonspecific" thalamic nuclei (e.g., PuM, LP, VPL, CM, CL, and MD for connections with auditory and somatosensory cortical areas; PuM for connections with A, V, and S cortex).]
FIGURE 2.1 Hypothetical scenarios for multisensory and motor integration through anatomically identified pathways. (a) High-level cortical areas as a pathway for multisensory and motor integration. (b) Low-level cortical areas as a pathway for multisensory integration. (c) Thalamus as a pathway for multisensory and motor integration. (d) Combined cortical and thalamic connections as a pathway for multisensory and motor integration.
Taken together, the data reviewed here provide evidence for anatomical pathways possibly involved in multisensory integration at low levels of information processing in the primate, and they argue against a strict hierarchical model. The thalamus appears to offer an additional route for multisensory integration: as shown in this chapter, its multiple connections place it within a cortico–thalamo–cortical loop, suggesting that it may play a key role in multisensory integration. Finally, higher-order association cortical areas, lower-order cortical areas, and the thalamus have all now been shown to take part in multisensory integration. The question now is to determine how this system of multisensory integration is organized and how its different parts communicate to support a unified perception of the world.
2.5 CONCLUSIONS

Clearly, we are only beginning to understand the complexity of the interactions within the sensory systems and between the sensory and motor systems. More work is needed in both the neural and perceptual domains. At the neural level, additional studies are needed to establish the extent and hierarchical organization of multisensory interactions. At the perceptual level, further experiments should explore the conditions necessary for cross-modal binding and plasticity, and investigate the nature of the information transfer between sensory systems. Such studies will form the basis for a new understanding of how the different sensory and/or motor systems function together.
ACKNOWLEDGMENTS

This study was supported by the CNRS ATIP program (to P.B.), by Swiss National Science Foundation grants 31-61857.00 and 310000-110005 (to E.M.R.), and by the Swiss National Science Foundation Center of Competence in Research on "Neural Plasticity and Repair" (to E.M.R.).
REFERENCES

Amedi, A., G. Jacobson, T. Hendler, R. Malach, and E. Zohary. 2002. Convergence of visual and tactile shape processing in the human lateral occipital complex. Cerebral Cortex 12:1202–12. Amedi, A., W.M. Stern, J.A. Camprodon et al. 2007. Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex. Nature Neuroscience 10:687–9.
Andersen, R.A., C., Asanuma, G. Essick, and R.M. Siegel. 1990. Corticocortical connections of anatomically and physiologically defined subdivisions within the inferior parietal lobule. Journal of Comparative Neurology 296:65–113. Andersen, R.A., L.H. Snyder, D.C. Bradley, and J. Xing. 1997. Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience 20: 303–30. Avillac, M., S. Ben Hamed, and J.R. Duhamel. 2007. Multisensory integration in the ventral intraparietal area of the macaque monkey. Journal of Neuroscience 27:1922–32. Barbas, H. 1986. Pattern in the laminar origin of corticocortical connections. Journal of Comparative Neurology 252:415–22. Barbas, H., and D.N. Pandya. 1987. Architecture and frontal cortical connections of the premotor cortex (area 6) in the rhesus monkey. Journal of Comparative Neurology 256:211–28. Barone, P., and J.P. Joseph. 1989. Role of the dorsolateral prefrontal cortex in organizing visually guided behavior. Brain, Behavior and Evolution 33:132–5. Barraclough, N.E., D. Xiao, C.I., Baker, M.W. Oram, and D.I. Perrett. 2005. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive Neuroscience 17:377–91. Baylis, G.C., E.T. Rolls, and C.M. Leonard. 1987. Functional subdivisions of the temporal lobe neocortex. Journal of Neuroscience 7:330–42. Bell, A.H., M.A. Meredith, A.J. Van Opstal, and D.P. Munoz. 2005. Crossmodal integration in the primate superior colliculus underlying the preparation and initiation of saccadic eye movements. Journal of Neurophysiology 93:3659–73. Benevento, L.A., J., Fallon, B.J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental Neurology 57:849–72. Bignall, K.E. 1970. Auditory input to frontal polysensory cortex of the squirrel monkey: Possible pathways. Brain Research 19:77–86. Bizley, J.K., and A.J. King. 2008. Visual–auditory spatial processing in auditory cortical neurons. Brain Research 1242:24–36. Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2007. Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–89. Blatt, G.J., R.A. Andersen, and G.R. Stoner. 1990. Visual receptive field organization and cortico-cortical connections of the lateral intraparietal area (area LIP) in the macaque. Journal of Comparative Neurology 299:421–45. Bremmer, F., F. Klam, J.R. Duhamel, S. Ben Hamed, and W. Graf. 2002. Visual-vestibular interactive responses in the macaque ventral intraparietal area (VIP). European Journal of Neuroscience 16:1569–86. Bruce, C., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology 46:369–84. Budinger, E., and H. Scheich. 2009. Anatomical connections suitable for the direct processing of neuronal information of different modalities via the rodent primary auditory cortex (review). Hearing Research 258:16–27. Budinger, E., P. Heil, A. Hess, and H. Scheich. 2006. Multisensory processing via early cortical stages: Connec tions of the primary auditory cortical field with other sensory systems. Neuroscience 143:1065–83. Bullier, J. 2006. What is feed back? In 23 Problems in Systems Neuroscience, ed. J.L. van Hemmen and T.J. Sejnowski, 103–132. 
New York: Oxford University Press. Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies (review). Cerebral Cortex 11:1110–23. Calvert, G., C. Spence, and B.E. Stein, eds. 2004. The Handbook of Multisensory Processes. Cambridge, MA: MIT Press. Campi, K.L., K.L. Bales, R. Grunewald, and L. Krubitzer. 2010. Connections of auditory and visual cortex in the prairie vole (Microtus ochrogaster): Evidence for multisensory processing in primary sensory areas. Cerebral Cortex 20:89–108. Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. European Journal of Neuroscience 22:2886–902. Cappe, C., A. Morel, and E.M. Rouiller. 2007. Thalamocortical and the dual pattern of corticothalamic projections of the posterior parietal cortex in macaque monkeys. Neuroscience 146:1371–87. Cappe, C., E.M. Rouiller, and P. Barone. 2009a. Multisensory anatomic pathway (review). Hearing Research 258:28–36.
Cappe, C., G. Thut, V. Romei, and M.M. Murray. 2009b. Selective integration of auditory-visual looming cues by humans. Neuropsychologia 47:1045–52. Cappe, C., A. Morel, P. Barone, and E.M. Rouiller. 2009c. The thalamocortical projection systems in primate: An anatomical support for multisensory and sensorimotor integrations. Cerebral Cortex 19:2025–37. Cappe, C., M.M. Murray, P. Barone, and E.M. Rouiller. 2010. Multisensory facilitation of behavior in monkeys: Effects of stimulus intensity. Journal of Cognitive Neuroscience 22:2850–63. Cohen, Y.E., B.E. Russ, and G.W. Gifford 3rd. 2005. Auditory processing in the posterior parietal cortex (review). Behavioral and Cognitive Neuroscience Reviews 4:218–31. Cohen, Y.E., F. Theunissen, B.E. Russ, and P. Gill. 2007. Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. Journal of Neurophysiology 97:1470–84. Colby, C.L., and M.E. Goldberg. 1999. Space and attention in parietal cortex (review). Annual Review of Neuroscience 22:319–49. Crick, F., and C. Koch. 1998. Constraints on cortical and thalamic projections: The no-strong-loops hypothesis. Nature 391:245–50. Cusick, C.G., B. Seltzer, M. Cola, and E. Griggs. 1995. Chemoarchitectonics and corticocortical terminations within the superior temporal sulcus of the rhesus monkey: Evidence for subdivisions of superior temporal polysensory cortex. Journal of Comparative Neurology 360:513–35. Darian-Smith, C., A. Tan, and S. Edwards. 1999. Comparing thalamocortical and corticothalamic microstructure and spatial reciprocity in the macaque ventral posterolateral nucleus (VPLc) and medial pulvinar. Journal of Comparative Neurology 410:211–34. de la Mothe, L.A., S. Blumell, Y. Kajikawa, and T.A. Hackett. 2006a. Cortical connections of the auditory cortex in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496:27–71. de la Mothe, L.A., S. Blumell, Y. Kajikawa, and T.A. Hackett. 2006b. Thalamic connections of the auditory cortex in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496:72–96. Desimone, R., and C.G. Gross. 1979. Visual areas in the temporal cortex of the macaque. Brain Research 178:363–80. Disbrow, E., E. Litinas, G.H. Recanzo, J. Padberg, and L. Krubitzer. 2003. Cortical connections of the second somatosensory area and the parietal ventral area in macaque monkeys. Journal of Comparative Neurology 462:382–99. Duhamel, J.R., C.L. Colby, and M.E. Goldberg. 1998. Ventral intraparietal area of the macaque: Congruent visual and somatic response properties. Journal of Neurophysiology 79:126–36. Ettlinger, G., and W.A. Wilson. 1990. Cross-modal performance: Behavioural processes, phylogenetic considerations and neural mechanisms (review). Behavioural Brain Research 40:169–92. Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience 22:5749–59. Falchier, A., C.E. Schroeder, T.A. Hackett et al. 2010. Low level intersensory connectivity as a fundamental feature of neocortex. Cerebral Cortex 20:1529–38. Felleman, D.J., and D.C. Van Essen. 1991. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex 1:1–47. Fogassi, L., V. Gallese, L. Fadiga, G. Luppino, M. Matelli, and G. Rizzolatti. 1996. Coding of peripersonal space in inferior premotor cortex (area F4). Journal of Neurophysiology 76:141–57. Fort, A., C. Delpuech, J. Pernier, and M.H. Giard. 2002. 
Dynamics of corticosubcortical cross-modal operations involved in audio-visual object detection in humans. Cerebral Cortex 12:1031–39. Forster, B., C. Cavina-Pratesi, S.M. Aglioti, and G. Berlucchi. 2002. Redundant target effect and intersensory facilitation from visual-tactile interactions in simple reaction time. Experimental Brain Research 143:480–487. Foxe, J.J., I.A. Morocz, M.M. Murray, B.A. Higgins, D.C. Javitt, and C.E. Schroeder. 2000. Multisensory auditory–somatosensory interactions in early cortical processing revealed by high-density electrical mapping. Brain Research. Cognitive Brain Research 10:77–83. Foxe, J.J., G.R. Wylie, A. Martinez et al. 2002. Auditory–somatosensory multisensory processing in auditory association cortex: An fMRI study. Journal of Neurophysiology 88:540–3. Frens, M.A., and A.J. Van Opstal. 1998. Visual–auditory interactions modulate saccade-related activity in monkey superior colliculus. Brain Research Bulletin 46:211–24. Fries, P., S. Neuenschwander, A.K. Engel, R. Goebel, and W. Singer. 2001. Rapid feature selective neuronal synchronization through correlated latency shifting. Nature Neuroscience 4:194–200. Fu, K.M., T.A. Johnston, A.S. Shah et al. 2003. Auditory cortical neurons respond to somatosensory stimulation. Journal of Neuroscience 23:7510–5. Fuster, J.M. 2001. The prefrontal cortex—an update: Time is of the essence (review). Neuron 30:319–33.
Fuster, J.M., M. Bodner, and J.K. Kroger. 2000. Cross-modal and cross-temporal association in neurons of frontal cortex. Nature 405:347–51. Gaffan, D., and S. Harrison. 1991. Auditory–visual associations, hemispheric specialization and temporal– frontal interaction in the rhesus monkey. Brain 114:2133–44. Galletti, C., M. Gamberini, D.F. Kutz, P. Fattori, G. Luppino, M. Matelli. 2001. The cortical connections of area V6: An occipito-parietal network processing visual information. European Journal of Neuroscience 13:1572–88. Gattass, R., E. Oswaldo-Cruz, and A.P. Sousa. 1979. Visual receptive fields of units in the pulvinar of cebus monkey. Brain Research 160:413–30. Ghazanfar, A.A. 2009. The multisensory roles for auditory cortex in primate vocal communication (review). Hearing Research 258:113–20. Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? (review). Trends in Cognitive Sciences 10:278–85. Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12. Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of Neuroscience 28:4457–69. Giard, M.H., and F. Peronnet. 1999. Auditory–visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience 11:473–90. Gifford 3rd, G.W., and Y.E. Cohen. 2004. Effect of a central fixation light on auditory spatial responses in area LIP. Journal of Neurophysiology 91:2929–33. Gingras, G., B.A. Rowland, and B.E. Stein. 2009. The differing impact of multisensory and unisensory integration on behavior. Journal of Neuroscience 29:4897–902. Giray, M., and R. Ulrich. 1993. Motor coactivation revealed by response force in divided and focused attention. Journal of Experimental Psychology. Human Perception and Performance 19:1278–91. Gondan, M., B. Niederhaus, F. Rösler, and B. Röder. 2005. Multisensory processing in the redundant-target effect: A behavioral and event-related potential study. Perception & Psychophysics 67:713–26. Gottlieb, J. 2007. From thought to action: The parietal cortex as a bridge between perception, action, and cognition (review). Neuron 53:9–16. Graziano, M.S., G.S. Yap, and C.G. Gross. 1994. Coding of visual space by premotor neurons. Science 266:1054–7. Graziano, M.S., L.A. Reiss, and C.G. Gross. 1999. A neuronal representation of the location of nearby sounds. Nature 397:428–30. Guillery, R.W. 1995. Anatomical evidence concerning the role of the thalamus in corticocortical communication: A brief review. Journal of Anatomy 187:583–92. Gutierrez, C., M.G. Cola, B. Seltzer, and C. Cusick. 2000. Neurochemical and connectional organization of the dorsal pulvinar complex in monkeys. Journal of Comparative Neurology 419:61–86. Hackett, T.A., I. Stepniewska, and J.H. Kaas. 1998. Thalamocortical connections of the parabelt auditory cortex in macaque monkeys. Journal of Comparative Neurology 400:271–86. Hackett, T.A., I. Stepniewska, and J.H. Kaas. 1999. Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Research 817:45–58. Hackett, T.A., L.A. de La Mothe, I. Ulbert, G. Karmos, J. Smiley, and C.E. Schroeder. 2007. Multisensory convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane. 
Journal of Comparative Neurology 502:924–52. Hagen, M.C., O. Franzén, F. McGlone, G. Essick, C. Dancer, and J.V. Pardo. 2002. Tactile motion activates the human middle temporal/V5 (MT/V5) complex. European Journal of Neuroscience 16:957–64. Hall, A.J., and S.G. Lomber. 2008. Auditory cortex projections target the peripheral field representation of primary visual cortex. Experimental Brain Research 190:413–30. Hecht, D., M. Reiner, and A. Karni. 2008. Enhancement of response times to bi- and tri-modal sensory stimuli during active movements. Experimental Brain Research 185:655–65. Heffner, R.S., and H.E. Heffner. 1992. Visual factors in sound localization in mammals. Journal of Comparative Neurology 317:219–32. Hikosaka, K., E. Iwai, H. Saito, and K. Tanaka. 1988. Polysensory properties of neurons in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. Journal of Neurophysiology 60:1615–37. Huffman, K.J., and L. Krubitzer. 2001. Area 3a: topographic organization and cortical connections in marmoset monkeys. Cerebral Cortex 11:849–67.
Innocenti, G.M., P. Berbel, and S. Clarke. 1988. Development of projections from auditory to visual areas in the cat. Journal of Comparative Neurology 272:242–59. James, T.W., G.K. Humphrey, J.S. Gati, P. Servos, R.S. Menon, and M.A. Goodale. 2002. Haptic study of threedimensional objects activates extrastriate visual areas. Neuropsychologia 40:1706–14. Jepma, M., E.J. Wagenmakers, G.P. Band, and S. Nieuwenhuis. 2009. The effects of accessory stimuli on information processing: Evidence from electrophysiology and a diffusion model analysis. Journal of Cognitive Neuroscience 21:847–64. Jones, E.G. 1998. Viewpoint: The core and matrix of thalamic organization. Neuroscience 85:331–45. Joseph, J.P., and P. Barone. 1987. Prefrontal unit activity during a delayed oculomotor task in the monkey. Experimental Brain Research 67:460–8. Kaas, J.H., and C.E. Collins. 2001. Evolving ideas of brain evolution. Nature 411:141–2. Kaas, J., and C.E. Collins. 2004. The resurrection of multisensory cortex in primates: connection patterns that integrates modalities. In The Handbook of Multisensory Processes, ed. G. Calvert, C. Spence, and B.E. Stein, 285–93. Cambridge, MA: MIT Press. Kaas, J.H., and T.A. Hackett. 2000. Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences of the United States of America 97:11793–9. Kaas, J.H., and A. Morel. 1993. Connections of visual areas of the upper temporal lobe of owl monkeys: The MT crescent and dorsal and ventral subdivisions of FST. Journal of Neuroscience 13:534–46. Kayser, C., and N.K. Logothetis. 2009. Directed interactions between auditory and superior temporal cortices and their role in sensory integration (review). Frontiers in Integrative Neuroscience 3:7. doi: 10.3389/ neuro.07.007.2009. Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2005. Integration of touch and sound in auditory cortex. Neuron 48:373–84. Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2007. Functional imaging reveals visual modulation of specific fields in auditory cortex. Journal of Neuroscience 27:1824–35. Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18:1560–74. Komura, Y., R. Tamura, T. Uwano, H. Nishijo, and T. Ono. 2005. Auditory thalamus integrates visual inputs into behavioral gains. Nature Neuroscience 8:1203–9. Krubitzer, L.A., and J.H. Kaas. 1990. The organization and connections of somatosensory cortex in marmosets. Journal of Neuroscience 10:952–74. Lakatos, P., C.M. Chen, M.N. O’Connell, A. Mills, and C.E. Schroeder. 2007. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53:279–92. Lamarre, Y., L. Busby, and G. Spidalieri. 1983. Fast ballistic arm movements triggered by visual, auditory, and somesthetic stimuli in the monkey: I. Activity of precentral cortical neurons. Journal of Neurophysiology 50:1343–58. Laurienti, P.J., R.A. Kraft, J.A. Maldjian, J.H. Burdette, and M.T. Wallace. 2004. Semantic congruence is a critical factor in multisensory behavioral performance. Experimental Brain Research 158:405–14. Lewis, J.W., and D.C. Van Essen. 2000. Corticocortical connections of visual, sensorimotor, and multimodal processing areas in the parietal lobe of the macaque monkey. Journal of Comparative Neurology 428:112–37. Linden, J.F., A. Grunewald, and R.A. Andersen. 1999. Responses to auditory stimuli in macaque lateral intraparietal area: II. Behavioral modulation. 
Journal of Neurophysiology 82:343–58. Lovelace, C.T., B.E. Stein, and M.T. Wallace. 2003. An irrelevant light enhances auditory detection in humans: A psychophysical analysis of multisensory integration in stimulus detection. Brain Research. Cognitive Brain Research 17:447–53. Martuzzi, R., M.M. Murray, C.M. Michel et al. 2007. Multisensory interactions within human primary cortices revealed by BOLD dynamics. Cerebral Cortex 17:1672–9. Maunsell, J.H., and D.C. Van Essen. 1983. The connections of the middle temporal visual area (MT) and their relationship to a cortical hierarchy in the macaque monkey. Journal of Neuroscience 3:2563–86. Miller, J. 1982. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology 14:247–79. Miller, J., R. Ulrich, and Y. Lamarre. 2001. Locus of the redundant-signals effect in bimodal divided attention: A neurophysiological analysis. Perception & Psychophysics 63:555–62. Milner, B., M. Petrides, and M.L. Smith. 1985. Frontal lobes and the temporal organization of memory. Human Neurobiology 4:137–42. Molholm, S., W. Ritter, M.M. Murray, D.C. Javitt, C.E. Schroeder, and J.J. Foxe. 2002. Multisensory auditory– visual interactions during early sensory processing in humans: A high-density electrical mapping study. Brain Research. Cognitive Brain Research 14:115–28.
Mordkoff, J.T., J. Miller, and A.C. Roch. 1996. Absence of coactivation in the motor component: Evidence from psychophysiological measures of target detection. Journal of Experimental Psychology. Human Perception and Performance 22:25–41. Morel, A., J. Liu, T. Wannier, D. Jeanmonod, and E.M. Rouiller. 2005. Divergence and convergence of thalamocortical projections to premotor and supplementary motor cortex: A multiple tracing study in macaque monkey. European Journal of Neuroscience 21:1007–29. Murray, M.M., S. Molholm, C.M. Michel et al. 2005. Grabbing your ear: Rapid auditory–somatosensory multi sensory interactions in low-level sensory cortices are not constrained by stimulus alignment. Cerebral Cortex 15:963–74. Palmer, S.M., and M.G. Rosa. 2006. A distinct anatomical network of cortical areas for analysis of motion in far peripheral vision. European Journal of Neuroscience 24:2389–405. Pandya, D.N., and B. Seltzer. 1982. Intrinsic connections and architectonics of posterior parietal cortex in the rhesus monkey. Journal of Comparative Neurology 204:196–210. Petrides, M., and S.D. Iversen. 1976. Cross-modal matching and the primate frontal cortex. Science 192:1023–4. Poremba, A., R.C. Saunders, A.M. Crane, M. Cook, L. Sokoloff, and M. Mishkin. 2003. Functional mapping of the primate auditory system. Science 299:568–72. Raab, D.H. 1962. Statistical facilitation of simple reaction times. Transactions of the New York Academy of Sciences 24:574–90. Rizzolatti, G., L. Fogassi, and V. Gallese. 1997. Parietal cortex: From sight to action (review). Current Opinion in Neurobiology 7:562–7. Rockland, K.S. 1996. Two types of corticopulvinar terminations: Round (type 2) and elongate (type 1). Journal of Comparative Neurology 368:57–87. Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology 50:19–26. Romanski, L.M. 2004. Domain specificity in the primate prefrontal cortex (review). Cognitive, Affective & Behavioral Neuroscience 4:421–9. Romanski, L.M. 2007. Representation and integration of auditory and visual stimuli in the primate ventral lateral prefrontal cortex. Cerebral Cortex 17 Suppl. no. 1, i61–9. Romanski, L.M., M. Giguere, J.F. Bates, and P.S. Goldman-Rakic. 1997. Topographic organization of medial pulvinar connections with the prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology 379:313–32. Romanski, L.M., J.F. Bates, and P.S. Goldman-Rakic. 1999. Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology 403:141–57. Romanski, L.M., and P.S. Goldman-Rakic. 2002. An auditory domain in primate prefrontal cortex. Nature Neuroscience 5:15–6. Romei, V., M.M. Murray, L.B. Merabet, and G. Thut. 2007. Occipital transcranial magnetic stimulation has opposing effects on visual and auditory stimulus detection: Implications for multisensory interactions. Journal of Neuroscience 27:11465–72. Rouiller, E.M., and C. Durif. 2004. The dual pattern of corticothalamic projection of the primary auditory cortex in macaque monkey. Neuroscience Letters 358:49–52. Rouiller, E.M., J. Tanné, V. Moret, I. Kermadi, D. Boussaoud, and E. Welker. 1998. Dual morphology and topography of the corticothalamic terminals originating from the primary, supplementary motor, and dorsal premotor cortical areas in macaque monkeys. Journal of Comparative Neurology 396:169–85. Rouiller, E.M., and E. Welker. 2000. 
A comparative analysis of the morphology of corticothalamic projections in mammals. Brain Research Bulletin 53:727–41. Rouiller, E.M., T. Wannier, and A. Morel. 2003. The dual pattern of corticothalamic projection of the premotor cortex in macaque monkeys. Thalamus & Related Systems 2:189–97. Russ, B.E., A.M. Kim, K.L. Abrahamsen, R. Kiringoda, and Y.E. Cohen. 2006. Responses of neurons in the lateral intraparietal area to central visual cues. Experimental Brain Research 174:712–27. Sadato, N., A. Pascual-Leone, J. Grafman et al. 1996. Activation of the primary visual cortex by Braille reading in blind subjects. Nature 380:526–8. Salin, P.A., and J. Bullier. 1995. Corticocortical connections in the visual system: Structure and function. Physiological Reviews 75:107–54. Sathian, K., and A. Zangaladze. 2002. Feeling with the mind’s eye: Contribution of visual cortex to tactile perception (review). Behavioural Brain Research 135:127–32. Schall, J.D., A. Morel, D.J. King, and J. Bullier. 1995. Topography of visual cortex connections with frontal eye field in macaque: Convergence and segregation of processing streams. Journal of Neuroscience 15:4464–87.
Schlack, A., S.J. Sterbing-D’Angelo, K. Hartung, K.P. Hoffmann, and F. Bremmer. 2005. Multisensory space representations in the macaque ventral intraparietal area. Journal of Neuroscience 25:4616–25. Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Cognitive Brain Research 14:187–98. Schroeder, C.E., R.W. Lindsley, C. Specht, A. Marcovici, J.F. Smiley, and D.C. Javitt. 2001. Somatosensory input to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85:1322–7. Schwartz, M.L., J.J. Dekker, and P.S. Goldman-Rakic. 1991. Dual mode of corticothalamic synaptic termination in the mediodorsal nucleus of the rhesus monkey. Journal of Comparative Neurology 309:289–304. Seltzer, B., and D.N. Pandya. 1994. Parietal, temporal, and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology 343:445–63. Sherman, S.M. 2007. The thalamus is more than just a relay. Current Opinion in Neurobiology 17:417–22. Sherman, S.M., and R.W. Guillery. 2002. The role of the thalamus in the flow of information to the cortex. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 357:1695–708. Sherman, S.M., and R.W. Guillery. 2005. Exploring the Thalamus and Its Role in Cortical Function. Cambridge: MIT Press. Shipp, S. 2003. The functional logic of cortico-pulvinar connections. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 358:1605–24. Smiley, J.F., T.A. Hackett, I. Ulbert, G. Karmas, P. Lakatos, D.C. Javitt, and C.E. Schroeder. 2007. Multisensory convergence in auditory cortex, I. Cortical connections of the caudal superior temporal plane in macaque monkeys. Journal of Comparative Neurology 502:894–923. Smiley, J.F., and A. Falchier. 2009. Multisensory connections of monkey auditory cerebral cortex. Hearing Research 258:37–46. Sperdin, H., C. Cappe, J.J. Foxe, and M.M. Murray. 2009. Early, low-level auditory–somatosensory multisensory interactions impact reaction time speed. Frontiers in Integrative Neuroscience 3:2. doi:10.3389/ neuro.07.002.2009. Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press. Stein, B.E., M.A. Meredith, W.S. Huneycutt, and L. Mcdade. 1989. Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience 1:12–24. Suzuki, H. 1985. Distribution and organization of visual and auditory neurons in the monkey prefrontal cortex. Vision Research 25:465–9. Tanné-Gariépy, J., E.M. Rouiller, and D. Boussaoud. 2002. Parietal inputs to dorsal versus ventral premotor areas in the macaque monkey: Evidence for largely segregated visuomotor pathways. Experimental Brain Research 145:91–103. Taktakishvili, O., E. Sivan-Loukianova, K. Kultas-Ilinsky, and I.A. Ilinsky. 2002. Posterior parietal cortex projections to the ventral lateral and some association thalamic nuclei in Macaca mulatta. Brain Research Bulletin 59:135–50. Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation. Proceedings of the National Academy of Sciences of the United States of America 101:2167–72. Wang, Y., S. Celebrini, Y. Trotter, and P. Barone. 2008. Visuo–auditory interactions in the primary visual cortex of the behaving monkey: Electrophysiological evidence. BMC Neuroscience 9:79. Yirmiya, R., and S. 
Hocherman. 1987. Auditory- and movement-related neural activity interact in the pulvinar of the behaving rhesus monkey. Brain Research 402:93–102. Zampini, M., D. Torresan, C. Spence, and M.M. Murray. 2007. Auditory–somatosensory multisensory interactions in front and rear space. Neuropsychologia 45:1869–77.
3 What Can Multisensory Processing Tell Us about the Functional Organization of Auditory Cortex?
Jennifer K. Bizley and Andrew J. King
CONTENTS
3.1 Introduction
3.2 Functional Specialization within Auditory Cortex?
3.3 Ferret Auditory Cortex: A Model for Multisensory Processing
3.3.1 Organization of Ferret Auditory Cortex
3.3.2 Surrounding Cortical Fields
3.3.3 Sensitivity to Complex Sounds
3.3.4 Visual Sensitivity in Auditory Cortex
3.3.5 Visual Inputs Enhance Processing in Auditory Cortex
3.4 Where Do Visual Inputs to Auditory Cortex Come From?
3.5 What Are the Perceptual Consequences of Multisensory Integration in the Auditory Cortex?
3.5.1 Combining Auditory and Visual Spatial Representations in the Brain
3.5.2 A Role for Auditory Cortex in Spatial Recalibration?
3.6 Concluding Remarks
References
3.1 INTRODUCTION

The traditional view of sensory processing is that the pooling and integration of information across different modalities takes place in specific areas of the brain only after extensive processing within modality-specific subcortical and cortical regions. This seems like a logical arrangement because our various senses are responsible for transducing different forms of energy into neural activity and give rise to quite distinct perceptions. To a large extent, each of the sensory systems can operate independently. We can, after all, understand someone speaking by telephone or read a book perfectly well without recourse to cues provided by other modalities. It is now clear, however, that multisensory convergence is considerably more widespread in the brain, and particularly the cerebral cortex, than was once thought. Indeed, even the primary cortical areas in each of the main senses have been claimed as part of the growing network of multisensory regions (Ghazanfar and Schroeder 2006). It is clearly beneficial to be able to combine information from the different senses. Although the perception of speech is based on the processing of sound, what we actually hear can be influenced by visual cues provided by lip movements. This can result in an improvement in speech intelligibility
in the presence of other distracting sounds (Sumby and Pollack 1954) or even a subjective change in the speech sounds that are perceived (McGurk and MacDonald 1976). Similarly, the accuracy with which the source of a sound can be localized is affected by the availability of both spatially congruent (Shelton and Searle 1980; Stein et al. 1989) and conflicting (Bertelson and Radeau 1981) visual stimuli. With countless other examples of cross-modal interactions at the perceptual level (Calvert and Thesen 2004), it is perhaps not surprising that multisensory convergence is so widely found throughout the cerebral cortex. The major challenge that we are now faced with is to identify the function of multisensory integration in different cortical circuits, and particularly at early levels of the cortical hierarchy—the primary and secondary sensory areas—which are more likely to be involved in general-purpose processing relating to multiple sound parameters than in task-specific computational operations (Griffiths et al. 2004; King and Nelken 2009). In doing so, we have to try and understand how other modalities influence the sensitivity or selectivity of cortical neurons in those areas while retaining the modality specificity of the percepts to which the activity of the neurons contributes. By investigating the sources of origin of these inputs and the way in which they interact with the dominant input modality for a given cortical area, we can begin to constrain our ideas about the potential functions of multisensory integration in early sensory cortex. In this article, we focus on the organization and putative functions of visual inputs to the auditory cortex. Although anatomical and physiological studies have revealed multisensory interactions in visual and somatosensory areas, it is arguably the auditory cortex where most attention has been paid and where we may be closest to answering these questions.
3.2 FUNCTIONAL SPECIALIZATION WITHIN AUDITORY CORTEX?

A common feature of all sensory systems is that they comprise multiple cortical areas that can be defined both physiologically and anatomically, and which are collectively involved in the processing of the world around us. Although most studies on the cortical auditory system have focused on the primary area, A1, there is considerable interest in the extent to which different sound features are represented in parallel in distinct functional streams that extend beyond A1 (Griffiths et al. 2004). Research on this question has been heavily influenced by studies of the visual cortex and, in particular, by the proposal that a division of function exists, with separate dorsal and ventral pathways involved in visuomotor control and object identification, respectively. The dorsal processing stream, specialized for detecting object motion and discriminating spatial relationships, includes the middle temporal (MT) and medial superior temporal (MST) areas, whereas the ventral stream comprises areas responsible for color, form, and pattern discrimination. Although the notion of strict parallel processing of information, originating subcortically in the P and M pathways and terminating in temporal and parietal cortical areas, is certainly an oversimplification (Merigan and Maunsell 1993), the perception–action hypothesis is supported by neuroimaging, human neuropsychology, monkey neurophysiology, and human psychophysical experiments (reviewed by Goodale and Westwood 2004). A popular, if controversial, theory seeks to impose a similar organizational structure onto the auditory cortex. Within this framework, Rauschecker and Tian (2000) proposed that the auditory cortex can be divided into a rostral processing stream, responsible for sound identification, and a caudal processing stream, involved in sound localization. Human functional imaging data provide support for this idea (Alain et al. 2001; Barrett and Hall 2006; Maeder et al. 2001; Warren and Griffiths 2003), and there is evidence for regional differentiation based on the physiological response properties of single neurons recorded in the auditory cortex of nonhuman primates (Tian et al. 2001; Recanzone 2000; Woods et al. 2006; Bendor and Wang 2005). However, the most compelling evidence for a division of labor has been provided by the specific auditory deficits induced by transiently deactivating different cortical areas in cats. Thus, normal sound localization in this species requires the activation of A1, the posterior auditory field (PAF), the anterior ectosylvian sulcus, and the dorsal zone of the auditory cortex, whereas other areas, notably the anterior auditory
field (AAF), ventral PAF (VPAF), and secondary auditory cortex (A2) do not appear to contribute to this task (Malhotra and Lomber 2007). Moreover, a double dissociation between PAF and AAF in the same animals has been demonstrated, with impaired sound localization produced by cooling of PAF but not AAF, and impaired temporal pattern discrimination resulting from inactivation of AAF but not PAF (Lomber and Malhotra 2008). Lastly, anatomical projection patterns in nonhuman primates support differential roles for rostral and caudal auditory cortex, with each of those areas having distinct prefrontal targets (Hackett et al. 1999; Romanski et al. 1999). Despite this apparent wealth of data in support of functional specialization within the auditory cortex, there are a number of studies that indicate that sensitivity to both spatial and nonspatial sound attributes is widely distributed across different cortical fields (Harrington et al. 2008; Stecker et al. 2003; Las et al. 2008; Hall and Plack 2009; Recanzone 2008; Nelken et al. 2008; Bizley et al. 2009). Moreover, in humans, circumscribed lesions within the putative "what" and "where" pathways do not always result in the predicted deficits in sound recognition and localization (Adriani et al. 2003). Clearly defined output pathways from auditory cortex to prefrontal cortex certainly seem to exist, but what the behavioral deficits observed following localized deactivation or damage imply about the functional organization of the auditory cortex itself is less clear-cut. Loss of activity in any one part of the network will, after all, affect both upstream cortical areas and potentially the responses of subcortical neurons that receive descending projections from that region of the cortex (Nakamoto et al. 2008). Thus, a behavioral deficit does not necessarily reflect the specialized properties of the neurons within the silenced cortical area per se, but rather the contribution of the processing pathways that the area is integral to.

Can the distribution and nature of multisensory processing in the auditory cortex help reconcile the apparently contrasting findings outlined above? If multisensory interactions in the cortex are to play a meaningful role in perception and behavior, it is essential that the neurons can integrate the corresponding multisensory features of individual objects or events, such as vocalizations and their associated lip movements or the visual and auditory cues originating from the same location in space. Consequently, the extent to which spatial and nonspatial sound features are processed in parallel in the auditory cortex should also be apparent in both the multisensory response properties of the neurons found there and the sources of origin of its visual inputs. Indeed, evidence for task-specific activation of higher cortical areas by different stimulus modalities has recently been provided in humans (Renier et al. 2009). In the next section, we focus on the extent to which anatomical and physiological studies of multisensory convergence and processing in the auditory cortex of the ferret have shed light on this issue. In recent years, this species has gained popularity for studies of auditory cortical processing, in part because of its particular suitability for behavioral studies.
3.3 FERRET AUDITORY CORTEX: A MODEL FOR MULTISENSORY PROCESSING

3.3.1 Organization of Ferret Auditory Cortex

Ferret auditory cortex consists of at least six acoustically responsive areas: two core fields, A1 and AAF, which occupy the middle ectosylvian gyrus; two belt areas on the posterior ectosylvian gyrus, the posterior pseudosylvian field (PPF) and posterior suprasylvian field (PSF); plus two areas on the anterior ectosylvian gyrus, the anterior dorsal field (ADF) and the anterior ventral field (AVF) (Bizley et al. 2005; Figure 3.1a). A1, AAF, PPF, and PSF are all tonotopically organized: the neurons found there respond to pure tones and are most sensitive to particular sound frequencies, which vary systematically in value with neuron location within each cortical area. There is little doubt that an equivalent area to the region designated as A1 is found in many different mammalian species, including humans. AAF also appears to be homologous to AAF in other species, including the gerbil (Thomas et al. 1993) and the cat (Imaizumi et al. 2004), and is characterized by an underrepresentation of neurons preferring middle frequencies and by shorter response latencies compared with A1.
[Figure 3.1 schematic, panels (a)–(d): map of ferret sensory cortex, retrograde labeling in visual cortex after tracer injections into core and belt auditory fields, and a summary of visual cortical inputs to auditory cortex arising from V1/V2 (sparse), area 20 (visual form), and SSY (visual motion); see caption below.]
FIGURE 3.1 Visual inputs to ferret auditory cortex. (a) Ferret sensory cortex. Visual (areas 17–20, PS, SSY, AMLS), posterior parietal (PPr, PPc), somatosensory (S1, SIII, MRSS), and auditory areas (A1, AAF, PPF, PSF, and ADF) have been identified. In addition, LRSS and AVF are multisensory regions, although many of the areas classified as modality specific also contain some multisensory neurons. (b) Location of neurons in visual cortex that project to auditory cortex. Tracer injections made into core auditory cortex (A1: BDA, shown in black, and AAF: CTβ, shown in gray) result in retrograde labeling in early visual areas. Every fifth section (50 µm thick) was examined, but for the purpose of illustration, labeling from four sections was collapsed onto single sections. Dotted lines mark the limit between cortical layers IV and V; dashed lines delimit the white matter (wm). (c) Tracer injections made into belt auditory cortex. Retrograde labeling after an injection of CTβ into the anterior fields (on the borders of ADF and AVF) is shown in gray, and retrograde labeling resulting from a BDA injection into the posterior fields PPF and PSF is shown in black. Note the difference in the extent and distribution of labeling after injections into the core and belt areas of auditory cortex. Scale bars in (b) and (c), 1 mm. (d) Summary of sources of visual cortical input to auditory cortex. (Anatomical data adapted with permission from Bizley, J.K. et al., Cereb. Cortex, 17, 2172–89, 2007.)
Neurons in the posterior fields can be distinguished from those in the primary areas by the temporal characteristics of their responses; discharges are often sustained and they vary in latency and firing pattern in a stimulus-dependent manner. The frequency response areas of posterior field neurons are often circumscribed, exhibiting tuning for sound level as well as frequency. As such, the posterior fields in the ferret resemble PAF and VPAF in the cat (Stecker et al. 2003; Phillips and Orman 1984; Loftus and Sutter 2001) and cortical areas R and RT in the marmoset monkey (Bizley et al. 2005; Bendor and Wang 2008), although whether PPF and PSF actually correspond to these fields is uncertain. Neurons in ADF also respond to pure tones, but are not tonotopically organized (Bizley et al. 2005). The lack of tonotopicity and the broad, high-threshold frequency response areas that characterize this field are also properties of cat A2 (Schreiner and Cynader 1984). However, given that ferret ADF neurons seem to show relatively greater spatial sensitivity than those in surrounding cortical fields (see following sections), which is not a feature of cat A2, it seems unlikely that these areas are homologous. Ventral to ADF lies AVF. Although many of the neurons that have been recorded there are driven by sound, the high incidence of visually responsive neurons (see Section 3.3.4) makes it likely that AVF should be regarded as a parabelt or higher multisensory field. Given its proximity to the somatosensory area on the medial bank of the rostral suprasylvian sulcus (MRSS) (Keniston et al. 2009), it is possible that AVF neurons might also be influenced by tactile stimuli, but this remains to be determined. Other studies have also highlighted the multisensory nature of the anterior ectosylvian gyrus. For example, Ramsay and Meredith (2004) described an area surrounding the pseudosylvian sulcus that receives largely segregated inputs from the primary visual and somatosensory cortices, which they termed the pseudosylvian sulcal cortex. Manger et al. (2005) reported that a visually responsive area lies parallel to the pseudosylvian sulcus on the posterolateral half of the anterior ectosylvian gyrus, which also contains bisensory neurons that respond either to both visual and tactile or to visual and auditory stimulation. They termed this area AEV, following the terminology used for the visual region within the cat’s anterior ectosylvian sulcus. Because this region overlaps in part with the acoustically responsive areas that we refer to as ADF and AVF, further research using a range of stimuli will be needed to fully characterize this part of the ferret’s cortex. However, the presence of a robust projection from AVF to the superior colliculus (Bajo et al. 2010) makes it likely that this is equivalent to the anterior ectosylvian sulcus in the cat.
3.3.2 Surrounding Cortical Fields The different auditory cortical areas described in the previous section are all found on the ectosylvian gyrus (EG), which is enclosed by the suprasylvian sulcus (Figure 3.1a). The somatosensory cortex lies rostral to the EG (Rice et al. 1993; McLaughlin et al. 1998), extrastriate visual areas are located caudally (Redies et al. 1990), and the parietal cortex is found dorsal to the EG (Manger et al. 2002). The suprasylvian sulcus therefore separates the different auditory fields from functionally distinct parts of the cerebral cortex. Within the suprasylvian sulcus itself, several additional cortical fields have been characterized (Philipp et al. 2006; Manger et al. 2004, 2008; Cantone et al. 2006; Keniston et al. 2008). Beginning at the rostral border between the auditory and somatosensory cortices, field MRSS (Keniston et al. 2009) and the lateral bank of the rostral suprasylvian sulcus (LRSS) (Keniston et al. 2008) form the medial and lateral sides of the suprasylvian sulcus, respectively. Field LRSS has been identified as an auditory–somatosensory area, whereas MRSS is more modality specific and is thought to be a higher somatosensory field. Field MRSS is bordered by the anteromedial lateral suprasylvian visual area (AMLS), which lines the medial or dorsal bank of the suprasylvian sulcus (Manger et al. 2008). Two more visually responsive regions, the suprasylvian visual area (SSY) (Cantone et al. 2006; Philipp et al. 2006) and the posterior suprasylvian area (PS) (Manger et al. 2004) are found on the caudal side of the sulcus. SSY corresponds in location to an area described by Philipp et al.
(2006) as the ferret homologue of primate motion-processing area MT. This region has also been described by Manger et al. (2008) as the posteromedial suprasylvian visual area, but we will stay with the terminology used in our previous articles and refer to it as SSY. PS has not been comprehensively investigated and, to our knowledge, neither of these sulcal fields has been tested with auditory or somatosensory stimuli. On the lateral banks of the suprasylvian sulcus, at the dorsal and caudal edges of the EG, there remains an area of uninvestigated cortex. On the basis of its proximity to AMLS and SSY, this region has tentatively been divided into the anterolateral lateral suprasylvian visual area (ALLS) and the posterolateral lateral suprasylvian visual area (PLLS) by Manger et al. (2008). However, because these regions of the sulcal cortex lie immediately adjacent to the primary auditory fields, it is much more likely that they are multisensory in nature.
3.3.3 Sensitivity to Complex Sounds In an attempt to determine whether spatial and nonspatial stimulus attributes are represented within anatomically distinct regions of the ferret auditory cortex, we investigated the sensitivity of neurons in both core and belt areas to stimulus periodicity, timbre, and spatial location (Bizley et al. 2009). Artificial vowel sounds were used for this purpose, as they allowed each of these stimulus dimensions to be varied parametrically. Recordings in our laboratory have shown that ferret vocalizations cover the same frequency range as the sounds used in this study. Vowel identification involves picking out the formant peaks in the spectral envelope of the sound, and is therefore a timbre discrimination task. The periodicity of the sound corresponds to its perceived pitch and conveys information about speaker identity (males tend to have lower-pitched voices than females) and emotional state. Neuronal sensitivity to timbre and pitch should therefore be found in cortical areas concerned with stimulus identification. Neurons recorded throughout the five cortical areas examined (A1, AAF, PPF, PSF, and ADF) were found to be sensitive to the pitch, timbre, and location of the sound source, implying a distributed representation of both spatial and nonspatial sound properties. Nevertheless, significant interareal differences were observed. Sensitivity to sound pitch and timbre was most pronounced in the primary and posterior auditory fields (Bizley et al. 2009). By contrast, relatively greater sensitivity to sound-source location was found in A1 and in the areas around the pseudosylvian sulcus, which is consistent with the finding that the responses of neurons in ADF carry more information about sound azimuth than those in other auditory cortical areas (Bizley and King 2008). The variance decomposition method used in the study by Bizley et al. (2009) to quantify the effects of each stimulus parameter on the responses of the neurons was very different from the measures used to define a pitch center in marmoset auditory cortex (Bendor and Wang 2005). We did not, for example, test whether pitch sensitivity was maintained for periodic stimuli in which the fundamental frequency had been omitted. Consequently, the distributed sensitivity we observed is not incompatible with the idea that there might be a dedicated pitch-selective area. However, in a subsequent study, we did find that the spiking responses of single neurons and neural ensembles throughout the auditory cortex can account for the ability of trained ferrets to detect the direction of a pitch change (Bizley et al. 2010). Although further research is needed, particularly in awake, behaving animals, these electrophysiological data, together with the results of an earlier intrinsic optical imaging study (Nelken et al. 2008), provide only limited support for a division of labor across auditory cortical areas in the ferret.
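To make the logic of such an analysis concrete, the short sketch below partitions a neuron's response variance into the fractions attributable to each stimulus dimension. It is only a simplified illustration in the spirit of the variance decomposition used by Bizley et al. (2009), not their analysis code; the array layout, function name, and fabricated data are assumptions introduced here.

```python
# Illustrative sketch (not the authors' analysis code): partition the variance of a
# neuron's spike counts into main effects of pitch (F0), timbre, and azimuth.
# The array shape and variable names are hypothetical.
import numpy as np

def main_effect_variance(counts):
    """counts: array of shape (n_f0, n_timbre, n_azimuth, n_repeats)."""
    grand_mean = counts.mean()
    total_var = counts.var()
    effects = {}
    for name, axis_keep in (("pitch", 0), ("timbre", 1), ("azimuth", 2)):
        # Mean response at each level of this factor, averaged over everything else
        other_axes = tuple(ax for ax in range(counts.ndim) if ax != axis_keep)
        level_means = counts.mean(axis=other_axes)
        # Variance of the level means = variance explained by this factor alone
        effects[name] = ((level_means - grand_mean) ** 2).mean() / total_var
    return effects  # fraction of total variance attributable to each stimulus dimension

# Example with fabricated data: 4 F0s x 4 timbres x 6 azimuths x 10 repeats
rng = np.random.default_rng(0)
fake_counts = rng.poisson(5.0, size=(4, 4, 6, 10)).astype(float)
print(main_effect_variance(fake_counts))
```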
3.3.4 Visual Sensitivity in Auditory Cortex Visual inputs into auditory cortex have been described in several species, including humans (Calvert et al. 1999; Giard and Peronnet 1999; Molholm et al. 2002), nonhuman primates (Brosch et al. 2005; Ghazanfar et al. 2005; Schroeder and Foxe 2002; Kayser et al. 2007), ferrets (Bizley and King 2008, 2009; Bizley et al. 2007), gerbils (Cahill et al. 1996), and rats (Wallace et al. 2004). In our studies on the ferret, the responses of single neurons and multineuron clusters were recorded in response to simple
artificial stimuli presented under anesthesia. Sensitivity to visual stimulation was defined as a statistically significant change in spiking activity after the presentation of light flashes from a light-emitting diode (LED) positioned in the contralateral hemifield or by a significant modulation of the response to auditory stimulation even if the LED by itself was apparently ineffective in driving the neuron. Although the majority of neurons recorded in the auditory cortex were classified as auditory alone, the activity of more than one quarter was found to be influenced by visual stimulation. Figure 3.2a shows the relative proportion of different response types observed in the auditory cortex as a whole.
FIGURE 3.2 Visual–auditory interactions in ferret auditory cortex. (a) Proportion of neurons (n = 716) that responded to contralaterally presented noise bursts (auditory), to light flashes from an LED positioned in the contralateral visual field (visual), to both of these stimuli (AV), or whose responses to the auditory stimulus were modulated by the presentation of the visual stimulus, which did not itself elicit a response (AVmod). (b) Bar graph showing the relative proportions of unisensory auditory (white), unisensory visual (black), and bisensory (gray) neurons recorded in each auditory field. The actual numbers of neurons recorded are given at the top of each column. (c) Proportion of neurons whose spike rates in response to combined visual–auditory stimulation were enhanced or suppressed. Total number of bisensory neurons in each field: A1, n = 9; AAF, n = 16; PPF, n = 13; PSF, n = 32; ADF, n = 32; AVF, n = 24. (d) Distribution of mutual information (MI) values obtained when two reduced spike statistics were used: spike count and mean spike latency. Points above the unity line indicate that mean response latency was more informative about the stimulus than spike count. This was increasingly the case for all three stimulus conditions when the spike counts were low. (Anatomical data adapted from Bizley, J.K. et al., Cereb. Cortex, 17, 2172–89, 2007 and Bizley, J.K., and King, A.J., Hearing Res., 258, 55–63, 2009.)
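The classification logic described above (auditory, visual, AV, or AV-modulated units) can be illustrated with a short sketch. The rank-sum test, the alpha level, and the variable names below are placeholders chosen for clarity rather than the exact criteria used in the original study.

```python
# Minimal sketch of the response-classification logic described in the text. The test
# statistic and threshold are illustrative assumptions, not the published criteria.
import numpy as np
from scipy.stats import mannwhitneyu

ALPHA = 0.05

def is_significant(a, b):
    return mannwhitneyu(a, b).pvalue < ALPHA

def classify_unit(baseline, aud, vis, av):
    """Each argument: 1-D array of spike counts per trial."""
    aud_driven = is_significant(aud, baseline)
    vis_driven = is_significant(vis, baseline)
    av_modulated = is_significant(av, aud)   # visual stimulus changes the auditory response
    if aud_driven and vis_driven:
        return "AV (bisensory)"
    if vis_driven:
        return "visual"
    if aud_driven and av_modulated:
        return "AV-modulated"
    if aud_driven:
        return "auditory"
    return "unresponsive"

# Fabricated example trial counts
rng = np.random.default_rng(1)
print(classify_unit(baseline=rng.poisson(2, 50), aud=rng.poisson(6, 50),
                    vis=rng.poisson(2, 50), av=rng.poisson(8, 50)))
```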
Bisensory neurons comprised both those neurons whose spiking responses were altered by auditory and visual stimuli and those whose auditory response was modulated by the simultaneously presented visual stimulus. The fact that visual stimuli can drive spiking activity in the auditory cortex has also been described in highly trained monkeys (Brosch et al. 2005). Nevertheless, this finding is unusual, as most reports emphasize the modulatory nature of nonauditory inputs on the cortical responses to sound (Ghazanfar 2009; Musacchia and Schroeder 2009). At least part of the explanation for this is likely to be that we analyzed our data by calculating the mutual information between the neural responses and the stimuli that elicited them. Information (in bits) was estimated by taking into account the temporal pattern of the response rather than simply the overall spike count. This method proved to be substantially more sensitive than a simple spike count measure, and allowed us to detect subtle, but nonetheless significant, changes in the neural response produced by the presence of the visual stimulus. Although neurons exhibiting visual–auditory interactions are found in all six areas of the ferret cortex, the proportion of such neurons varies in different cortical areas (Figure 3.2b). Perhaps not surprisingly, visual influences are least common in the primary areas, A1 and AAF. Nevertheless, approximately 20% of the neurons recorded in those regions were found to be sensitive to visual stimulation, and even included some unisensory visual responses. In the fields on the posterior ectosylvian gyrus and ADF, 40% to 50% of the neurons were found to be sensitive to visual stimuli. This rose to 75% in AVF, which, as described in Section 3.3.1, should probably be regarded as a multisensory rather than as a predominantly auditory area. We found that visual stimulation could either enhance or suppress the neurons’ response to sound and, in some cases, increased the precision in their spike timing without changing the overall firing rate (Bizley et al. 2007). Analysis of all bisensory neurons, including both neurons in which there was a spiking response to each sensory modality and those in which concurrent auditory–visual stimulation modulated the response to sound alone, revealed that nearly two-thirds produced stronger responses to bisensory than to unisensory auditory stimulation. Figure 3.2c shows the proportion of response types in each cortical field. Although the sample size in some areas was quite small, the relative proportions of spiking responses that were either enhanced or suppressed varied across the auditory cortex. Apart from the interactions in A1, the majority of the observed interactions were facilitatory rather than suppressive. Although a similar trend for a greater proportion of sites to show enhancement as compared with suppression has been reported for local field potential data in monkey auditory cortex, analysis of spiking responses revealed that suppressive interactions are more common (Kayser et al. 2008). This trend was found across four different categories of naturalistic and artificial stimuli, so the difference in the proportion of facilitatory and suppressive interactions is unlikely to reflect the use of different stimuli in the two studies. 
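As a rough illustration of why an information measure based on the full response pattern can detect effects that a spike-count measure misses, the sketch below computes a plug-in estimate of mutual information from binned spike trains, first treating the whole temporal pattern as the response and then collapsing it to a total count. The binning, the synthetic data, and the absence of bias correction are simplifications made here; this is not the estimator used in the study, and real analyses typically require bias correction.

```python
# Plug-in (direct) estimate of mutual information between stimulus identity and a
# discretized neural response, contrasting a temporal-pattern code with a count code.
import numpy as np
from collections import Counter

def mutual_information(stimuli, responses):
    """stimuli, responses: equal-length sequences of hashable labels (one per trial)."""
    n = len(stimuli)
    n_s = Counter(stimuli)            # occurrences of each stimulus
    n_r = Counter(responses)          # occurrences of each response word
    n_sr = Counter(zip(stimuli, responses))
    mi = 0.0
    for (s, r), c in n_sr.items():
        p_joint = c / n
        # p(s,r) / (p(s) p(r)) = c * n / (n_s * n_r)
        mi += p_joint * np.log2(c * n / (n_s[s] * n_r[r]))
    return mi

# Fabricated raster: trials x time bins of 0/1 spikes, two stimuli with different rates
rng = np.random.default_rng(2)
stim = np.repeat([0, 1], 100)
raster = (rng.random((200, 8)) < np.where(stim[:, None] == 0, 0.1, 0.25)).astype(int)

pattern_code = [tuple(row) for row in raster]   # full temporal pattern
count_code = raster.sum(axis=1).tolist()        # spike count only
print("MI (pattern):", mutual_information(stim.tolist(), pattern_code))
print("MI (count):  ", mutual_information(stim.tolist(), count_code))
```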
By systematically varying onset asynchronies between the visual and auditory stimuli, we did observe in a subset of neurons that visual stimuli could have suppressive effects when presented 100 to 200 ms before the auditory stimuli, which were not apparent when the two modalities were presented simultaneously (Bizley et al. 2007). This finding, along with the results of several other studies (Meredith et al. 2006; Dehner et al. 2004; Allman et al. 2008), emphasizes the importance of using an appropriate combination of stimuli to reveal the presence and nature of cross-modal interactions. Examination of the magnitude of cross-modal facilitation in ferret auditory cortex showed that visual–auditory interactions are predominantly sublinear. In other words, both the mutual information values (in bits) and the spike rates in response to combined auditory–visual stimulation are generally less than the linear sum of the responses to the auditory and visual stimuli presented in isolation, although some notable exceptions to this have been found (e.g., Figure 2E, F of Bizley et al. 2007). This is unsurprising as the stimulus levels used in that study were well above threshold and, according to the “inverse effectiveness principle” (Stein et al. 1988), were unlikely to produce supralinear responses to combined visual–auditory stimulation. Consistent with this is the observation of Kayser et al. (2008), showing that, across stimulus types, multisensory facilitation is more common for those stimuli that are least effective in driving the neurons.
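The additivity comparison described above can be written down in a few lines. The sketch below computes a simple additivity index (the combined response divided by the sum of the unisensory responses) and a conventional enhancement measure; the function names, baseline handling, and numerical values are illustrative assumptions rather than the exact metrics used in the studies cited.

```python
# Sketch of the additivity comparison: is the response to combined audiovisual
# stimulation greater or smaller than the sum of the unisensory responses?

def additivity_index(rate_a, rate_v, rate_av, baseline=0.0):
    """(AV - baseline) / ((A - baseline) + (V - baseline)).
    > 1 indicates a supralinear (superadditive) interaction, < 1 a sublinear one."""
    unisensory_sum = (rate_a - baseline) + (rate_v - baseline)
    return (rate_av - baseline) / unisensory_sum

def enhancement_percent(rate_best_unisensory, rate_av):
    """Multisensory enhancement as % change relative to the best unisensory response."""
    return 100.0 * (rate_av - rate_best_unisensory) / rate_best_unisensory

# Example: well-above-threshold stimuli often yield sublinear combinations
print(additivity_index(rate_a=20.0, rate_v=8.0, rate_av=24.0))       # ~0.86, sublinear
print(enhancement_percent(rate_best_unisensory=20.0, rate_av=24.0))  # +20% enhancement
```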
As mentioned above, estimates of the mutual information between the neural responses and each of the stimuli that produce them take into account the full spike discharge pattern. It is then possible to isolate the relative contributions of spike number and spike timing to the neurons’ sensitivity to multisensory stimulation. It has previously been demonstrated in both ferret and cat auditory cortex that the stimulus information contained in the complete spike pattern is conveyed by a combination of spike count and mean spike latency (Nelken et al. 2005). By carrying out a similar analysis of the responses to the brief stimuli used to characterize visual–auditory interactions in ferret auditory cortex, we found that more than half the neurons transmitted more information in the timing of their responses than in their spike counts (Bizley et al. 2007). This is in agreement with the results of Nelken et al. (2005) for different types of auditory stimuli. We found that this was equally the case for unisensory auditory or visual stimuli and for combined visual–auditory stimulation (Figure 3.2d).
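A minimal sketch of this comparison, assuming scikit-learn is available and using fabricated single-trial statistics: the same discrete mutual-information measure is applied once to spike counts and once to binned response latencies, mirroring the comparison shown in Figure 3.2d. The quartile binning of latency and the simulated data are assumptions made here for illustration.

```python
# Compare how much stimulus information is carried by two reduced response statistics:
# spike count and response latency (cf. Figure 3.2d). Fabricated data.
import numpy as np
from sklearn.metrics import mutual_info_score

def mi_bits(stimuli, responses):
    return mutual_info_score(stimuli, responses) / np.log(2)  # convert nats to bits

rng = np.random.default_rng(3)
stim = np.repeat([0, 1, 2], 80)
# In this toy example, latency differs more across stimuli than count does
counts = rng.poisson(4 + stim, size=stim.size)
latencies = rng.normal(20 - 3 * stim, 2.0, size=stim.size)
latency_bins = np.digitize(latencies, np.quantile(latencies, [0.25, 0.5, 0.75]))

print("MI from spike count:  ", mi_bits(stim, counts))
print("MI from response latency:", mi_bits(stim, latency_bins))
```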
3.3.5 Visual Inputs Enhance Processing in Auditory Cortex To probe the functional significance of the multisensory interactions observed in the auditory cortex, we systematically varied the spatial location of the stimuli and calculated the mutual information between the neural responses and the location of unisensory visual, unisensory auditory, and spatially and temporally coincident auditory–visual stimuli (Bizley and King 2008). The majority of the visual responses were found to be spatially restricted, and usually carried more location-related information than was the case for the auditory responses. The amount of spatial information available in the neural responses varied across the auditory cortex (Figure 3.3).
FIGURE 3.3 Box plots displaying the amount of information transmitted by neurons in each of five ferret cortical fields about LED location (a), sound-source location (b), or the location of temporally and spatially congruent auditory–visual stimuli (c). Only neurons for which there was a significant unisensory visual or auditory response are plotted in (a) and (b), respectively, whereas (c) shows the multisensory mutual information values for all neurons recorded, irrespective of their response to unisensory stimulation. The box plots show the median (horizontal bar), interquartile range (boxes), spread of data (tails), and outliers (cross symbols). The notch indicates the distribution of data about the median. There were significant differences in the mutual information values in different cortical fields (Kruskal–Wallis test; LED location, p = .0001; auditory location, p = .0035; bisensory stimulus location, p < .0001). Significant post hoc pairwise differences (Tukey– Kramer test, p < .05) between individual cortical fields are shown by the lines above each box plot. Note that neurons in ADF transmitted the most spatial information irrespective of stimulus modality. (Adapted with permission from Bizley, J.K., and King, A.J., Brain Res., 1242, 24–36, 2008.)
For all three stimulus conditions, spatial sensitivity was found to be highest in ADF, supporting the notion that there is some functional segregation across the auditory cortex, with the anterior fields more involved in spatial processing. Relative to the responses to sound alone, the provision of spatially coincident visual cues frequently altered the amount of information conveyed by the neurons about stimulus location. Bisensory stimulation reduced the spatial information in the response in one third of these cases, but increased it in the remaining two thirds. Thus, overall, visual inputs to the auditory cortex appear to enhance spatial processing.
Because of the simple stimuli that were used in these studies, it was not possible to determine whether or how visual inputs might affect the processing of nonspatial information in ferret auditory cortex. However, a number of studies in primates have emphasized the benefits of visual influences on auditory cortex in terms of the improved perception of vocalizations. In humans, lip reading has been shown to activate the auditory cortex (Molholm et al. 2002; Giard and Peronnet 1999; Calvert et al. 1999), and a related study in macaques has shown that presenting a movie of a monkey vocalizing can modulate the auditory cortical responses to that vocalization (Ghazanfar et al. 2005). These effects were compared to a visual control condition in which the monkey viewed a disk that was flashed on and off to approximate the movements of the animal’s mouth. In that study, the integration of face and voice stimuli was found to be widespread in both core and belt areas of the auditory cortex. However, to generate response enhancement, a greater proportion of recording sites in the belt areas required the use of a real monkey face, whereas nonselective modulation of auditory cortical responses was more common in the core areas. Because a number of cortical areas have now been shown to exhibit comparable sensitivity to monkey calls (Recanzone 2008), it would be of considerable interest to compare the degree to which face and non-face visual stimuli can modulate the activity of the neurons found there. This should help us determine the relative extent to which each area might be specialized for processing communication signals.
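Returning to the across-field comparison of spatial information summarized in Figure 3.3, the sketch below shows how such a comparison might be set up: a Kruskal–Wallis test on mutual-information values grouped by cortical field, with pairwise rank-sum tests standing in for the Tukey–Kramer post hoc procedure used in the original analysis. The field labels are real, but the data are fabricated placeholders.

```python
# Across-field comparison of spatial mutual information (cf. Figure 3.3), using a
# Kruskal-Wallis test followed by pairwise rank-sum tests. Data are fabricated.
import numpy as np
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(4)
fields = {
    "A1": rng.gamma(2.0, 0.10, 40), "AAF": rng.gamma(2.0, 0.10, 40),
    "PPF": rng.gamma(2.0, 0.12, 40), "PSF": rng.gamma(2.0, 0.12, 40),
    "ADF": rng.gamma(2.0, 0.20, 40),   # ADF set to carry the most spatial information
}

h_stat, p_value = kruskal(*fields.values())
print(f"Kruskal-Wallis across fields: H = {h_stat:.2f}, p = {p_value:.4f}")

for (name_a, a), (name_b, b) in combinations(fields.items(), 2):
    p = mannwhitneyu(a, b).pvalue
    if p < 0.05:
        print(f"{name_a} vs {name_b}: p = {p:.4f}")
```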
3.4 WHERE DO VISUAL INPUTS TO AUDITORY CORTEX COME FROM? Characterizing the way in which neurons are influenced by visual stimuli and their distribution within the auditory cortex is only a first step in identifying their possible functions. It is also necessary to know where those visual inputs originate. Potentially, visual information might gain access to the auditory cortex in a number of ways. These influences could arise from direct projections from the visual cortex or they could be inherited from multisensory subcortical nuclei, such as nonlemniscal regions of the auditory thalamus. A third possibility is feedback connections from higher multisensory association areas in temporal, parietal, or frontal cortex. Anatomical evidence from a range of species including monkeys (Smiley et al. 2007; Hackett et al. 2007a; Cappe et al. 2009), ferrets (Bizley et al. 2007), prairie voles (Campi et al. 2010), and gerbils (Budinger et al. 2006) has shown that subcortical as well as feedforward and feedback corticocortical inputs could underpin multisensory integration in auditory cortex. To determine the most likely origins of the nonauditory responses in the auditory cortex, we therefore need to consider studies of anatomical connectivity in conjunction with information about the physiological properties of the neurons, such as tuning characteristics or response latencies. Previous studies have demonstrated direct projections from core and belt auditory cortex into visual areas V1 and V2 in nonhuman primates (Rockland and Ojima 2003; Falchier et al. 2002) and, more recently, in cats (Hall and Lomber 2008). The reciprocal projection, from V1 to A1, remains to be described in primates, although Hackett et al. (2007b) have found evidence for a pathway terminating in the caudomedial belt area of the auditory cortex from the area prostriata, adjacent to V1, which is connected with the peripheral visual field representations in V1, V2, and MT. Connections between early auditory and visual cortical fields have also been described in gerbils (Budinger et al. 2006, 2008) and prairie voles (Campi et al. 2010). By placing injections of neural tracer into physiologically identified auditory fields in the ferret, we were able to characterize the potential sources of visual input (Bizley et al. 2007; Figure 3.1b, c).
These data revealed a clear projection pattern whereby specific visual cortical fields innervate specific auditory fields. A sparse direct projection exists from V1 to the core auditory cortex (A1 and AAF), which originates from the region of V1 that represents the peripheral visual field. This finding mirrors that of the reciprocal A1 to V1 projection in monkeys and cats, which terminates in the peripheral field representation of V1 (Rockland and Ojima 2003; Falchier et al. 2002; Hall and Lomber 2008). Ferret A1 and AAF are also weakly innervated by area V2. The posterior auditory fields, PPF and PSF, are innervated principally by areas 20a and 20b, thought to be part of the visual form-processing pathway (Manger et al. 2004). In contrast, the largest inputs to the anterior fields, ADF and AVF, come from SSY, which is regarded as part of the visual “where” processing stream (Philipp et al. 2006). Interestingly, this difference in the sources of cortical visual input, which is summarized in Figure 3.1d, appears to reflect the processing characteristics of the auditory cortical fields concerned. As described above, the fields on the posterior ectosylvian gyrus are more sensitive to pitch and timbre, parameters that contribute to the identification of a sound source, whereas spatial sensitivity for auditory, visual, and multisensory stimuli is greatest in ADF (Figure 3.3). This functional distinction therefore matches the putative roles of the extrastriate areas that provide the major sources of cortical visual input to each of these regions. These studies appear to support the notion of a division of labor across the nonprimary areas of ferret auditory cortex, but it would be premature to conclude that distinct fields are responsible for the processing of spatial and nonspatial features of the world. Thus, although PSF is innervated by nonspatial visual processing areas 20a and 20b (Figure 3.1c), the responses of a particularly large number of neurons found there show an increase in transmitted spatial information when a spatially congruent visual stimulus is added to the auditory stimulus (Bizley and King 2008). This could be related to a need to integrate spatial and nonspatial cues when representing objects and events in the auditory cortex. The possibility that connections between the visual motion-sensitive area SSY and the fields on the anterior ectosylvian gyrus are involved in processing spatial information provided by different sensory modalities is supported by a magnetoencephalography study in humans showing that audio–visual motion signals are integrated in the auditory cortex (Zvyagintsev et al. 2009). However, we must not forget that visual motion also plays a key role in the perception of communication calls. By making intracranial recordings in epileptic patients, Besle et al. (2008) found that the visual cues produced by lip movements activate MT followed, approximately 10 ms later, by secondary auditory areas, where they alter the responses to sound in ways that presumably influence speech perception. Thus, although the influence of facial expressions on auditory cortical neurons is normally attributed to feedback from the superior temporal sulcus (Ghazanfar et al. 2008), the availability of lower-level visual signals that provide cues to sound onset and offset may be important as well.
3.5 WHAT ARE THE PERCEPTUAL CONSEQUENCES OF MULTISENSORY INTEGRATION IN THE AUDITORY CORTEX? The concurrent availability of visual information presumably alters the representation in the auditory cortex of sources that can be seen as well as heard in ways that are relevant for perception and behavior. Obviously, the same argument applies to the somatosensory inputs that have also been described there (Musacchia and Schroeder 2009). By influencing early levels of cortical processing, these nonauditory inputs may play a fairly general processing role by priming the cortex to receive acoustic signals. It has, for example, been proposed that visual and somatosensory inputs can modulate the phase of oscillatory activity in the auditory cortex, potentially amplifying the response to related auditory signals (Schroeder et al. 2008). But, as we have seen, visual inputs can also have more specific effects, changing the sensitivity and even the selectivity of cortical responses to stimulus location and, at least in primates, to vocalizations where communication
relies on both vocal calls and facial gestures. The role of multisensory processing in receptive auditory communication is considered in more detail in other chapters in this volume. Here, we will focus on the consequences of merging spatial information across different sensory modalities in the auditory cortex.
3.5.1 Combining Auditory and Visual Spatial Representations in the Brain There are fundamental differences in the ways in which source location is extracted by the visual and auditory systems. The location of visual stimuli is represented topographically, first by the distribution of activity across the retina and then at most levels of the central visual pathway. By contrast, auditory space is not encoded explicitly along the cochlea. Consequently, sound-source location has to be computed within the brain on the basis of the relative intensity and timing of sounds at each ear (“binaural cues”), coupled with the location-dependent filtering of sounds by the external ear (King et al. 2001). By tuning neurons to appropriate combinations of these cues, a “visual-like” map of auditory space is constructed in the superior colliculus, allowing spatial information from different sensory modalities to be represented in a common format (King and Hutchings 1987; Middlebrooks and Knudsen 1984). This arrangement is particularly advantageous for facilitating the integration of multisensory cues from a common source for the purpose of directing orienting behavior (Stein and Stanford 2008). However, because spatial signals provided by each sensory modality are initially encoded using different reference frames, with visual signals based on eye-centered retinal coordinates and auditory signals being head centered, information about current eye position has to be incorporated into the activity of these neurons in order to maintain map alignment (Hartline et al. 1995; Jay and Sparks 1987). In contrast to the topographic representation of auditory space in the superior colliculus, there is no space map in the auditory cortex (King and Middlebrooks 2010), posing an even greater challenge for the integration of visual and auditory spatial signals at the cortical level. The integrity of several auditory cortical areas is essential for normal sound localization (Malhotra and Lomber 2007), but we still have a very incomplete understanding of how neural activity in those regions contributes to the percept of where a sound source is located. The spatial receptive fields of individual cortical neurons are frequently very broad and, for the most part, occupy the contralateral side of space. However, several studies have emphasized that sound-source location can also be signaled by the timing of spikes (Jenison 2000; Nelken et al. 2005; Stecker et al. 2003). Our finding that the presence of spatially congruent visual stimuli leads to auditory cortical neurons becoming more informative about the source location, and that this greater spatial selectivity is based on both the timing and number of spikes evoked, is clearly consistent with this. Whatever the relative contributions of different neural coding strategies might be, it seems that sound-source location is signaled by the population response of neurons in the auditory cortex (Woods et al. 2006). The approach used by Allman and colleagues (2009) to estimate the response facilitation produced in a population of cortical neurons by combining visual and auditory stimuli might therefore be useful for characterizing the effects on spatial processing at this level. We pointed out above that meaningful interactions between different sensory modalities can take place only if the different reference frames used to encode modality-specific spatial signals are brought together. 
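For azimuth alone, the reference-frame problem can be stated very simply: a head-centered sound direction only lines up with a retinal (eye-centered) visual direction once current eye position is taken into account. The one-line transform below is a deliberate simplification (one-dimensional, ignoring elevation and ocular torsion) introduced here purely for illustration.

```python
# Minimal sketch of the reference-frame alignment described in the text: converting a
# head-centered sound azimuth into eye-centered (retinal) coordinates. Angles in degrees.

def head_to_eye_centered(sound_azimuth_head, eye_azimuth_in_head):
    """Eye-centered azimuth of a sound source, given where the eyes are pointing."""
    return sound_azimuth_head - eye_azimuth_in_head

# A sound 20 deg to the right of the head, with gaze 15 deg right of straight ahead,
# falls only 5 deg to the right of the fovea:
print(head_to_eye_centered(sound_azimuth_head=20.0, eye_azimuth_in_head=15.0))  # 5.0
```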
Further evidence for the multisensory representation of spatial signals in the auditory cortex is provided by the demonstration that gaze direction can change the activity of neurons in the auditory cortex (Fu et al. 2004; Werner-Reiss et al. 2003). A modulatory influence of eye position on auditory responses has been observed as early as the inferior colliculus (Groh et al. 2001), indicating that these effects could be inherited from the midbrain rather than created de novo in the auditory cortex. On the other hand, the timing and laminar profile of eye-position effects in the auditory cortex is more consistent with an origin from nonlemniscal regions of the thalamus or via feedback projections from the parietal or frontal cortices (Fu et al. 2004). As in the superior colliculus, varying eye position does not change auditory cortical spatial tuning in a manner consistent
with a straightforward transformation into eye-centered coordinates. Rather, spatial tuning seems to take on an intermediate form between eye-centered and head-centered coordinates (Werner-Reiss et al. 2003).
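One common way to quantify such partial shifts is a shift index: the displacement of a neuron's spatial tuning when the eyes move, divided by the size of the eye movement, with 0 corresponding to purely head-centered and 1 to purely eye-centered tuning. The sketch below, which simply tracks the displacement of the tuning peak using fabricated Gaussian tuning curves, illustrates the idea; it is not the analysis used in the studies cited, which rely on more robust measures.

```python
# Illustrative shift-index calculation with fabricated tuning curves: 0 = head-centered,
# 1 = eye-centered, intermediate values = a mixed reference frame.
import numpy as np

def tuning_shift(curve_fixation, curve_eccentric, azimuths):
    """Displacement of the tuning peak between two eye positions (degrees)."""
    return azimuths[np.argmax(curve_eccentric)] - azimuths[np.argmax(curve_fixation)]

azimuths = np.arange(-90.0, 91.0, 5.0)
eye_shift = 20.0                                   # eyes move 20 deg to the right
tuning_center = np.exp(-0.5 * ((azimuths - 10.0) / 25.0) ** 2)
# Tuning measured at the new eye position: here the peak moves only half as far as the eyes
tuning_eccentric = np.exp(-0.5 * ((azimuths - 10.0 - 0.5 * eye_shift) / 25.0) ** 2)

shift_index = tuning_shift(tuning_center, tuning_eccentric, azimuths) / eye_shift
print(f"shift index = {shift_index:.2f}")          # 0.50: intermediate reference frame
```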
3.5.2 A Role for Auditory Cortex in Spatial Recalibration? One possibility that has attracted recent attention is that visual–auditory interactions in early sensory cortex could be involved in the visual recalibration of auditory space. The representation of auditory space in the brain is inherently plastic, even in adulthood, and there are several well-documented examples in which the perceived location of sound sources can be altered so as to conform to changes in visual inputs (King 2009; King et al. 2001). The most famous of these is the ventriloquism illusion, whereby synchronous but spatially disparate visual cues can “capture” the location of a sound source, so that it is incorrectly perceived to arise from near the seen location (Bertelson and Radeau 1981). Repeated presentation of consistently misaligned visual and auditory cues results in a shift in the perception of auditory space that can last for tens of minutes once the visual stimulus is removed. This aftereffect has been reported in humans (Recanzone 1998; Radeau and Bertelson 1974; Lewald 2002) and in nonhuman primates (Woods and Recanzone 2004). Given the widespread distribution of visual–auditory interactions in the cortex, a number of sites could potentially provide the neural substrate for this cross-modal spatial illusion. The finding that the ventriloquism aftereffect does not transfer across sound frequency (Lewald 2002; Recanzone 1998; Woods and Recanzone 2004) implies the involvement of a tonotopically organized region, i.e., early auditory cortex. On the other hand, generalization across frequencies has been observed in another study (Frissen et al. 2005), so this conclusion may not stand. However, neuroimaging results in humans have shown that activity levels in the auditory cortex vary on a trial-by-trial basis according to whether a spatially discrepant visual stimulus is presented at the same time (Bonath et al. 2007). Furthermore, the finding by Passamonti et al. (2009) that patients with unilateral lesions of the visual cortex fail to show the ventriloquism aftereffect in the affected hemifield, whereas patients with parietotemporal lesions still do, is consistent with the possibility that connections between the visual and auditory cortices are involved. On the other hand, the hemianopic patients did show improved sound localization accuracy when visual and auditory stimuli were presented at the same location in space, implying that different neural circuits may underlie these cross-modal spatial effects. Visual capture of sound-source location is thought to occur because visual cues normally provide more reliable and higher-resolution spatial information. If the visual stimuli are blurred, however, so that this is no longer the case, spatially conflicting auditory cues can then induce systematic errors in visual localization (Alais and Burr 2004). Nothing is known about the neural basis for reverse ventriloquism, but it is tempting to speculate that auditory influences on visual cortex might be involved. Indeed, the influence of sound on perceptual learning in a visual motion discrimination task has been shown to be limited to locations in visual space that match those of the sound source, implying an auditory influence on processing in a visual area that is retinotopically organized (Beer and Watanabe 2009).
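A compact way to see why vision normally captures audition, and why blurring vision can reverse the effect, is the reliability-weighted cue-combination rule associated with near-optimal bimodal integration (Alais and Burr 2004): each cue's location estimate is weighted by the inverse of its variance, so the sharper estimate dominates. The numerical values below are illustrative only.

```python
# Reliability-weighted (inverse-variance) combination of visual and auditory location
# estimates, as a sketch of why the more reliable cue "captures" the percept.

def combine_cues(loc_v, var_v, loc_a, var_a):
    w_v = (1.0 / var_v) / (1.0 / var_v + 1.0 / var_a)
    w_a = 1.0 - w_v
    combined_loc = w_v * loc_v + w_a * loc_a
    combined_var = 1.0 / (1.0 / var_v + 1.0 / var_a)   # always <= the smaller variance
    return combined_loc, combined_var

# Sharp vision (small variance): the percept is pulled toward the visual location
print(combine_cues(loc_v=0.0, var_v=1.0, loc_a=10.0, var_a=25.0))   # ~ (0.4, 0.96)
# Heavily blurred vision: the auditory cue now dominates
print(combine_cues(loc_v=0.0, var_v=100.0, loc_a=10.0, var_a=25.0)) # ~ (8.0, 20.0)
```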
Behavioral studies have shown that adult humans and other mammals can adapt substantially to altered auditory spatial cues produced, for example, by reversibly occluding or changing the shape of the external ear (reviewed by Wright and Zhang 2006). Because visual cues provide a possible source of sensory feedback about the accuracy of acoustically guided behavior, one potential role of visual inputs to the auditory cortex is to guide the plasticity observed when localization cues are altered. However, Kacelnik et al. (2006) found that the capacity of adult ferrets to relearn to localize sound accurately after altering binaural cues by reversible occlusion of one ear is not dependent on visual feedback. It has been suggested that instead of being guided by vision, this form of adaptive plasticity could result from unsupervised sensorimotor learning, in which the dynamic acoustic inputs resulting from an animal’s own movements help stabilize the brain’s representation
of auditory space (Aytekin et al. 2008). Although vision is not essential for the recalibration of auditory space in monaurally occluded ferrets, it is certainly possible that training with congruent multisensory cues might result in faster learning than that seen with auditory cues alone, as shown in humans for a motion detection task (Kim et al. 2008).
3.6 CONCLUDING REMARKS There is now extensive anatomical and physiological evidence from a range of species that multisensory convergence occurs at the earliest levels of auditory cortical processing. These nonauditory influences therefore have to be taken into account in any model of what the auditory cortex actually does. Indeed, one of the consequences of visual, somatosensory, and eye-position effects on the activity of neurons in core and belt areas of the auditory cortex is that those influences will be passed on to each of the brain regions to which these areas project. Multiple sources of input have been implicated in multisensory integration within auditory cortex, and a more detailed characterization of those inputs will help determine the type of information that they provide and what effect this might have on auditory processing. Some of those inputs are likely to provide low-level temporal or spatial cues that enhance auditory processing in a fairly general way, whereas others provide more complex information that is specifically related, for example, to the processing of communication signals. Revealing where those inputs come from and where they terminate will help unravel the relative contributions of different auditory cortical areas to perception. Indeed, the studies that have been carried out to date have provided additional support for the standpoint that there is some functional segregation across the different parts of the auditory cortex. In order to take this further, however, it will also be necessary to examine the behavioral and physiological effects of experimentally manipulating activity in those circuits if we are to understand how visual inputs influence auditory processing and perception.
REFERENCES Adriani, M., P. Maeder, R. Meuli et al. 2003. Sound recognition and localization in man: Specialized cortical networks and effects of acute circumscribed lesions. Experimental Brain Research 153:591–604. Alain, C., S.R. Arnott, S. Hevenor, S. Graham, and C.L. Grady. 2001. “What” and “where” in the human auditory system. Proceedings of the National Academy of Sciences of the United States of America 98:12301–6. Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Current Biology 14:257–62. Allman, B.L., L.P. Keniston, and M.A. Meredith. 2008. Subthreshold auditory inputs to extrastriate visual neurons are responsive to parametric changes in stimulus quality: Sensory-specific versus non-specific coding. Brain Research 1242:95–101. Allman, B.L., L.P. Keniston, and M.A. Meredith. 2009. Adult deafness induces somatosensory conversion of ferret auditory cortex. Proceedings of the National Academy of Sciences of the United States of America 106:5925–30. Aytekin, M., C.F. Moss, and J.Z. Simon. 2008. A sensorimotor approach to sound localization. Neural Computation 20:603–35. Bajo, V.M., F.R. Nodal, J.K. Bizley, and A.J. King. 2010. The non-lemniscal auditory cortex in ferrets: Convergence of corticotectal inputs in the superior colliculus. Frontiers in Neuroanatomy 4:18. Barrett, D.J., and D.A. Hall. 2006. Response preferences for “what” and “where” in human non-primary auditory cortex. NeuroImage 32:968–77. Beer, A.L., and T. Watanabe. 2009. Specificity of auditory-guided visual perceptual learning suggests crossmodal plasticity in early visual cortex. Experimental Brain Research 198:353–61. Bendor, D., and X. Wang. 2005. The neuronal representation of pitch in primate auditory cortex. Nature 436:1161–5. Bendor, D., and X. Wang. 2008. Neural response properties of primary, rostral, and rostrotemporal core fields in the auditory cortex of marmoset monkeys. Journal of Neurophysiology 100:888–906.
Bertelson, P., and M. Radeau. 1981. Cross-modal bias and perceptual fusion with auditory-visual spatial discordance. Perception & Psychophysics 29:578–84. Besle, J., C. Fischer, A. Bidet-Caulet, F. Lecaignard, O. Bertrand, and M.H. Giard. 2008. Visual activation and audiovisual interactions in the auditory cortex during speech perception: Intracranial recordings in humans. Journal of Neuroscience 28:14301–10. Bizley, J.K., and A.J. King. 2008. Visual-auditory spatial processing in auditory cortical neurons. Brain Research 1242:24–36. Bizley, J.K., and A.J. King. 2009. Visual influences on ferret auditory cortex. Hearing Research 258:55–63. Bizley, J.K., F.R. Nodal, I. Nelken, and A.J. King. 2005. Functional organization of ferret auditory cortex. Cerebral Cortex 15:1637–53. Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2007. Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–89. Bizley, J.K., K.M. Walker, B.W. Silverman, A.J. King, and J.W. Schnupp. 2009. Interdependent encoding of pitch, timbre, and spatial location in auditory cortex. Journal of Neuroscience 29:2064–75. Bizley, J.K., and K.M. Walker, A.J. King, and J.W. Schnupp. 2010. Neural ensemble codes for stimulus periodicity in auditory cortex. Journal of Neuroscience 30:5078–91. Bonath, B., T. Noesselt, A. Martinez et al. 2007. Neural basis of the ventriloquist illusion. Current Biology 17:1697–703. Brosch, M., E. Selezneva, and H. Scheich. 2005. Nonauditory events of a behavioral procedure activate auditory cortex of highly trained monkeys. Journal of Neuroscience 25:6797–806. Budinger, E., P. Heil, A. Hess, and H. Scheich. 2006. Multisensory processing via early cortical stages: Connections of the primary auditory cortical field with other sensory systems. Neuroscience 143: 1065–83. Budinger, E., A. Laszcz, H. Lison, H. Scheich, and F.W. Ohl. 2008. Non-sensory cortical and subcortical connections of the primary auditory cortex in Mongolian gerbils: Bottom-up and top-down processing of neuronal information via field AI. Brain Research 1220:2–32. Cahill, L., F. Ohl, and H. Scheich. 1996. Alteration of auditory cortex activity with a visual stimulus through conditioning: a 2-deoxyglucose analysis. Neurobiology of Learning and Memory 65:213–22. Calvert, G.A., and T. Thesen. 2004. Multisensory integration: Methodological approaches and emerging principles in the human brain. Journal of Physiology, Paris 98:191–205. Calvert, G.A., M.J. Brammer, E.T. Bullmore, R. Campbell, S.D. Iversen, and A.S. David. 1999. Response amplification in sensory-specific cortices during crossmodal binding. Neuroreport 10:2619–23. Campi, K.L., K.L. Bales, R. Grunewald, and L. Krubitzer. 2010. Connections of auditory and visual cortex in the prairie vole (Microtus ochrogaster): Evidence for multisensory processing in primary sensory areas. Cerebral Cortex 20:89–108. Cantone, G., J. Xiao, and J.B. Levitt. 2006. Retinotopic organization of ferret suprasylvian cortex. Visual Neuroscience 23:61–77. Cappe, C., A. Morel, P. Barone, and E.M. Rouiller. 2009. The thalamocortical projection systems in primate: An anatomical support for multisensory and sensorimotor interplay. Cerebral Cortex 19:2025–37. Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multisensory convergence. Cerebral Cortex 14:387–403. Falchier, A., S. Clavagnier, P. 
Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience 22:5749–59. Frissen, I., J. Vroomen, B. De Gelder, and P. Bertelson. 2005. The aftereffects of ventriloquism: Generalization across sound-frequencies. Acta Psychologica 118:93–100. Fu, K.M., A.S. Shah, M.N. O’Connell et al. 2004. Timing and laminar profile of eye-position effects on auditory responses in primate auditory cortex. Journal of Neurophysiology 92:3522–31. Ghazanfar, A.A. 2009. The multisensory roles for auditory cortex in primate vocal communication. Hearing Research 258:113–20. Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences 10:278–85. Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12. Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of Neuroscience 28:4457–69.
Giard, M.H., and F. Peronnet. 1999. Auditory-visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience 11:473–90. Goodale, M.A., and D.A. Westwood. 2004. An evolving view of duplex vision: Separate but interacting cortical pathways for perception and action. Current Opinion in Neurobiology 14:203–11. Griffiths, T.D., J.D. Warren, S.K. Scott, I. Nelken, and A.J. King. 2004. Cortical processing of complex sound: A way forward? Trends in Neuroscience 27:181–5. Groh J.M., A.S. Trause, A.M. Underhill, K.R. Clark, and S. Inati. 2001. Eye position influences auditory responses in primate inferior colliculus. Neuron 29:509–18. Hackett, T.A., I. Stepniewska, and J.H. Kaas. 1999. Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Research 817:45–58. Hackett, T.A., L.A. De La Mothe, I. Ulbert, G. Karmos, J. Smiley, and C.E. Schroeder. 2007a. Multisensory convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane. Journal of Comparative Neurology 502:924–52. Hackett, T.A., J.F. Smiley, I. Ulbert et al. 2007b. Sources of somatosensory input to the caudal belt areas of auditory cortex. Perception 36:1419–30. Hall, A.J., and S.G. Lomber. 2008. Auditory cortex projections target the peripheral field representation of primary visual cortex. Experimental Brain Research 190:413–30. Hall, D.A., and C.J. Plack. 2009. Pitch processing sites in the human auditory brain. Cerebral Cortex 19:576–85. Harrington, I.A., G.C. Stecker, E.A. Macpherson, and J.C. Middlebrooks. 2008. Spatial sensitivity of neurons in the anterior, posterior, and primary fields of cat auditory cortex. Hearing Research 240:22–41. Hartline, P.H., R.L. Vimal, A.J. King, D.D. Kurylo, and D.P. Northmore. 1995. Effects of eye position on auditory localization and neural representation of space in superior colliculus of cats. Experimental Brain Research 104:402–8. Imaizumi, K., N.J. Priebe, P.A. Crum, P.H. Bedenbaugh, S.W. Cheung, and C.E. Schreiner. 2004. Modular functional organization of cat anterior auditory field. Journal of Neurophysiology 92:444–57. Jay, M.F., and D.L. Sparks. 1987. Sensorimotor integration in the primate superior colliculus: II. Coordinates of auditory signals. Journal of Neurophysiology 57:35–55. Jenison, R.L. 2000. Correlated cortical populations can enhance sound localization performance. Journal of the Acoustical Society of America 107:414–21. Kacelnik, O., F.R. Nodal, C.H. Parsons, and A.J. King. 2006. Training-induced plasticity of auditory localization in adult mammals. PLoS Biology 4:627–38. Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2007. Functional imaging reveals visual modulation of specific fields in auditory cortex. Journal of Neuroscience 27:1824–35. Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18:1560–74. Keniston, L.P., B.L. Allman, and M.A. Meredith. 2008. The rostral suprasylvian sulcus (RSSS) of the ferret: A ‘new’ multisensory area. Society for Neuroscience Abstracts 38:457.10. Keniston, L.P., B.L. Allman, M.A. Meredith, and H.R. Clemo. 2009. Somatosensory and multisensory properties of the medial bank of the ferret rostral suprasylvian sulcus. Experimental Brain Research 196:239–51. Kim, R.S., A.R. Seitz, and L. Shams. 2008. Benefits of stimulus congruency for multisensory facilitation of visual learning. PLoS ONE 3:e1532. King, A.J. 2009. 
Visual influences on auditory spatial learning. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 364:331–9. King, A.J., and M.E. Hutchings. 1987. Spatial response properties of acoustically responsive neurons in the superior colliculus of the ferret: A map of auditory space. Journal of Neurophysiology 57:596–624. King, A.J., and I. Nelken. 2009. Unraveling the principles of auditory cortical processing: Can we learn from the visual system? Nature Neuroscience 12:698–701. King, A.J., and J.C. Middlebrooks. 2011. Cortical representation of auditory space. In The Auditory Cortex, eds. J.A. Winer and C.E. Schreiner, 329–41. New York: Springer. King, A.J., J.W. Schnupp, and T.P. Doubell. 2001. The shape of ears to come: Dynamic coding of auditory space. Trends in Cognitive Sciences 5:261–70. Las, L., A.H. Shapira, and I. Nelken. 2008. Functional gradients of auditory sensitivity along the anterior ectosylvian sulcus of the cat. Journal of Neuroscience 28:3657–67. Lewald, J. 2002. Rapid adaptation to auditory–visual spatial disparity. Learning and Memory 9:268–78. Loftus, W.C., and M.L. Sutter. 2001. Spectrotemporal organization of excitatory and inhibitory receptive fields of cat posterior auditory field neurons. Journal of Neurophysiology 86:475–91.
Lomber, S.G., and S. Malhotra. 2008. Double dissociation of ‘what’ and ‘where’ processing in auditory cortex. Nature Neuroscience 11:609–16. Maeder, P.P., R.A. Meuli, M. Adriani et al. 2001. Distinct pathways involved in sound recognition and localization: A human fMRI study. Neuroimage 14:802–16. Malhotra, S., and S.G. Lomber. 2007. Sound localization during homotopic and heterotopic bilateral cooling deactivation of primary and nonprimary auditory cortical areas in the cat. Journal of Neurophysiology 97:26–43. Manger, P.R., I. Masiello, and G.M. Innocenti. 2002. Areal organization of the posterior parietal cortex of the ferret (Mustela putorius). Cerebral Cortex 12:1280–97. Manger, P.R., H. Nakamura, S. Valentiniene, and G.M. Innocenti. 2004. Visual areas in the lateral temporal cortex of the ferret (Mustela putorius). Cerebral Cortex 14:676–89. Manger, P.R., G. Engler, C.K. Moll, and A.K. Engel. 2005. The anterior ectosylvian visual area of the ferret: A homologue for an enigmatic visual cortical area of the cat? European Journal of Neuroscience 22:706–14. Manger, P.R., G. Engler, C.K. Moll, and A.K. Engel. 2008. Location, architecture, and retinotopy of the anteromedial lateral suprasylvian visual area (AMLS) of the ferret (Mustela putorius). Visual Neuroscience 25:27–37. McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:746–8. McLaughlin, D.F., R.V. Sonty, and S.L. Juliano. 1998. Organization of the forepaw representation in ferret somatosensory cortex. Somatosensory & Motor Research 15:253–68. Meredith, M.A., L.R. Keniston, L.R. Dehner, and H.R. Clemo. 2006. Crossmodal projections from somatosensory area SIV to the auditory field of the anterior ectosylvian sulcus (FAES) in cat: Further evidence for subthreshold forms of multisensory processing. Experimental Brain Research 172:472–84. Merigan, W.H., and J.H. Maunsell. 1993. How parallel are the primate visual pathways? Annual Review of Neuroscience 16:369–402. Middlebrooks, J.C., and E.I. Knudsen. 1984. A neural code for auditory space in the cat’s superior colliculus. Journal of Neuroscience 4:2621–34. Molholm, S., W. Ritter, M.M. Murray, D.C. Javitt, C.E. Schroeder, and J.J. Foxe. 2002. Multisensory auditoryvisual interactions during early sensory processing in humans: A high-density electrical mapping study. Brain Research Cognitive Brain Research 14:115–28. Musacchia, G., and C.E. Schroeder. 2009. Neuronal mechanisms, response dynamics and perceptual functions of multisensory interactions in auditory cortex. Hearing Research 258:72–9. Nakamoto, K.T., S.J. Jones, and A.R. Palmer. 2008. Descending projections from auditory cortex modulate sensitivity in the midbrain to cues for spatial position. Journal of Neurophysiology 99:2347–56. Nelken, I., G. Chechik, T.D. Mrsic-Flogel, A.J. King, and J.W. Schnupp. 2005. Encoding stimulus information by spike numbers and mean response time in primary auditory cortex. Journal of Computational Neuroscience 19:199–221. Nelken, I., J.K. Bizley, F.R. Nodal, B. Ahmed, A.J. King, and J.W. Schnupp. 2008. Responses of auditory cortex to complex stimuli: Functional organization revealed using intrinsic optical signals. Journal of Neurophysiology 99:1928–41. Passamonti, C., C. Bertini, and E. Ladavas. 2009. Audio-visual stimulation improves oculomotor patterns in patients with hemianopia. Neuropsychologia 47:546–55. Philipp, R., C. Distler, and K.P. Hoffmann. 2006. A motion-sensitive area in ferret extrastriate visual cortex: An analysis in pigmented and albino animals. 
Cerebral Cortex 16:779–90. Phillips, D.P., and S.S. Orman. 1984. Responses of single neurons in posterior field of cat auditory cortex to tonal stimulation. Journal of Neurophysiology 51:147–63. Radeau, M., and P. Bertelson. 1974. The after-effects of ventriloquism. Quarterly Journal of Experimental Psychology 26:63–71. Ramsay, A.M., and M.A. Meredith. 2004. Multiple sensory afferents to ferret pseudosylvian sulcal cortex. Neuroreport 15:461–5. Rauschecker, J.P., and B. Tian. 2000. Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences of the United States of America 97:11800–6. Recanzone, G.H. 1998. Rapidly induced auditory plasticity: The ventriloquism aftereffect. Proceedings of the National Academy of Sciences of the United States of America 95:869–75. Recanzone, G.H. 2000. Spatial processing in the auditory cortex of the macaque monkey. Proceedings of the National Academy of Sciences of the United States of America 97:11829–35. Recanzone, G.H. 2008. Representation of con-specific vocalizations in the core and belt areas of the auditory cortex in the alert macaque monkey. Journal of Neuroscience 28:13184–93.
Redies, C., M. Diksic, and H. Riml. 1990. Functional organization in the ferret visual cortex: A double-label 2-deoxyglucose study. Journal of Neuroscience 10:2791–803. Renier, L.A., I. Anurova, A.G. De Volder, S. Carlson, J. Vanmeter, and J.P. Rauschecker. 2009. Multisensory integration of sounds and vibrotactile stimuli in processing streams for “what” and “where.” Journal of Neuroscience 29:10950–60. Rice, F.L., C.M. Gomez, S.S. Leclerc, R.W. Dykes, J.S. Moon, and K. Pourmoghadam. 1993. Cytoarchitecture of the ferret suprasylvian gyrus correlated with areas containing multiunit responses elicited by stimulation of the face. Somatosensory & Motor Research 10:161–88. Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology 50:19–26. Romanski, L.M., B. Tian, J. Fritz, M. Mishkin, P.S. Goldman-Rakic, and J.P. Rauschecker. 1999. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience 2:1131–6. Schreiner, C.E., and M.S. Cynader. 1984. Basic functional organization of second auditory cortical field (AII) of the cat. Journal of Neurophysiology 51:1284–305. Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Brain Research Cognitive Brain Research 14:187–98. Schroeder, C.E., P. Lakatos, Y. Kajikawa, S. Partan, and A. Puce. 2008. Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences 12:106–13. Shelton, B.R., and C.L. Searle. 1980. The influence of vision on the absolute identification of sound-source position. Perception & Psychophysics 28:589–96. Smiley, J.F., T.A. Hackett, I. Ulbert et al. 2007. Multisensory convergence in auditory cortex, I. Cortical connections of the caudal superior temporal plane in macaque monkeys. Journal of Comparative Neurology 502:894–923. Stecker, G.C., B.J. Mickey, E.A. Macpherson, and J.C. Middlebrooks. 2003. Spatial sensitivity in field PAF of cat auditory cortex. Journal of Neurophysiology 89:2889–903. Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews. Neuroscience 9:1477–85. Stein, B.E., W.S. Huneycutt, and M.A. Meredith. 1988. Neurons and behavior: The same rules of multisensory integration apply. Brain Research 448:355–8. Stein, B.E., M.A. Meredith, W.S. Huneycutt, and L. McDade. 1989. Behavioral indices of multisensory integration: Orientation of visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience 1:12–24. Sumby, W.H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America 26:212–15. Tian, B., D. Reser, A. Durham, A. Kustov, and J.P. Rauschecker. 2001. Functional specialization in rhesus monkey auditory cortex. Science 292:290–3. Thomas, H., J. Tillein, P. Heil, and H. Scheich. 1993. Functional organization of auditory cortex in the Mongolian gerbil (Meriones unguiculatus). I. Electrophysiological mapping of frequency representation and distinction of fields. European Journal of Neuroscience 5:882–97. Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation. Proceedings of the National Academy of Sciences of the United States of America 101:2167–72. Warren, J.D., and T.D. Griffiths. 2003. 
Distinct mechanisms for processing spatial sequences and pitch sequences in the human auditory brain. Journal of Neuroscience 23:5799–804. Werner-Reiss, U., K.A. Kelly, A.S. Trause, A.M. Underhill, and J.M. Groh. 2003. Eye position affects activity in primary auditory cortex of primates. Current Biology 13:554–62. Woods, T.M., and G.H. Recanzone. 2004. Visually induced plasticity of auditory spatial perception in macaques. Current Biology 14:1559–64. Woods, T.M., S.E. Lopez, J.H. Long, J.E. Rahman, and G.H. Recanzone. 2006. Effects of stimulus azimuth and intensity on the single-neuron activity in the auditory cortex of the alert macaque monkey. Journal of Neurophysiology 96:3323–37. Wright, B.A., and Y. Zhang. 2006. A review of learning with normal and altered sound-localization cues in human adults. International Journal of Audiology 45 Suppl 1, S92–8. Zvyagintsev, M., A.R. Nikolaev, H. Thonnessen, O. Sachs, J. Dammers, and K. Mathiak. 2009. Spatially congruent visual motion modulates activity of the primary auditory cortex. Experimental Brain Research 198:391–402.
Section II Neurophysiological Bases
4
Are Bimodal Neurons the Same throughout the Brain? M. Alex Meredith, Brian L. Allman, Leslie P. Keniston, and H. Ruth Clemo
CONTENTS 4.1 Introduction............................................................................................................................. 51 4.2 Methods................................................................................................................................... 52 4.2.1 Surgical Procedures..................................................................................................... 52 4.2.2 Recording..................................................................................................................... 52 4.2.3 Data Analysis............................................................................................................... 53 4.3 Results...................................................................................................................................... 54 4.3.1 Anterior Ectosylvian Sulcal Cortex............................................................................. 54 4.3.2 Posterolateral Lateral Suprasylvian Cortex................................................................. 54 4.3.3 Rostral Suprasylvian Sulcal Cortex............................................................................. 59 4.3.4 Superior Colliculus...................................................................................................... 59 4.4 Discussion................................................................................................................................60 4.4.1 Bimodal Neurons with Different Integrative Properties.............................................60 4.4.2 Bimodal Neurons in SC and Cortex Differ.................................................................60 4.4.3 Bimodal Neurons in Different Cortical Areas Differ..................................................60 4.4.4 Population Contribution to Areal Multisensory Function........................................... 61 4.4.5 Methodological Considerations................................................................................... 62 4.5 Conclusions.............................................................................................................................. 63 Acknowledgments............................................................................................................................. 63 References......................................................................................................................................... 63
4.1 INTRODUCTION It is a basic tenet of neuroscience that different neural circuits underlie different functions or behaviors. For the field of multisensory processing, however, this concept appears to be superseded by the system’s requirements: convergence of inputs from different sensory modalities onto individual neurons is the requisite, defining step. This requirement is fulfilled by the bimodal neuron, which has been studied for half a century now (Horn and Hill 1966) and has come to represent the basic unit of multisensory processing (but see Allman et al. 2009). Bimodal neurons are ubiquitous: they are found throughout the neuraxis and in nervous systems across the animal kingdom (for review, see Stein and Meredith 1993). Bimodal (and trimodal) neurons exhibit suprathreshold responses to stimuli from more than one sensory modality, and often integrate (a significant response change when compared with unisensory responses) those responses when the stimuli are combined. As revealed almost exclusively by studies of the superior colliculus (SC), bimodal neurons integrate multisensory information according to the spatial, temporal, and physical parameters of the stimuli involved (for review, see Stein and Meredith 1993). The generality of these principles and the broadness of their applicability appeared to be confirmed by similar findings in cortical bimodal neurons (Wallace et al. 1992) and overt multisensory behaviors (Stein et al. 1989).
Although it has been generally assumed that bimodal neurons are essentially the same, an insightful study of multisensory integration in bimodal SC neurons demonstrated that bimodal neurons exhibit different functional ranges (Perrault et al. 2005). Some bimodal neurons were highly integrative and exhibited integrated, superadditive (combined response > sum of unisensory responses) responses to a variety of stimulus combinations, whereas others never produced superadditive levels despite the full range of stimuli presented. In this highly integrative structure, approximately 28% of the bimodal neurons showed multisensory integration in the superadditive range. Thus, within the SC, there was a distribution of bimodal neurons with different functional ranges. Hypothetically, if this distribution were altered, for example, in favor of low-integrating bimodal neurons, then it would be expected that the overall SC would exhibit lower levels of multisensory processing. Because many studies of cortical multisensory processing reveal few examples of superadditive levels of integration (e.g., Meredith et al. 2006; Clemo et al. 2007; Allman and Meredith 2007; Meredith and Allman 2009), it seems possible that bimodal cortical neurons also exhibit functional ranges like those observed in the SC, but do so in different proportions. Therefore, the present investigation reviewed single-unit recording data derived from several different cortical areas and the SC (as depicted in Figure 4.1) to address the possibility that bimodal neurons in different parts of the brain might exhibit different integrative properties that occur in area-specific proportions.
[Figure 4.1: lateral view of cat brain with recording sites labeled rostral suprasylvian, posterolateral lateral suprasylvian, anterior ectosylvian, and superior colliculus.]
FIGURE 4.1 Lateral view of cat brain depicts multisensory recording sites in cortex and midbrain.
4.2 METHODS 4.2.1 Surgical Procedures A two-part implantation/recording procedure was used as described in detail in previous reports (Meredith and Stein 1986; Meredith et al. 2006). First, the animals were anesthetized (pentobarbital, 40 mg/kg) and their heads were secured in a stereotaxic frame. Sterile techniques were used to perform a craniotomy that exposed the targeted recording area and a recording well was implanted over the opening. The scalp was then sutured closed around the implant and routine postoperative care was provided. Approximately 7 to 10 days elapsed before the recording experiment.
4.2.2 Recording Recording experiments were initiated by anesthetizing the animal (ketamine, 35 mg/kg, and acepromazine, 3.5 mg/kg, initial doses; 8 and 1 mg kg−1 h−1 supplements, respectively) and securing the implant to a supporting bar. A leg vein was cannulated for continuous administration of fluids and supplemental anesthetics and, to prevent spontaneous movements, a muscle relaxant (pancuronium bromide, 0.3 mg/kg initial dose; 0.2 mg kg−1 h−1 supplement). The animal was intubated through
the mouth and maintained on a ventilator; expired CO2 was monitored and maintained at ~4.5%. A glass-insulated tungsten electrode (impedance <1.0 MΩ) was used for recording. A hydraulic microdrive was used to advance the electrode and to record the depth of identified neurons. Neuronal activity was amplified and routed through a counter (for SC recordings) or to a PC for storage and analysis (for cortical recordings). Neurons were identified by their spontaneous activity and by their responses to somatosensory (puffs of air through a pipette, brush strokes and taps, manual pressure and joint movement, taps, and stroking by calibrated von Frey hairs), auditory (claps, clicks, whistles, and hisses), and/or visual (flashed or moving spots or bars of light from a handheld ophthalmoscope projected onto the translucent hemisphere, or dark stimuli from a rectangular piece of black cardboard) search stimuli. Sensory receptive fields were mapped using adequate stimuli in each modality and were graphically recorded. During recording, the depth of each identified neuron was noted and tabulated along with its sensory responsivity (e.g., auditory, visual, somatosensory, bimodal, or trimodal) and level of evoked stimulation activity obtained during quantitative tests (see below). Multiple recording penetrations were performed in a single experiment and successful recording penetrations were marked with a small electrolytic lesion. At the conclusion of the experiment, the animal was euthanized and the brain fixed and blocked stereotaxically. Standard histological techniques were used to stain and mount the tissue. A projecting microscope was used to trace sections and to reconstruct recording penetrations from the lesion sites. For selected neurons in each recording area, quantitative tests were conducted to document their responses to sensory/multisensory stimulation. Electronically gated, repeatable somatosensory, auditory, and visual stimuli were presented. Somatosensory stimuli were produced by an electronically driven, modified shaker (Ling, 102A) whose amplitude, velocity, and temporal delay were independently set to either indent the skin or deflect hairs. Auditory stimulation consisted of a white noise burst, 100 ms duration, generated by a solenoid-gated air hose (for some SC recordings), or an electronic waveform played through a hoop-mounted speaker (for all other recordings) positioned in contralateral auditory space. Visual stimuli were generated by a projector that cast an image of a light bar through a rotating prism (to determine angle of trajectory) onto a galvanometer-driven mirror (to affect delay, amplitude, and velocity of movement). This image was projected onto a translucent Plexiglas hemisphere (92 cm diameter) positioned in front of the animal. Visual stimuli of effective size and luminosity were moved through the visual receptive field at an effective orientation, direction, and speed. These controlled somatosensory, auditory, and visual stimuli were presented alone and in paired combinations (i.e., visual–auditory, auditory–somatosensory, visual– somatosensory). An interstimulus interval of 7 to 15 s was used to avoid habituation; each test was repeated 10 to 25 times.
4.2.3 Data Analysis For cortical recordings, neuronal activity was digitized (rate >25 kHz) using Spike2 (Cambridge Electronic Design) software and sorted by waveform template for analysis. Then, for each test condition (somatosensory alone, somatosensory–auditory combined, etc.), a peristimulus time histogram was generated from which the mean spike number per trial (and standard deviation) was calculated. For the SC recordings, the online spike counter displayed trial-by-trial spike counts for each of the stimulus conditions, from which these values were recorded and the mean spike number per trial (and standard deviation) was calculated. A paired, two-tailed t-test was used to statistically compare the responses to the combined stimuli with those to the most effective single stimulus, and responses that showed a significant difference (p < .05) were defined as response interactions (Meredith and Stein 1986, 1996). The magnitude of a response interaction was estimated by the following formula: interaction (%) = [(C – M)/M] × 100, where C is the response to the combined stimulation and M is the maximal response to the unimodal stimulation (according to the criteria of Meredith and Stein 1986). Summative responses were evaluated by comparing the responses evoked by the combined stimuli to the sum of the responses elicited by the same stimuli presented separately.
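To make the analysis concrete, the following Python sketch reproduces the logic just described for a single neuron. The spike counts, trial numbers, and function names are hypothetical and are not taken from the original analysis software (Spike2 was used only for acquisition and spike sorting).

# Minimal sketch of the response-interaction analysis described above.
# All data are simulated; variable and function names are illustrative only.
import numpy as np
from scipy.stats import ttest_rel

def interaction_index(combined_mean, best_unimodal_mean):
    # (C - M)/M x 100, where C = combined response, M = best unimodal response
    return (combined_mean - best_unimodal_mean) / best_unimodal_mean * 100.0

rng = np.random.default_rng(0)
best_unimodal = rng.poisson(9, size=25)    # e.g., visual-alone trials
other_unimodal = rng.poisson(5, size=25)   # e.g., auditory-alone trials
combined = rng.poisson(12, size=25)        # visual-auditory trials

C, M = combined.mean(), best_unimodal.mean()
t_stat, p_value = ttest_rel(combined, best_unimodal)                 # statistical criterion
enhanced = (p_value < 0.05) and (C > M)
superadditive = C > (best_unimodal.mean() + other_unimodal.mean())   # summative criterion

print(f"interaction = {interaction_index(C, M):.1f}%, p = {p_value:.3f}, "
      f"enhanced = {enhanced}, superadditive = {superadditive}")

Under this scheme, a neuron is scored as showing a significant interaction if its combined response differs reliably from the best unimodal response, and as superadditive only if the combined response also exceeds the sum of the two unimodal responses.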
4.3 RESULTS 4.3.1 Anterior Ectosylvian Sulcal Cortex The banks of the anterior ectosylvian sulcus (AES) contain auditory (field of the AES; Clarey and Irvine 1990), visual (AEV; Olson and Graybiel 1987), and somatosensory (SIV; Clemo and Stein 1983) representations. Numerous studies of this region have identified bimodal neurons (Wallace et al. 1992; Rauschecker and Korte 1993; Jiang et al. 1994a, 1994b) particularly at the intersection of the different sensory representations (Meredith 2004; Carriere et al. 2007). The bimodal neurons described in the present study were collected during the recordings reported by Meredith and Allman (2009). Neurons were identified in six penetrations in three cats, of which 24% (n = 46/193) were bimodal. These neurons exhibited suprathreshold responses to independent presentations of auditory and visual (n = 39), auditory and somatosensory (n = 6), or visual and somatosensory (n = 1) stimuli. A typical example is illustrated in Figure 4.2, where the presentation of either auditory or visual stimuli vigorously activated this neuron. Furthermore, the combination of visual and auditory stimuli induced an even stronger response representing a significant (p < .05, paired t-test) enhancement of activity (36%) over that elicited by the most effective stimulus presented alone (see Meredith and Stein 1986 for criteria). This response increment was representative of bimodal AES neurons because the population average level of enhancement was 34% (see Figure 4.3). This modest level of multisensory integration was collectively achieved by neurons of widely different activity levels. As illustrated in Figure 4.4, responses to separate or combined-modality stimulation achieved between an average of 1 and 50 spikes/trial [response averages to the weakest (5.1 ± 4.9 standard deviation (SD)) and best (8.9 ± 7.9 SD) separate stimuli and to combined-modality stimulation (11.7 ± 9.9 SD) are also shown in Figure 4.3]. However, only a minority (46%; n = 21/46) of bimodal neurons showed response enhancement to the available stimuli and most showed levels of activity that plotted close to the line of unity in Figure 4.4. Figure 4.5 shows that the highest levels of enhancement were generally achieved in those neurons with lower levels of unimodal response activity. Specifically, the neurons showing >75% response change (average 130%) exhibited responses to unimodal stimuli that averaged 6.6 spikes/trial. As illustrated in Figure 4.6, however, most (85%; n = 39/46) bimodal neurons demonstrated response enhancements of <75%. In addition, a few (11%; 5/46) AES bimodal neurons even showed smaller responses to combined-modality stimulation than to the most effective unimodal stimulus. Another measure of multisensory processing is the proportional relationship of the activity evoked by the combined stimuli to that of the sum of responses to the different separate-modality stimuli (e.g., King and Palmer 1985). This analysis for bimodal AES neurons is presented in Figure 4.7, which indicates that fewer neurons (17%; n = 8/46) show superadditive activity compared with those that show statistically significant levels of response enhancement (46%; n = 21/46). Given that bimodal neurons represent only about 25% of the AES neurons (Jiang et al. 
1994b; Meredith and Allman 2009), and that multisensory integration occurs in a portion of that population (17–46%, depending on the criterion for integration), these data suggest that integrated multisensory signals in response to effective sensory stimuli contribute to a small portion of the output from the AES.
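To see why the two criteria diverge, consider a purely hypothetical bimodal AES neuron that averages 6 spikes/trial to the visual stimulus, 8 spikes/trial to the auditory stimulus, and 11 spikes/trial to their combination. Its interaction index is (11 – 8)/8 × 100 ≈ 38%, an enhancement that modest trial-to-trial variability would allow to reach statistical significance; yet the combined response (11 spikes/trial) falls well short of the sum of the two unisensory responses (6 + 8 = 14 spikes/trial), so the neuron is subadditive. Neurons of this kind are counted by the statistical criterion but not by the summative one, which is why the 46% and 17% figures above differ.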
4.3.2 Posterolateral Lateral Suprasylvian Cortex The auditory cortices of the middle ectosylvian gyrus are bordered, medially, by the suprasylvian sulcus whose banks contain the lateral suprasylvian visual areas. Containing a representation of the contralateral upper visual hemifield, the posterolateral lateral suprasylvian (PLLS) visual area (Palmer et al. 1978) is bordered, laterally, by the dorsal zone of auditory cortex (Stecker et al. 2005). Largely along this lateral border, the PLLS contains bimodal visual–auditory neurons whose visual receptive fields are restricted to eccentric portions of visual space >40° (Allman and Meredith 2007).
[Figure 4.2: rasters, histograms, and bar graphs for four recording areas: (a) anterior ectosylvian sulcal area, (b) posterolateral lateral suprasylvian area, (c) rostral suprasylvian sulcal area, and (d) superior colliculus; axes show time (ms), spikes, and mean spikes/trial.]
FIGURE 4.2 For each recording area (a–d), individual bimodal neurons showed responses to both unimodal stimuli presented separately as well as to their combination stimuli, as illustrated by rasters (1 dot = 1 spike) and histograms (10 ms time bins). Waveforms above each raster/histogram indicate stimulation condition (square wave labeled “A” = auditory; ramp labeled “V” = visual; ramp labeled “S” = somatosensory; presented separately or in combination). Bar graphs depict mean (and standard deviation) of responses to different stimulus conditions; numerical percentage indicates proportional difference between the most effective unimodal stimulus and the response elicited by stimulus combination (i.e., integration). Asterisk (*) indicates that response change between these two conditions was statistically significant (p < .05 paired t-test).
[Figure 4.3: bar graphs of mean response (spikes/trial) to the lowest unimodal, best unimodal, and combined stimuli for the AES, PLLS, RSS, and SC populations; integration values shown are 34 ± 7% (AES), 24 ± 4% (PLLS), 37 ± 4% (RSS), and 88 ± 12% (SC).]
FIGURE 4.3 For each recording area, average response levels (and standard error of the mean [SEM]) for population of bimodal neurons. Responses to unimodal stimuli were grouped by response level (lowest, best), not by modality. Percentage (and SEM) indicates proportional change between the best unimodal response and that elicited by combined stimulation (i.e., integration). In each area, combined response was statistically greater than that evoked by the most effective unimodal stimulus (p < .05; paired t-test).
[Figure 4.4: scatterplots of best unimodal response versus combined response (spikes/trial, 0–50) for AES, PLLS, RSS, and SC.]
FIGURE 4.4 For neural areas sampled, response of a given bimodal neuron to the most effective unimodal stimulus (x axis) was plotted against its response to stimulus combination (y axis). For the most part, bimodal neurons in each area showed activity that almost always plotted above line of unity (dashed line).
[Figure 4.5: scatterplots of best unimodal response (spikes/trial, 0–30) versus interaction (%, –200 to 400) for AES, PLLS, RSS, and SC.]
FIGURE 4.5 For each recording area, response of a given bimodal neuron to the most effective unimodal stimulus (x axis) was plotted against proportional change (interaction) elicited by combined stimuli (y axis). Most bimodal neurons exhibited interactions > 0, but level of interaction generally decreased with increasing levels of spiking activity.
The bimodal neurons described in the present study were collected during PLLS recordings reported by Allman and Meredith (2007). A total of 520 neurons were identified in eight penetrations in three cats, of which 9% (n = 49/520) were visual–auditory bimodal. A typical example is illustrated in Figure 4.2, where the presentation of either auditory or visual stimuli vigorously activated the neuron. In addition, when the same visual and auditory stimuli were combined, an even stronger response was evoked. The combined response represented a significant (p < .05, paired t-test) enhancement of activity (39%) over that elicited by the most effective stimulus presented alone (see Meredith and Stein 1986 for criteria). This response increment was slightly larger than the average magnitude of integration (24%) seen in the population of bimodal PLLS neurons [response averages to the weakest (4.7 ± 5.4 SD) and best (7.1 ± 6.8 SD) separate stimuli and to combined-modality stimulation (8.8 ± 8.8 SD) are shown in Figure 4.3]. This modest response increment was generated by neurons of widely different activity levels. As illustrated in Figure 4.4, PLLS responses to separate or combined-modality stimulation produced between 1 and 50 mean spikes/trial. However, only a minority (39%; n = 19/49) of bimodal neurons showed significant response enhancement to the available stimuli and most showed levels of activity that plotted close to the line of unity in Figure 4.4. Figure 4.5 shows that levels of response interaction were generally the same across activity levels. Furthermore, all PLLS interaction magnitudes represented <75% change, as also depicted in Figure 4.6. A few (16%; 8/49) PLLS bimodal neurons even showed smaller responses to the combined stimuli than elicited by the most effective unimodal stimulus. Analysis of the proportional change in bimodal PLLS neurons resulting from combined-modality stimulation revealed that even fewer neurons (10%; n = 5/49) achieved superadditive levels of activity than showed statistically significant response enhancement (39%; n = 19/49). Given that bimodal neurons represent only about 25% of the PLLS neurons (Allman and Meredith 2007), and that multisensory integration occurs in a portion of that population (10–39%, depending on the criterion for integration), these data suggest that integrated multisensory signals in response to effective sensory stimuli contribute to a small portion of the output from the PLLS.
[Figure 4.6: histograms of the percentage of neurons falling in each interaction bin (from below –25% to above 175%) for AES (mean interaction 34%), PLLS (24%), RSS (37%), and SC (88%).]
FIGURE 4.6 For each recording area, many bimodal neurons showed low levels of interaction (–25% to 25%). However, only AES and SC exhibited integrated levels in excess of 175%.
[Figure 4.7: bar graphs of the percentage of neurons meeting the statistical and summative criteria for AES, PLLS, RSS, and SC.]
FIGURE 4.7 Multisensory interactions in bimodal neurons can be evaluated by statistical (paired t-test between best unimodal and combined responses) or by summative (combined response exceeds sum of both unimodal responses) methods. For each area, fewer combined responses met these criteria using summative rather than statistical methods. However, only in SC was integration (by either method) achieved by >50% of neurons.
4.3.3 Rostral Suprasylvian Sulcal Cortex As described by Clemo et al. (2007), extracellular recordings were made in three cats in which recording penetrations (n = 27) covered the anterior–posterior extent and depth of the lateral bank of rostral suprasylvian sulcus (RSS; see Figure 4.1 for location). A total of 946 neurons were recorded, of which 24% were identified as bimodal: either auditory–somatosensory neurons (20%; n = 193/946) or audio–visual neurons (4%; n = 35/946). Of these, 86 were tested quantitatively for responses to separate and combined-modality stimulation, of which a representative example is provided in Figure 4.2. This neuron showed a reliable response to the auditory stimulus, and a vigorous response to the somatosensory stimulus. When the two stimuli were combined, a vigorous response was also elicited but did not significantly differ from that of the most effective (somatosensory) stimulus presented alone. In addition, nearly 20% (18/97) of the neurons showed smaller responses to the combined stimuli than to the most effective single-modality stimulus. This low level of multisensory integration was surprising, although not unusual in the RSS. In fact, the majority (66%; 64/97) of RSS bimodal neurons failed to show a significant response interaction to combined stimulation. This effect is evident in the averaged responses of the RSS population, which achieved an average 37% response increase (see Figure 4.3). Also evident from this figure are the comparatively low levels of response evoked by stimuli presented separately (least effective, 1.67 ± 1.2 SD; most effective, 2.8 ± 2.2 SD average spikes/trial) or together (3.6 ± 2.9 SD average spikes/trial). These low response levels are also apparent in Figure 4.4, where responses to best and combined stimulation are plotted for each neuron and, under no condition, was activity measured >20 spikes/trial. This low level of activity may underlie the strong inverse relationship between effectiveness and interactive level, shown in Figure 4.5, because the neurons with the lowest unimodal response values also showed the highest proportional gains. In fact, all of the neurons that showed >75% response change had an average response to the most effective unimodal stimulus of only 0.89 ± 0.5 spikes/trial. Therefore, the appearance of large proportional changes in these low-activity neurons may be the result of comparisons among low values. With that in mind, the proportion of RSS neurons showing response changes that were more than summative may be artificially large. As shown in Figure 4.7, the proportion of RSS bimodal neurons with significant (34%) or more than summative (20%) changes represented only a third of the sample or less. Given that only 24% of the RSS was identified as bimodal, the small amount of multisensory integration produced by less than one third of participating neurons would indicate that integrated multisensory signals are not a robust feature of this cortical region.
4.3.4 Superior Colliculus The bimodal SC neurons described in the present study were collected from recordings reported by Meredith and Stein (1983, 1985). A total of 81 bimodal neurons that met acceptance criteria (see Methods) were identified from recordings from 20 cats. Of these SC neurons, 62% (n = 50/81) were visual–auditory, 16% (n = 13/81) were visual–somatosensory, 10% (n = 8/81) were auditory–somatosensory, and 12% (n = 10/81) were trimodal; these proportions were similar to those reported earlier (Meredith and Stein 1986). A typical example of a bimodal SC neuron is illustrated in Figure 4.2, where the presentation of either auditory or visual stimuli activated the neuron. When the same visual and auditory stimuli were combined, however, a significantly (p < .05 paired t-test) stronger response was evoked. This response to the combined stimulation represented a multisensory enhancement of activity of >300%. Most (77%; n = 62/81) bimodal SC neurons showed significant response enhancement, averaging a magnitude of 88% for the overall population [response averages to the weakest (5.9 ± 6.7 SD) and best (10.9 ± 10.4 SD) separate stimuli and to combined-modality stimulation (17.4 ± 13.5 SD) are shown in Figure 4.3]. As depicted in Figure 4.4, response enhancement was generated by neurons of widely different activity levels, ranging from 1 to 40 mean spikes/trial. However, Figure 4.5 shows that levels of response enhancement tended to be
larger for responses with lower levels of activity. Given the levels of enhancement achieved by such a large proportion of SC bimodal neurons, it did not seem surprising that >48% of neurons showed enhancement levels in excess of a 75% change (see Figure 4.6). In contrast, few SC neurons (3%; 3/97) produced combined responses that were lower than that elicited by the most effective single-modality stimulus. Analysis of the proportional change in bimodal SC neurons resulting from combined-modality stimulation revealed that a majority (56%; n = 45/81) achieved superadditive levels of activity; a large majority also demonstrated statistically significant levels of response enhancement (76%; n = 62/81). Given that bimodal neurons represent a majority of neurons in the deep layers of the SC (63%; Wallace and Stein 1997), and that significant levels of multisensory response enhancement are achieved in more than three-fourths of those, these data suggest that integrated multisensory signals are a robust component of sensory signals in the SC.
4.4 DISCUSSION 4.4.1 Bimodal Neurons with Different Integrative Properties Bimodal neurons clearly differ from one another (Perrault et al. 2005). In the SC, some bimodal neurons are highly integrative and exhibit integrated, superadditive responses to a variety of stimulus combinations, whereas others never produce superadditive levels in spite of the full range of stimuli presented. Thus, different bimodal neurons exhibit different functional ranges. The question of whether bimodal neurons elsewhere in the brain might also exhibit integrative differences was examined in the present study. Bimodal neurons in the AES, PLLS, and RSS were tested for their responses to combined-modality stimuli; these tests revealed that some cortical neurons generated multisensory integrated responses whereas others did not. It should be pointed out that the present study did not make an exhaustive characterization of the integrative capacity of each neuron (as done by Perrault et al. 2005). However, the present sampling methods appear to have overestimated (not underestimated) the proportion of integrative neurons because 56% (45/81) of the SC sample showed superadditive response levels, whereas fewer (28%) were identified using more intensive methods (Perrault et al. 2005). Regardless of these testing differences, these combined studies indicate that bimodal neurons from across the brain are a diverse group.
4.4.2 Bimodal Neurons in SC and Cortex Differ The SC is well known for its highly integrative neurons, with examples of multisensory response enhancement in excess of 1200% (Meredith and Stein 1986). The present sample of bimodal SC neurons (derived from Meredith and Stein 1983, 1985) showed a range of –11% to 918% change (average 88%) with most (55%; 45/81) neurons showing superadditive responses. In contrast, cortical bimodal neurons (AES, PLLS, and RSS) generated a consistently lower range of integration (–62 to 212; 33% overall average). In fact, only a minority (39%; 75/192) of cortical bimodal neurons exhibited significant multisensory response changes and only 17% (33/192) produced superadditive response levels. As a group, the average level of response interaction was only 17% change from the best unimodal response. In addition, instances where the combined response was less than the maximal unimodal response occurred in 16% of cortical bimodal neurons, but only in 3% of the SC neurons (no such examples were observed in SC by Perrault et al. 2005). Clearly, bimodal neurons in the cortex integrate multisensory information differently from those in the SC.
4.4.3 Bimodal Neurons in Different Cortical Areas Differ Bimodal neurons in different cortical areas also exhibit different capacities for multisensory integration. Proportionally more bimodal AES neurons showed significant response interactions (46%;
21/46) and higher levels of integration (34% average) than those in the RSS (34%; 33/97 showed significant response change; 24% average). Furthermore, bimodal neurons in these regions showed significantly different (p < .01 t-test) spike counts in response to adequate separate and combinedmodality stimuli. AES neurons averaged 8.9 ± 7.9 SD spikes/trial in response to the most effective separate-modality stimulus, and 11.7 ± 9.9 SD spikes/trial to the combined stimuli. In contrast, RSS neurons averaged 2.8 ± 2.2 SD spikes/trial in response to the most effective separate-modality stimulus, and 3.6 ± 2.9 SD spikes/trial to the combined stimuli. In addition, nearly 20% of RSS neurons showed combined responses that were less than the maximal unimodal responses, compared with 11% of AES bimodal neurons. Thus, by a variety of activity measures, the multisensory processing capacity is clearly different for bimodal neurons in different cortical areas. Measures of multisensory processing in bimodal PLLS neurons appear to fall between those obtained for AES and RSS.
4.4.4 Population Contribution to Areal Multisensory Function The present results indicate that the range of multisensory integration is different for bimodal neurons in different neural areas. Therefore, it should be expected that the performance of different areas will differ under the same multisensory conditions. As illustrated in the left panel of Figure 4.8, some areas contain relatively few bimodal neurons, and those that are present are generally poor multisensory integrators (e.g., those observed in the RSS). In contrast, other areas (e.g., the SC) contain a high proportion of bimodal neurons of which many are strong integrators (right panel Figure 4.8). Furthermore, the data suggest that areas of intermediate multisensory properties also occur (e.g., AES), as schematized by the intermingled low- and high-integrators in the center panel of Figure 4.8. Under these conditions, it is likely that a given multisensory stimulus will simultaneously elicit widely different multisensory responses and levels of integration in these different areas. Furthermore, although the cat SC contains ~63% bimodal (and trimodal) neurons (Wallace and Stein 1997), most cortical areas exhibit bimodal populations of only between 25% and 30% (Rauschecker and Korte 1993; Jiang et al. 1994a, 1994b; Carriere
[Figure 4.8: three schematic arrays of neurons ranging from low integration (left) to high integration (right).]
FIGURE 4.8 Bimodal neurons with different functional modes, when distributed in different proportions, underlie regions exhibiting different multisensory properties. Each panel shows same array of neurons, except that proportions of unisensory (white), low-integrator (gray), and high-integrator (black) multisensory neurons are different. Areas in which low-integrator neurons predominate show low overall levels of multisensory integration (left), whereas those with a large proportion of high-integrators (right) exhibit high levels of multisensory integration. Intermediate proportions of low- and high-integrators collectively generate intermediate levels of multisensory integration at areal level. Ultimately, these arrangements may underlie a range of multisensory processes that occur along a continuum from one extreme (no integration, not depicted) to the other (high integration).
et al. 2007; Meredith et al. 2006; Clemo et al. 2007; Allman and Meredith 2007). Therefore, from an areal level, the comparatively weak multisensory signal from a cortical area is likely to be further diluted by the fact that only a small proportion of bimodal neurons contribute to that signal. It should also be pointed out that many cortical areas have now been demonstrated to contain subthreshold multisensory (also termed “modulatory”) neurons. These neurons are activated by inputs from only one modality, but that response can be subtly modulated by influences from another to show modest (but statistically significant) levels of multisensory interaction (Dehner et al. 2004; Meredith et al. 2006; Carriere et al. 2007; Allman and Meredith 2007; Meredith and Allman 2009). Collectively, these observations suggest that cortical multisensory activity is characterized by comparatively low levels of integration. In the context of the behavioral/perceptual role of cortex, these modest integrative levels may be appropriate. For example, when combining visual and auditory inputs to facilitate speech perception (e.g., the cocktail party effect), it is difficult to imagine how accurate perception would be maintained if every neuron showed a response change in excess of 1200%. On the other hand, for behaviors in which survival is involved (e.g., detection), multisensory interactions >1200% would clearly provide an adaptive advantage.
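A simple numerical sketch, in Python, illustrates this population argument. The proportions and per-neuron gains below are hypothetical (loosely guided by the percentages cited in this chapter) and are meant only to show how area-level integration can track the mix of neuron types.

# Illustrative sketch of the population argument above: the multisensory signal
# an area produces depends on how many of its neurons are bimodal and how
# strongly those bimodal neurons integrate. All numbers are hypothetical.
def area_level_enhancement(frac_bimodal, frac_high_integrators,
                           low_gain=0.15, high_gain=1.0):
    # frac_bimodal: fraction of neurons that are bimodal (others contribute no gain)
    # frac_high_integrators: fraction of the bimodal pool that integrates strongly
    # low_gain / high_gain: assumed enhancement (0.15 = 15%) for weak and strong integrators
    mean_bimodal_gain = (frac_high_integrators * high_gain
                         + (1.0 - frac_high_integrators) * low_gain)
    return frac_bimodal * mean_bimodal_gain

# Cortical-like area: ~25% bimodal neurons, few strong integrators.
print(f"cortex-like area: {area_level_enhancement(0.25, 0.15):.2%}")
# SC-like area: ~63% bimodal neurons, many strong integrators.
print(f"SC-like area:     {area_level_enhancement(0.63, 0.55):.2%}")

With these assumed values, the SC-like mixture yields several times the net enhancement of the cortex-like mixture, even though both contain the same two functional classes of bimodal neurons.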
4.4.5 Methodological Considerations Several methodological considerations should be appreciated for these results to have their proper context. The results were obtained from cats under essentially the same experimental conditions (single-unit recording under ketamine anesthesia). Data collection for all cortical values was carried out using the same paradigms and equipment. Although the experimental design was the same, the SC data were obtained before the incorporation of computers into experimental methods. Consequently, the different sensory trials were not interleaved but taken in sequential blocks, usually with fewer repetitions (n = 10–16). This is important because the number of trials has recently been demonstrated to be a key factor in determining statistical significance among multisensory interactions (Allman et al. 2009), where the larger number of trials was correlated with more neurons meeting statistical criterion. However, the SC recordings revealed a higher proportion of significantly affected neurons than in the cortex, despite these statistical measures being based on fewer trials for SC neurons (10–16 trials) than for the cortical neurons (25 trials). All cortical sensory tests were conducted in essentially the same manner: adequate (not minimal or maximal) stimuli from each modality were used and they were not systematically manipulated to maximize their integrative product. For this reason, only SC data taken before the spatial and temporal parametric investigations (e.g., Meredith and Stein 1986; Meredith et al. 1987) were included in the present comparative study. The present results are based completely on comparisons of spike counts in response to single- and combined-modality stimulation. It is also possible (indeed likely) that other response measures, such as temporal pattern or information content, may provide reliable indicators of these different effects. In addition, each of these experiments used an anesthetized preparation and it would be expected that effects such as alertness and attention would have an influence on neuronal properties. However, the anesthetic regimen was the same for each of the experiments and the comparisons were made with respect to relative changes within the data sample. Furthermore, it would seem counterintuitive that response subtleties among bimodal neurons would be observable under anesthesia but not in alert animals. However, these issues await empirical evaluation. In an effort to identify cortical areas capable of multisensory processing in humans, studies using noninvasive technologies have adopted the principles of multisensory integration determined at the level of the bimodal neuron in the SC into the criteria by which computational, perceptual, and cognitive multisensory effects could be measured and defined. For example, the metric of superadditivity has been used in neuroimaging studies in a conservative effort
to avoid “false positives” while identifying sites of multisensory integration within the cortex (see Laurienti et al. 2005 for review). Based on the multisensory characteristics of SC neurons (Perrault et al. 2005), however, Laurienti and colleagues cautioned that multisensory stimuli would not likely generate superadditive responses in the blood oxygenation level–dependent signal as measured by functional magnetic resonance imaging (Laurienti et al. 2005). The results of the present study further support this caution because proportionally fewer cortical neurons reveal superadditive responses than SC neurons (Figure 4.7), and the magnitude of response enhancement is considerably smaller in the cortex (Figure 4.6). On the other hand, given the tenuous relationship between single neuron discharge activity (i.e., action potentials) and brain hemodynamics underlying changes in the blood oxygenation level–dependent signal (Logothetis et al. 2001; Laurienti et al. 2005; Sirotin and Das 2009; Leopold 2009), it remains debatable whether effects identified in single-unit electrophysiological studies are appropriate to characterize/define multisensory processing in neuroimaging studies in the first place. How this issue is resolved, however, does not change the fact that electrophysiological measures of multisensory processing at the neuronal level reveal differences among bimodal neurons from different brain regions.
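Returning to the methodological point about trial numbers (Allman et al. 2009), a toy Python simulation shows how, for the same underlying enhancement, the proportion of neurons reaching the p < .05 criterion grows with the number of trials per condition; the firing rates and trial counts here are hypothetical.

# Toy illustration of the trial-count point above: with the same true enhancement,
# more trials per condition make the paired t-test more likely to reach p < .05.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)

def detection_rate(n_trials, n_neurons=1000, best_rate=8.0, combined_rate=10.0):
    # Fraction of simulated neurons whose combined response is judged
    # significantly greater than the best unimodal response.
    hits = 0
    for _ in range(n_neurons):
        best = rng.poisson(best_rate, size=n_trials)
        combined = rng.poisson(combined_rate, size=n_trials)
        _, p = ttest_rel(combined, best)
        if p < 0.05 and combined.mean() > best.mean():
            hits += 1
    return hits / n_neurons

for n in (10, 16, 25):
    print(f"{n:2d} trials per condition: {detection_rate(n):.0%} of simulated neurons reach significance")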
4.5 CONCLUSIONS Bimodal neurons are known to differ functionally within the same structure, the SC. The present study shows that this variation also occurs within the cortex. Ultimately, by varying the proportional representation of the different types of bimodal neurons (defined by functional ranges), different neural areas can exhibit different levels of multisensory integration in response to the same multisensory stimulus.
ACKNOWLEDGMENTS Collection of superior colliculus data was supported by grants NS019065 (to B.E. Stein) and NS06838 (to M.A. Meredith); that of cortical data was supported by grant NS039460 (to M.A. Meredith).
REFERENCES Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in ‘unimodal’ neurons: Cross-modal subthreshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–549. Allman, B.L., L.P. Keniston, and M.A. Meredith. 2009. Not just for bimodal neurons anymore: The contribution of unimodal neurons to cortical multisensory processing. Brain Topography 21:157–167. Carriere, B.N., D.W. Royal, T.J. Perrault, S.P. Morrison, J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2007. Visual deprivation alters the development of cortical multisensory integration. Journal of Neurophysiology 98:2858–2867. Clarey, J.C., and D.R.F. Irvine. 1990. The anterior ectosylvian sulcal auditory field in the cat: I. An electrophysiological study of its relationship to surrounding auditory cortical fields. Journal of Comparative Neurology 301:289–303. Clemo, H.R., B.L. Allman, M.A. Donlan, and M.A. Meredith. 2007. Sensory and multisensory representations within the cat rostral suprasylvian cortices. Journal of Comparative Neurology 503:110–127. Clemo, H.R., and B.E. Stein. 1983. Organization of a fourth somatosensory area of cortex in cat. Journal of Neurophysiology 50:910–925. Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multisensory convergence. Cerebral Cortex 14:387–403. Horn, G., and R.M. Hill. 1966. Responsiveness to sensory stimulation of units in the superior colliculus and subjacent tectotegmental regions of the rabbit. Experimental Neurology 14:199–223. Jiang, H., F. Lepore, M. Ptito, and J.P. Guillemot. 1994a. Sensory interactions in the anterior ectosylvian cortex of cats. Experimental Brain Research 101:385–396.
Jiang, H., F. Lepore, M. Ptito, and J.P. Guillermot. 1994b. Sensory modality distribution in the anterior ectosylvian cortex (AEC) of cats. Experimental Brain Research 97:404–414. King, A.J., and A.R. Palmer. 1985. Integration of visual and auditory information in bimodal neurones in the guinea-pig superior colliculus. Experimental Brain Research 60:492–500. Laurienti, P.J., T.J. Perrault, T.F. Stanford, M.T. Wallace, and B.E. Stein. 2005. On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research 166:289–297. Leopold, D.A. 2009. Neuroscience: Pre-emptive blood flow. Nature 457:387–388. Logothetis, N.K., J. Pauls, M. Augath, T. Trinath, and A. Oeltermann. 2001. Neurophysiological investigation of the basis of the fMRI signal. Nature 412:150–157. Meredith, M.A. 2004. Cortico-cortical connectivity and the architecture of cross-modal circuits. In Handbook of Multisensory Processes, eds. C. Spence, G. Calvert, and B. Stein, 343–355. Cambridge, MA: MIT Press. Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex. Neuroreport 20:126–131. Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus. Science 221:389–391. Meredith, M.A., and B.E. Stein. 1985. Descending efferents of the superior colliculus relay integrated multisensory information. Science 227:657–659. Meredith, M.A., and B.E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in the superior colliculus results in multisensory integration. Journal of Neurophysiology 56:640–662. Meredith, M.A., and B.E. Stein. 1996. Spatial determinants of multisensory integration in cat superior colliculus neurons. Journal of Neurophysiology. 75:1843–1857. Meredith, M.A., L.R. Keniston, L.R. Dehner, and H.R. Clemo. 2006. Cross-modal projections from somatosensory area SIV to the auditory field of the anterior ecosylvian sulcus (FAES) in cat: Further evidence for subthreshold forms of multisensory processing. Experimental Brain Research 172:472–484. Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior colliculus neurons: I. Temporal factors. Journal of Neuroscience 7:3215–3229. Olson, C.R., and A.M. Graybiel. 1987. Ectosylvian visual area of the cat: Location, retinotopic organization, and connections. Journal of Comparative Neurology 261:277–294. Palmer, L.A., A.C. Rosenquist, and R.J. Tusa. 1978. The retinotopic organization of lateral suprasylvian visual areas in the cat. Journal of Comparative Neurology 177:237–256. Perrault, T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2005. Superior colliculus neurons use distinct operational modes in the integration of multisensory stimuli. Journal of Neurophysiology 93:2575–2586. Rauschecker, J.P., and M. Korte. 1993. Auditory compensation for early blindness in cat cerebral cortex. Journal of Neuroscience 13:4538–4548. Sirotin, Y.B., and A. Das. 2009. Anticipatory haemodynamic signals in sensory cortex not predicted by local neuronal activity. Nature 457:475–479. Stecker, G.C., I.A. Harrington, E.A. MacPherson, and J.C. Middlebrooks. 2005. Spatial sensitivity in the dorsal zone (area DZ) of cat auditory cortex. Journal of Neurophysiology 94:1267–1280. Stein, B.E., and M.A. Meredith. 1993. Merging of the Senses. Cambridge, MA: MIT Press. Stein, B.E., M.A. Meredith. W.S. Huneycutt, and L. McDade. 1989. 
Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience 1:12–24. Sugihara, T., M.D. Diltz, B.B. Averbeck, and L.M. Romanski. 2006. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26:11138–11147. Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat superior colliculus. Journal of Neuroscience 17:2429–2444. Wallace, M.T., M.A. Meredith, and B.E. Stein. 1992. Integration of multiple sensory inputs in cat cortex. Experimental Brain Research 91:484–488.
5
Audiovisual Integration in Nonhuman Primates A Window into the Anatomy and Physiology of Cognition Yoshinao Kajikawa, Arnaud Falchier, Gabriella Musacchia, Peter Lakatos, and Charles E. Schroeder
CONTENTS 5.1 Behavioral Capacities..............................................................................................................66 5.1.1 Recognition..................................................................................................................66 5.1.2 Fusion and Illusions.....................................................................................................66 5.1.3 Perception.................................................................................................................... 67 5.2 Neuroanatomical and Neurophysiological Substrates.............................................................68 5.2.1 Prefrontal Cortex......................................................................................................... 69 5.2.2 Posterior Parietal Cortex............................................................................................. 71 5.2.3 STP Area..................................................................................................................... 72 5.2.4 MTL Regions............................................................................................................... 73 5.2.5 Auditory Cortex........................................................................................................... 74 5.2.6 Visual Cortex............................................................................................................... 75 5.2.7 Subcortical Regions..................................................................................................... 76 5.3 Functional Significance of Multisensory Interactions............................................................. 77 5.3.1 Influences on Unimodal Perception............................................................................. 77 5.3.1.1 Influence on Temporal Dynamics of Visual Processing............................... 77 5.3.1.2 Sound Localization....................................................................................... 78 5.3.2 AV Recognition........................................................................................................... 79 5.4 Principles of Multisensory Interaction.................................................................................... 79 5.4.1 Inverse Effectiveness................................................................................................... 80 5.4.2 Temporal Contiguity....................................................................................................80 5.4.3 Spatial Contiguity........................................................................................................ 81 5.5 Mechanisms and Dynamics of Multisensory Interaction........................................................ 82 5.5.1 Phase Reset: Mechanisms............................................................................................ 82 5.5.2 Phase Reset: Dependence on Types of Stimuli........................................................... 83 5.6 Importance of Salience in Low-Level Multisensory Interactions........................................... 83 5.6.1 Role of (Top-Down) Attention.....................................................................................84 5.6.2 Attention or Saliency of Stimuli.................................................................................. 85 5.7 Conclusions, Unresolved Issues, and Questions for Future Studies........................................ 
85 5.7.1 Complex AV Interactions............................................................................................. 85 5.7.2 Anatomical Substrates of AV Interaction.................................................................... 85 5.7.3 Implication of Motor Systems in Modulation of Reaction Time................................. 85 5.7.4 Facilitation or Information?......................................................................................... 86
5.7.5 Inverse Effectiveness and Temporal Interaction.......................................................... 86 5.7.6 What Drives and What Is Driven by Oscillations?...................................................... 86 5.7.7 Role of Attention.......................................................................................................... 86 Acknowledgment.............................................................................................................................. 87 References......................................................................................................................................... 87
5.1 BEHAVIORAL CAPACITIES Humans can associate a sound with its visual source, where it comes from, how it is produced, and what it means. This association, or audiovisual (AV) integration, also occurs in many nonhuman primate species, and may be used in kin recognition, localization, and social interaction, among other things (Cheney and Seyfarth 1990; Ghazanfar and Santos 2004). These abilities suggest that nonhuman primates integrate sight and sound as humans do: through recognition of AV vocalizations and enhanced perception of audiovisual stimuli.
5.1.1 Recognition One of the most ubiquitous AV functions in everyday human life is recognizing and matching the sight and sounds of other familiar humans. Nonhuman primates can also recognize the sight and sound of a familiar object and can express this association behaviorally. Primates reliably associate coincident auditory and visual signals of conspecific vocalizations (Evans et al. 2005; Ghazanfar and Logothetis 2003; Jordan et al. 2005; Sliwa et al. 2009) and can match pictures to vocal sounds of both conspecifics and familiar humans (Izumi and Kojima 2004; Kojima et al. 2003; Martinez and Matsuzawa 2009). Monkeys can also identify a picture in which the number of individuals matches the number of vocal sounds (Jordan et al. 2005). Although it appears that primates recognize the AV components of a talking face much better when the individual is socially familiar, familiarity does not appear to be a critical component of audiovisual recognition; many of the studies cited above showed that primates can correctly match AV vocalizations from other primate species (Martinez and Matsuzawa 2009; Zangenehpour et al. 2009). Facial movement, on the other hand, appears to be a key component for nonhuman primates in recognizing the vocal behavior of others. When matching a visual stimulus to a vocalization, primates correctly categorized a still face as a mismatch (Izumi and Kojima 2004; Evans et al. 2005; Ghazanfar and Logothetis 2003) and performed poorly when only the back view was presented (Martinez and Matsuzawa 2009). AV matching by monkeys is not limited to facial recognition. Ghazanfar et al. (2002) showed that a rising-intensity sound attracted a monkey’s attention to a similar degree as a looming visual object (Schiff et al. 1962). These auditory and visual signals are signatures of an approaching object. Monkeys preferentially look at the corresponding looming rather than receding visual signal when presented with a looming sound. This was not the case when the monkey was presented with either a receding sound or white noise control stimulus with an amplitude envelope matching that of the looming sound (Maier et al. 2004). Therefore, monkeys presumably form single events by associating sound and visual attributes at least for signals of approaching objects. Taken together, these data indicate that the dynamic structure of the visual stimulus and compatibility between two modalities is vital for AV recognition in primates and suggest a common mechanistic nature across primate species.
5.1.2 Fusion and Illusions For humans, one of the most striking aspects of AV integration is that synchronous auditory and visual speech stimuli seem fused together, and illusions relating to this phenomenon may arise. The McGurk illusion is a case of this sort. When a mismatch between certain auditory and visual syllables occurs (e.g., an auditory “ba” with a visual “ga”), humans often perceive a synthesis of those
syllables, mostly "da" (McGurk and MacDonald 1976). The illusion persists even when the listener is aware of the mismatch, which indicates that visual articulations are automatically integrated into speech perception (Green et al. 1991; Soto-Faraco and Alsius 2009). Vatakis et al. (2008) examined whether auditory and visual components of monkey vocalizations elicited a fused percept in humans. It is well known that people are less sensitive to temporal asynchrony when the auditory and visual components of speech are matched than when they are mismatched (the "unity effect"). Capitalizing on this phenomenon, Vatakis and colleagues used a temporal order judgment task with matched and mismatched sounds and movies of monkey vocalizations across a range of stimulus onset asynchronies (SOA). The unity effect was observed for human speech, but not when people observed monkey vocalizations. The authors also found negative results for human vocalizations mimicking monkey vocalizations, suggesting that, for humans, the fusion of face–voice components is limited to human speech. This may be because monkey vocal repertoires are much more limited than those of humans and show a greater dissimilarity between facial expressive components and sound (Chakladar et al. 2008; Partan 2002). Another famous AV illusion, the "ventriloquist effect," also appears to have a counterpart in nonhuman primate perception. The effect is such that, under the right conditions, a sound may be perceived as originating from a visual location despite a spatial disparity. After training a monkey to identify the location of a sound source, Recanzone's group introduced a 20 to 60 min period of spatially disparate auditory (tones) and visual (dots) stimuli (Woods and Recanzone 2004). The consequence of this manipulation appeared in the subsequent sound lateralization task as a shift of the "auditory center spot" in the direction of the sound location relative to the visual fixation spot used during the prior task. The underlying neural mechanism of this effect may be similar to the realignment of visual and auditory spatial maps after adapting to an optical prism displacing the visual space (Cui et al. 2008; Knudsen and Knudsen 1989). What about perception of multisensory moving objects? Preferential looking at looming auditory and visual signals suggests that monkeys associate sound and visual attributes of approaching objects (Maier et al. 2004). However, longer looking does not necessarily imply fused perception; it may instead reflect attentional attraction to moving stimuli after their congruency has been assessed. Fused perception of looming AV signals is supported by human studies showing the redundant signal effect (see Section 5.1.3 for more details) in reaction time (shorter reaction times to congruent looming AV signals) under conditions of bimodal attention (Cappe et al. 2010; see also Romei et al. 2009 for data suggesting preattentive effects of looming auditory signals). Interestingly, for such an AV looming effect to occur, the spectrum of the sound has to be dynamically structured along with sound intensity. It is not known which attributes of a visual stimulus, other than motion, could contribute to this effect. It is likely that auditory and visual stimuli must be related not only in spatial and temporal terms, but also in their dynamic spectral dimensions, for an attentional bias or performance enhancement to appear.
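The unity effect in such temporal order judgment tasks is typically quantified by fitting a psychometric function to the proportion of "vision first" responses across SOAs and comparing the resulting just-noticeable difference (JND) between matched and mismatched pairs. The sketch below is purely illustrative: the data points are invented, it assumes NumPy and SciPy are available, and it is not a reconstruction of the analysis in Vatakis et al. (2008).

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Hypothetical temporal order judgment data: negative SOA = sound leads (ms),
# positive SOA = vision leads; p_vis_first = proportion of "vision first" reports.
soa = np.array([-300, -200, -100, -50, 0, 50, 100, 200, 300], dtype=float)
p_vis_first = np.array([0.05, 0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.95, 0.98])

def cum_gauss(x, pss, sigma):
    # Cumulative Gaussian psychometric function.
    return norm.cdf(x, loc=pss, scale=sigma)

(pss, sigma), _ = curve_fit(cum_gauss, soa, p_vis_first, p0=[0.0, 100.0])
jnd = sigma * norm.ppf(0.75)  # SOA shift from the PSS giving 75% "vision first"

print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
# The unity effect appears as a larger JND (poorer temporal sensitivity) for matched
# face-voice pairs than for mismatched pairs.
```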
5.1.3 Perception
Visual influences on auditory perception, and vice versa, are well established in humans (Sumby and Pollack 1954; Raab 1962; Miller 1982; Welch and Warren 1986; Sams et al. 1991; Giard and Peronnet 1999; for review, see Calvert 2001; Stein and Meredith 1993) and have been examined in several studies on nonhuman primates (described below). Using simple auditory and visual stimuli, such as tones and dots, the following studies show that auditory and visual information interact to modulate perception in monkeys. Barone's group trained monkeys to make a saccade to a visual target that begins to flash at the moment the fixation point disappears (Wang et al. 2008). In half of the trials, the visual target was presented with a brief task-irrelevant noise. The result was faster saccadic reaction times when the visual target was accompanied by a sound than without it. Frens and Van Opstal (1998) also
studied the influence of auditory stimulation on saccadic responses in monkeys performing tasks similar to that of Wang et al. (2008). They showed not only a shortening of reaction time, but also that reaction time depended on the magnitude of the spatial and temporal shift between visual and auditory stimuli; smaller distance and closer timing yielded shorter reaction times. These results demonstrated a temporal effect of sound on visual localization. These results are compatible with human psychophysical studies of AV integration (Frens et al. 1995; Diederich and Colonius 2004; Perrott et al. 1990) and suggest that the underlying mechanism may be common to both human and nonhuman primates. Like humans, monkeys have also been shown to have shorter manual reaction times to bimodal targets compared with unimodal targets. In a simple detection task in which a monkey had to report the detection of a light flash (V alone), noise sound (A alone), or both (AV) stimuli by manual response, reaction times to AV stimuli were faster than V alone regardless of its brightness (Cappe et al. 2010; see also Miller et al. 2001, showing similar data for small data sets). When the sound was loud, reaction times to AV stimuli and A alone were not different. When sound intensity was low, the overall reaction time was longer and the response to AV stimuli was still faster than A alone. A study from our laboratory showed that reaction times to perceptual “oddballs,” or novel stimuli in a train of standard stimuli, were faster for AV tokens than for the visual or auditory tokens presented alone (Kajikawa and Schroeder 2008). Monkeys were presented with a series of standard AV stimuli (monkey picture and vocal sound) with an occasional oddball imbedded in the series that differed from the standard in image (V alone), sound (A alone), or both (AV) stimuli. The monkey had to manually respond upon detection of such oddballs. In that case, whereas intensity levels were fixed, reaction times to the AV oddballs were faster than either A alone or V alone oddballs. In addition, the probability of a correct response was highest for the AV oddball and lowest for the A alone condition. Therefore, not only the detection of signals, but also its categorization benefited from AV integration. This pattern of reaction times conforms to the results of human psychophysics studies showing faster reaction time to bimodal than unimodal stimuli (Frens et al. 1995; Diederich and Colonius 2004; Perrott et al. 1990). Observations of faster reaction in response to bimodal compared with unimodal stimuli in different motor systems suggest that AV integration occurs in sensory systems before the motor system is engaged to generate a behavioral response (or that a similar integration mechanism is present in several motor systems). Difference in task demands complicates the ability to define the role of attention in the effect of AV integration on reaction times. In the study conducted by Wang et al. (2008), monkeys were required to monitor only the occurrence of the visual stimulus. Therefore, task-irrelevant sound acted exogenously from outside of the attended sensory domain, that is, it likely drew the monkey’s attention, but this possibility is impossible to assess. In contrast, Cappe et al. (2010) and Kajikawa and Schroeder (2008) used monkeys that were actively paying attention to both visual and auditory modalities during every trial. It is worth noting that the sound stimuli used by Wang et al. (2008) did not act as distracters. 
Hence, it was possible that monkeys could do the task by paying attention to both task-relevant visual stimuli and task-irrelevant sound (see Section 5.6).
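The redundant signal effect described above is commonly evaluated against Miller's (1982) race model inequality, which bounds how much speeding can be explained by two independent unimodal channels racing to trigger the response. The following sketch uses simulated reaction times (all values hypothetical) to illustrate the test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated reaction times (ms); the AV condition is made faster than either
# unimodal condition, as in the detection studies described above.
rt_a = rng.gamma(shape=8.0, scale=30.0, size=500) + 150.0   # auditory alone
rt_v = rng.gamma(shape=8.0, scale=30.0, size=500) + 180.0   # visual alone
rt_av = rng.gamma(shape=8.0, scale=25.0, size=500) + 140.0  # audiovisual

def ecdf(rt, t):
    # Empirical cumulative distribution of reaction times evaluated at times t.
    return np.array([(rt <= ti).mean() for ti in t])

t = np.linspace(150.0, 600.0, 46)
violation = ecdf(rt_av, t) - np.minimum(1.0, ecdf(rt_a, t) + ecdf(rt_v, t))

# Positive values violate the race model inequality
#   P(RT_AV <= t) <= P(RT_A <= t) + P(RT_V <= t),
# which is taken as evidence for integration rather than statistical facilitation
# between independent unimodal channels.
print("maximum violation:", violation.max())
```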
5.2 NEUROANATOMICAL AND NEUROPHYSIOLOGICAL SUBSTRATES
In the following sections, we will describe AV interactions in numerous monkey brain regions (Figure 5.1). Investigators have identified AV substrates in broadly two ways: by showing that (1) a region responds to both auditory and visual stimuli, or (2) AV stimuli produce neural activity that differs from the responses to the unimodal stimuli presented alone. AV integration has been shown at early stages of processing, including primary sensory and subcortical areas (for review, see Ghazanfar and Schroeder 2006; Musacchia and Schroeder 2009; Schroeder and Foxe 2005; Stein and Stanford 2008). Other areas that respond to both modalities have been identified in the prefrontal cortex (PFC), the posterior parietal cortex (PPC), the superior temporal polysensory area (STP), and the medial temporal lobe (MTL). Even though most studies could not elucidate the relationship between behavior and physiology because they did not test the monkey's behavior in conjunction with physiological measures, these studies provide promising indirect evidence that is useful in directing future behavioral/physiological studies.

FIGURE 5.1 (See color insert.) Connections mediating multisensory interactions in primate auditory cortex. Primate auditory cortices receive a variety of inputs from other sensory and multisensory areas. Somatosensory areas (PV, parietoventral area; Ri, retroinsular area; S2, secondary somatosensory cortex) and their projections to auditory cortex are shown in red. Blue areas and lines denote known visual inputs (FST, fundus of superior temporal area; Pro, prostriata; V1, primary visual cortex; V2, secondary visual cortex). Feedback inputs from higher cognitive areas (7A, Brodmann's area 7A; 23, Brodmann's area 23; 31, Brodmann's area 31; DLPFC, dorsolateral prefrontal cortex; VLPFC, ventrolateral prefrontal cortex) are shown in green. Multisensory feedforward inputs from thalamic nuclei (Li, limitans; MP, medial pulvinar; MGm, medial division of medial geniculate; Po, posterior nucleus; SG, suprageniculate nucleus) are shown in purple.
5.2.1 Prefrontal Cortex In the PFC, broad regions have been reported to be multisensory. PFC is proposed to have “what” and “where” pathways of visual object and space information processing segregated into dorsolateral (DLPFC) and ventrolateral (VLPFC) parts of PFC (Goldman-Rakic et al. 1996; Levy and Goldman-Rakic 2000; Ungerleider et al. 1998). Although numerous studies support the idea of segregated information processing in PFC (Wilson et al. 1993), others found single PFC neurons integrated what and where information during a task that required monitoring of both object and location (Rao et al. 1997). It appears that auditory information processing in PFC also divides into analogous “what” (e.g., speaker specific) and “where” (e.g., location specific) domains. The proposed “what” and “where” pathways of the auditory cortical system (Kaas and Hackett 2000; Rauschecker and Tian 2000) have been shown to project to VLPFC and DLPFC, respectively (Hackett et al. 1999; Romanski et al. 1999a, 1999b). Broad areas of the DLPFC were shown to be sensitive to sound location (Artchakov
et al. 2007; Azuma and Suzuki 1984; Kikuchi-Yorioka and Sawaguchi 2000; Vaadia et al. 1986). Conversely, response selectivity to macaque vocal sounds were found in VLPFC (Cohen et al. 2009; Gifford et al. 2005; Romanski and Goldman-Rakic 2002; Romanski et al. 2005) and orbitofrontal cortex (Rolls et al. 2006). These two areas may correspond to face-selective regions of frontal lobe in nonhuman primates (Parr et al. 2009; Tsao et al. 2008b). Taken together, these findings support the notion that, as in the visual system, sensitivity to location and nonspatial features of sounds are segregated in PFC. Although the dorsolateral stream in PFC has largely been shown to be sensitive to location, auditory responses to species-specific vocalizations were also found in regions of DLPFC in squirrel monkey (Newman and Lindsley 1976; Wollberg and Sela 1980) and macaque monkey (Bon and Lucchetti 2006). Interestingly, visual fixation diminished responses to vocal sounds in some neurons (Bon and Lucchetti 2006). Taken together with the results of Rao et al. (1997) showing that neurons of the “what” and “where” visual stream are distributed over a region spanning both the DLPFC and VLPFC, these studies suggest that the “what” auditory stream might extend outside the VLPFC. Apart from showing signs of analogous processing streams in auditory and visual pathways, PFC is cytoarchitecturally primed to process multisensory stimuli. In addition to auditory cortical afferents, the DLPFC and VLPFC have reciprocal connections with rostral and caudal STP subdivisions (Seltzer and Pandya 1989). The VLPFC also receives inputs from the PPC, a presumed “where” visual region (Petrides and Pandya 2009). Within both the DLPFC and VLPFC, segregated projections of different sensory afferents exist. Area 8 receives projections from visual cortices (occipital and IPS) in its caudal part, and auditory-responsive cortices [superior temporal gyrus (STG) and STP] in its rostral part (Barbas and Mesulam 1981). Similar segregation of visual [inferior temporal (IT)] and auditory (STG and STP) afferents exist within VLPFC (Petrides and Pandya 2002). Thus, DLPFC and VLPFC contain regions receiving both or either one of auditory and visual projections, and those regions are intermingled. Additionally, orbitofrontal cortex and medial PFC receive inputs from IT, STP, and STG (Barbas et al. 1999; Carmichael and Price 1995; Cavada et al. 2000; Kondo et al. 2003; Saleem et al. 2008), and may contribute to AV integration (see Poremba et al. 2003). Not surprisingly, bimodal properties of PFC neurons have been described in numerous studies. Some early studies described neurons responsive to both tones and visual stimuli (Kubota et al. 1980; Aou et al. 1983). However, because these studies used sound as a cue to initiate immediate behavioral response, it is possible that the neuronal response to the sound might be related to motor execution. Other studies of PFC employed tasks in which oculomotor or manual responses were delayed from sensory cues (Artchakov et al. 2007; Ito 1982; Joseph and Barone 1987; KikuchiYorioka and Sawaguchi 2000; Vaadia et al. 1986; Watanabe 1992). Despite the delayed response, populations of neurons still responded to both visual and auditory stimuli. Such responses had spatial tuning and dependence on task conditions such as modality of task and task demands of discrimination, active detection, passive reception (Vaadia et al. 1986), or reward/no reward contingency (Watanabe 1992). 
One report shows that visuospatial and audiospatial working memory processes seem to share a common neural mechanism (Kikuchi-Yorioka and Sawaguchi 2000). The behavioral tasks used in studies described so far did not require any comparison of visual and auditory events. Fuster et al. (2000) trained monkeys to learn pairing of tones and colors and perform a cross-modal delayed matching task using tones as the sample cue and color signals as the target. They found that PFC neurons in those monkeys had elevated firing during the delay period that was not present on error trials. Therefore, PFC has many neurons responsive to both auditory and visual signals, somehow depending on behavioral conditions, and possibly associates them. Romanski’s group explored multisensory responses in VLPFC (Sugihara et al. 2006), and found that this region may have unimodal visual, unimodal auditory, or bimodal AV responsive regions (Romanski et al. 2002, 2005). Their group used movies, images, and sounds of monkeys producing vocalizations as stimuli, and presented them unimodally or bimodally while subjects fixated. Although neurons responded exclusively to one or both modalities, about half of the neurons
examined exhibited AV integration as either enhancement or suppression of the unimodal response. Because subjects were not required to maintain working memory or make a decision, those responses are considered to be sensory. In addition to the regions described above, premotor (PM) areas between the primary motor cortex and the arcuate sulcus contain neurons sensitive to sound and vision. Although most of the neurons in PM respond to somatosensory stimuli, there are neurons that also respond to auditory and visual stimuli and have receptive fields spatially registered across the different modalities (Graziano et al. 1994, 1999). Those neurons are located in caudal PM and particularly code the space proximal to the face (Fogassi et al. 1996; Graziano et al. 1997; Graziano and Gandhi 2000) as well as defensive actions (Cooke and Graziano 2004a, 2004b). Rostral PM contains audiovisual mirror neurons, whose activity is elevated not only during the execution of actions but also during the observation of the same actions performed by others. Those neurons discharge during specific manual actions and respond to the sound, in addition to the sight, of such actions (Keysers et al. 2003; Kohler et al. 2002; Rizzolatti et al. 1996; Rizzolatti and Craighero 2004) and to the goal objects of those actions (Murata et al. 1997). Although AV sensitivity in caudal PM seems directly connected to the subject's own actions, that in rostral PM presumably reflects the cognitive processing of others' actions. In summary, the PFC is subdivided into various regions based on sensory, motor, and other cognitive processes. Each subdivision contains AV sensitivity that could serve to code locations or objects. There are neurons specialized in coding vocalizations, in associating auditory and visual signals, or in the representation/execution of particular motor actions.
5.2.2 Posterior Parietal Cortex
The PPC in the monkey responds to different modalities (Cohen 2009), is known to be a main station of the "where" pathway before the information enters PFC (Goodale and Milner 1992; Ungerleider and Mishkin 1982), and is highly interconnected with multisensory areas (see below). PPC receives afferents from various cortices involved in visual spatial and motion processing (Baizer et al. 1991; Cavada and Goldman-Rakic 1989a; Lewis and Van Essen 2000; Neal et al. 1990). The caudal area of PPC has reciprocal connections with multisensory parts of PFC and STS, suggesting that the PPC plays a key role in multisensory integration (Cavada and Goldman-Rakic 1989b; Neal et al. 1990). The ventral intraparietal area receives input from the auditory association cortex of the temporoparietal area (Lewis and Van Essen 2000). The anterior intraparietal area also receives projections from the auditory cortex (Padberg et al. 2005). PPC receives subcortical inputs from the medial pulvinar (Baizer et al. 1993) and superior colliculus (SC; Clower et al. 2001) that may subserve multisensory responses in PPC. Several subregions of PPC are known to be bimodal. An auditory responsive zone in PPC overlaps with visually responsive areas (Poremba et al. 2003). Space-sensitive responses to sound (noise) have been observed in several areas of PPC typically thought to be primarily visually oriented: the lateral intraparietal cortex (LIP; Stricanne et al. 1996), the ventral intraparietal area (Schlack et al. 2005), the medial intraparietal cortex, and the parietal reach region (Cohen and Andersen 2000, 2002). The auditory space-sensitive neurons in PPC also respond to visual stimulation with similar spatial tuning (Mazzoni et al. 1996; Schlack et al. 2005). Furthermore, the spatial tuning of the auditory and visual response properties was sufficiently correlated to be predictive of one another, indicating a shared spatial reference frame across modalities (Mullette-Gilman et al. 2005, 2009). PPC also plays a major role in motor preparation during localization tasks (Andersen et al. 1997). Auditory responses in LIP appeared only after training on memory-guided delayed reaction tasks with auditory and visual stimuli (Grunewald et al. 1999) and disappeared when the sound cue became irrelevant for the task (Linden et al. 1999). These results suggested that auditory responses in PPC were not just sensory activity. The information encoding spatial auditory cues evolves as the task progresses but remains consistently lower than that encoding visual cues in LIP and the parietal reach region (Cohen et al. 2002, 2004). Thus, there is a difference in processing between modalities.
Even though most PPC studies used simple stimuli such as LED flashes and noise bursts, one study also examined LIP responses to vocal sounds and showed that LIP neurons are capable of carrying information about the acoustic features of sounds in addition to their spatial location (Gifford and Cohen 2005). In that study, sounds were delivered passively to monkeys during visual fixation, which seems inconsistent with the aforementioned findings that auditory responses in PPC emerge only when sounds are behaviorally relevant (Grunewald et al. 1999; Linden et al. 1999). Nevertheless, that study raised the possibility that auditory coding in PPC may not be limited to spatial information. Similarly, the existence of face-selective patches was shown in the PPC of chimpanzees using PET (Parr et al. 2009). Although these studies suggest AV integration in PPC, responses to stimuli in bimodal conditions have not yet been directly examined in monkeys.
5.2.3 STP Area The STP, located in the anterior region of the superior temporal sulcus, from the fundus to the upper bank, responds to multisensory stimuli in monkeys (Bruce et al. 1981; Desimone and Gross 1979; Schroeder and Foxe 2002; Poremba et al. 2003) and is a putatively key site for AV integration in both monkeys and humans. STP is highly connected to subcortical and cortical multisensory regions. STP receives inputs from presumed multisensory thalamic structures (Yeterian and Pandya 1989) and medial pulvinar (Burton and Jones 1976), and has reciprocal connections with the PFC and other higher-order cortical regions such as PPC, IT cortex, cingulate cortex, MTL, and auditory parabelt regions (Barnes and Pandya 1992; Cusick et al. 1995; Padberg et al. 2003; Saleem et al. 2000; Seltzer et al. 1996; Seltzer and Pandya 1978, 1994). Based on connectivity patterns, area STP can be subdivided into rostral and caudal regions. Its anterior part is connected to the ventral PFC, whereas the caudal part seems to be connected to the dorsal PFC (Seltzer and Pandya 1989). STP exhibits particular selectivity to complex objects, faces, and moving stimuli. STP was shown to have responses to visual objects (Oram and Perrett 1996), and particularly to show some degree of face selectivity (Bruce et al. 1981; Baylis et al. 1987). Face-selectivity was shown to exist in discrete patches in monkeys (Pinsk et al. 2005; Tsao et al. 2006, 2008a) and chimpanzees (Parr et al. 2009), although others found responses to faces over a wide area (Hoffman et al. 2007). Responses to faces are further selective to identity, gaze direction, and/or viewing angle of the presented face (De Souza et al. 2005; Eifuku et al. 2004). Both regions of caudal STS, like MT (Born and Bradley 2005; Duffy and Wurtz 1991; Felleman and Kaas 1984) or MST (Gu et al. 2008; Tanaka et al. 1986) as well as anterior STP (Anderson and Siegel 1999, 2005; Nelissen et al. 2006; Oram et al. 1993), are sensitive to directional movement patterns. Although the caudal STS is regarded as a part of the “where” pathway, the anterior STP is probably not because of its large spatial receptive field size (Bruce et al. 1981, 1986; Oram et al. 1993). Given this and taken together with face selectivity, it stands to reason that anterior STP may be important for the perception or recognition of facial gestures, such as mouth movement. In addition, STP responds to somatosensory, auditory, and visual stimulation. Multisensory responsiveness of neurons in STS was tested in anesthetized (Benevento et al. 1977; Bruce et al. 1981; Hikosaka et al. 1988) and alert monkeys (Baylis et al. 1987; Perrett et al. 1982; Watanabe and Iwai 1991). In both cases, stimuli were delivered unimodally (Baylis et al. 1987; Bruce et al. 1981; Hikosaka et al. 1988) or simple bimodal stimuli (tone and LED flash) were used (Benevento et al. 1977; Watanabe and Iwai 1991). Although auditory and visual selective neurons were present in STG and formed segregated clusters in STP (Dahl et al. 2009), a population of neurons responded to both visual and auditory stimuli (Baylis et al. 1987; Bruce et al. 1981; Hikosaka et al. 1988). When the response to bimodal stimuli was examined, the neural firing rate was either enhanced or reduced compared to unimodal stimuli (Benevento et al. 1977; Watanabe and Iwai 1991). The laminar profile
of current source density (CSD), which reflects a pattern of afferent termination across cortical layers in response to sounds (click) and lights (flash), indicated that STP receives feedforward auditory and visual inputs to layer IV (Schroeder and Foxe 2002). Lesion studies in STP reveal that the region appears to process certain dimensions of sound and vision used for discrimination. Monkeys with lesions of STG and STP areas showed an impairment of auditory but not visual working memory and auditory pattern discrimination while sparing hearing (Iversen and Mishkin 1973; Colombo et al. 2006). Although IT lesions impair many visual tasks, IT and STP lesions (Aggleton and Mishkin 1990; Eaccott et al. 1993) selectively impair visual discrimination of objects more severely while sparing the performance of other visual tasks. These findings suggest that multisensory responses in STP are not simply sensory, but are involved in cognitive processing of certain aspects of sensory signals. A series of recent studies examined AV integration in STS using more naturalistic stimuli during visual fixation, using sound and sight of conspecific vocalizations, naturally occurring scenes, and artifactual movies (Barraclough et al. 2005; Dahl et al. 2009; Chandrasekaran and Ghazanfar 2009; Ghazanfar et al. 2008; Kayser and Logothetis 2009; Maier et al. 2008). As in previous studies (Benevento et al. 1977; Watanabe and Iwai 1991), neuronal firing to bimodal stimuli was found to be either stronger or weaker when compared to unimodal stimuli. Barraclough et al. (2005) showed that the direction of change in the magnitude of response to AV stimuli from visual response depended on the size of visual response. Incongruent pairs of sound and scenes seem to evoke weaker responses (Barraclough et al. 2005; Maier et al. 2008). To our knowledge, there are no animal studies that used task conditions requiring active behavioral discrimination. Therefore, results may not be conclusive about whether the STS can associate/integrate information of different modalities to form a recognizable identity. However, their bimodal responsiveness, specialization for objects such as faces in the visual modality, and sensitivity to congruence of signals in different modalities suggests that areas in STP are involved in such cognitive processes and/or AV perception.
5.2.4 MTL Regions The MTL is composed of the hippocampus, entorhinal, perirhinal and parahippocampal cortices. These regions are involved in declarative memory formation (Squire et al. 2004) and place coding (McNaughton et al. 2006). The amygdala plays a predominant role in emotional processes (Phelps and LeDoux 2005), some of which may be affected by multisensory conjunction (e.g., in response to “dominant” conspecifics or looming stimuli, as discussed above). The MTL receives various multisensory cortical inputs. Entorhinal cortex (EC), the cortical gate to the hippocampus, receives inputs from STG, STP, IT, and other nonprimary sensory cortices either directly or through parahippocampal and perirhinal cortices (Blatt et al. 2003; Mohedano-Moriano et al. 2007, 2008; Suzuki and Amaral 1994). Auditory, visual, and somatosensory association cortices also project to the nuclei of the amygdala (Kosmal et al. 1997; Turner et al. 1980). Although IT, a part of the ventral “what” pathway (Ungerleider and Mishkin 1982) and the major input stage to MTL, responds mainly to complex visual stimuli, IT can exhibit postauditory sample delay activity during cross-modal delayed match-to-sample tasks, in which auditory sample stimuli (tones or broadband sounds) were used to monitor the type of visual stimuli (Colombo and Gross 1994; Gibson and Maunsell 1997). During the same task, greater auditory responses and delay activity were observed in the hippocampus. Those delay activities presumably reflected the working memory of a visual object associated with sound after learning. In a visual discrimination task that used tone as a warning to inform the start of trials to monkeys, ventral IT neurons responded to this warning sound (Ringo and O’Neill 1993). Such auditory responses did not appear when identical tones were used to signal the end of a trial, thereby indicating that effects were context-dependent.
In the hippocampus, a small population of neurons responds to both auditory and visual cues for moving tasks in which monkeys control their own spatial translation and position (Ono et al. 1993). Even without task demands, hippocampal neurons exhibit spatial tuning properties to auditory and visual stimuli (Tamura et al. 1992). Neurons in the amygdala respond to face or vocalization of conspecifics passively presented (Brothers et al. 1990; Kuraoka and Nakamura 2007; Leonard et al. 1985). Some neurons respond selectively to emotional content (Hoffman et al. 2007; Kuraoka and Nakamura 2007). Multisensory responses to different sensory cues were also shown in the amygdala of monkeys performing several kinds of tasks to retrieve food or drink, avoid aversive stimuli, or discriminate sounds associated with reward (Nishijo et al. 1988a). These responses reflected affective values of those stimuli rather than the sensory aspect (Nishijo et al. 1988b). These data corroborate the notion that sensory activity in MTL is less likely to contribute to detection, but more related to sensory association, evaluation, or other cognitive processes (Murray and Richmond 2001). The integrity of these structures is presumably needed for the formation and retention of cross-modal associational memory (Murray and Gaffan 1994; Squire et al. 2004).
5.2.5 Auditory Cortex Recent findings of multisensory sensitivity in sensory (early) cortical areas, including primary areas, have revised our understanding of cortical “AV integration” (for review, see Ghazanfar and Schroeder 2006). Before these findings came to light, it was thought that AV integration occurred in higher-order cortices during complex component processing. To date, a large body of work has focused on multisensory mechanisms in the AC. Like some of the seminal findings with human subjects in this field (Sams et al. 1991; Calvert and Campbell 2003), the monkey AC appears to respond to visual stimulus presented alone. Kayser et al. (2007) measured the BOLD signal to natural unimodal and bimodal stimuli over the superior temporal plane. They observed that visual stimuli alone could induce activity in the caudal area of the auditory cortex. In this same area, the auditory-evoked signal was also modulated by cross-modal stimuli. The primate auditory cortex stretches from the fundus of the lateral sulcus (LS) medially to the STG laterally, and has more than 10 defined areas (Hackett 2002; Hackett et al. 2001; Kaas and Hackett 2000). Among auditory cortical areas, the first area in which multisensory responsiveness was examined was the caudal–medial area (CM; Schroeder et al. 2001). In addition to CM, other auditory areas including the primary auditory cortex (A1) were also shown to receive somatosensory inputs (Cappe and Barone 2005; Disbrow et al. 2003; de la Mothe et al. 2006a; Kayser et al. 2005; Lakatos et al. 2007; Smiley et al. 2007; for a review, see Musacchia and Schroeder 2009). Most areas also received multisensory thalamic inputs (de la Mothe 2006b; Hackett et al. 2007; Kosmal et al. 1997). Documented visual inputs to the auditory cortex have thus far originated from STP (Cappe and Barone 2005) as well as from peripheral visual fields of V2 and prostriata (Falchier et al. 2010). Schroeder and Foxe (2002) reported CSD responses to unimodal and bimodal combinations of auditory, visual, and somatosensory stimuli in area CM of the awake macaque. The laminar profiles of CSD activity in response to visual stimuli differed from those of auditory and somatosensory responses. Analysis of activity in different cortical layers revealed that visual inputs targeted the extragranular layers, whereas auditory and somatosensory inputs terminated in the granular layers in area CM. These two termination profiles are in accordance with the pattern of laminar projections of visual corticocortical projections (Falchier et al. 2002; Rockland and Ojima 2003) and primarylike thalamocortical projections (Jones 1998), respectively. In contrast, A1 receives auditory and somatosensory inputs in the granular and supragranular cortical layers, respectively (Lakatos et al. 2007). This suggests that somatosensory input to A1 originates from lateral, feedback, or nonspecific thalamic nuclei connections. Our laboratory showed that attended visual stimuli presented in
isolation modulate activity in the extragranular layers of A1 (Lakatos et al. 2009), and the same pattern is observed with attended auditory stimuli in V1 (Lakatos et al. 2008). These findings strengthen the hypothesis that nonspecific thalamic projections (Sherman and Guillery 2002) or pulvinar-mediated lateral connections (Cappe et al. 2009) contribute to AV integration in A1. The groups of Ghazanfar and Logothetis have shown that concurrent visual stimuli influenced auditory cortical responses systematically in A1 as well as in the lateral associative auditory cortices and STP (Ghazanfar et al. 2005; Hoffman et al. 2008; Kayser et al. 2007, 2008). These studies used complex and natural AV stimuli, which are more efficient in evoking responses in some nonprimary auditory areas (Petkov et al. 2008; Rauschecker et al. 1995; Russ et al. 2008). Their initial study (Ghazanfar et al. 2005) revealed that movies of vocalizations presented with the associated sounds could modulate local field potential (LFP) responses in A1 and the lateral belt. Kayser et al. (2008) showed visual responses in the LFP at frequency bands near 10 Hz. This frequency component responded preferentially to faces, and the preference was stronger in the lateral belt than in A1 (Hoffman et al. 2008). However, multiunit activity (MUA) barely showed a visual response that correlated in magnitude with the LFP response. AV interactions occurred as a small enhancement in LFP and suppression in MUA (see also Kayser and Logothetis 2009). Although AV integration in areas previously thought to be unisensory is intriguing and provocative, the use of a behavioral task is imperative in order to determine the significance of this phenomenon. Brosch et al. (2005) employed a task in which an LED flash cued the beginning of an auditory sequence. Monkeys were trained to touch a bar to initiate the trial and to signal the detection of a change in the auditory sequence. They found that some neurons in AC responded to the LED, but only when the monkey touched the bar after detecting the auditory change. This response disappeared when the monkey had to perform a visual task that did not require auditory attention. Although this may be due in part to the fact that the monkeys were highly trained (or potentially overtrained) on the experimental task, these findings also point to the importance of engaging auditory attention in evoking responses to visual stimuli. Findings like these, which elucidate the integrative responses of individual and small populations of neurons, can provide key substrates for understanding the effects of bimodal versus unimodal attention on cross-modal responses demonstrated in humans (Jääskeläinen et al. 2007; McDonald et al. 2003; Rahne and Böckmann-Barthel 2009; Talsma et al. 2009; von Kriegstein and Giraud 2006). The timing of cross-modal effects in primary auditory and posterior auditory association cortices in resting or anesthetized monkeys seems consistent with the cross-modal influence of touch and sight in monkeys engaged in an auditory task. In resting monkeys, the somatosensory CSD response elicited by electrical stimulation of the median nerve had an onset latency as short as 9 ms (Lakatos et al. 2007; Schroeder et al. 2001), and single neurons responded to air-puff stimulation of the dorsum of the hand in anesthetized monkeys with a latency of about 30 ms (Fu et al. 2003). Cutaneous responses of single units in AC during an active task peaked at 20 ms (Brosch et al. 2005), later than responses to direct electrical activation of afferent fibers but earlier than in the passive condition. Similarly, visual responses of single units in AC were observed from 60 ms and peaked at around 100 ms after the onset of the LED during an active task (Brosch et al. 2005). This is within the same range as the onset latency (about 100 ms) of neuronal firing and the peak timing of LFP responses to complex visual stimuli in AC when monkeys were simply fixating (Hoffman et al. 2007; Kayser et al. 2008). The effect of gaze direction and saccades will also need to be taken into account in future studies because it has been proposed that they can considerably affect auditory processing (Fu et al. 2004; Groh et al. 2001; Werner-Reiss et al. 2006).
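For readers unfamiliar with the laminar CSD analysis referred to in this section, CSD is conventionally estimated as the (negative) second spatial derivative of the LFP across equally spaced contacts of a laminar multielectrode, which localizes current sinks and sources to particular cortical layers. A minimal sketch, using simulated data and arbitrary units (tissue conductivity omitted), is shown below.

```python
import numpy as np

# Simulated laminar LFP: n_ch equally spaced contacts (spacing h, in mm) by n_t samples.
n_ch, n_t, h = 23, 600, 0.1
rng = np.random.default_rng(1)
lfp = rng.standard_normal((n_ch, n_t)).cumsum(axis=1) * 0.01  # placeholder data

# One-dimensional CSD estimate: negative second spatial derivative of the LFP
# across depth (conductivity omitted, so units are arbitrary).
csd = -(lfp[:-2, :] - 2.0 * lfp[1:-1, :] + lfp[2:, :]) / h**2

# csd[k] approximates the net transmembrane current around contact k + 1; the laminar
# depth of the earliest response is what distinguishes feedforward (granular-layer)
# from feedback/lateral (extragranular) termination patterns described above.
print(csd.shape)  # (n_ch - 2, n_t)
```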
5.2.6 Visual Cortex There has been much less multisensory research done in visual cortex than in auditory cortex, although it has been shown that the peripheral visual field representations of primary visual cortex (V1) receive inputs from auditory cortical areas, A1, parabelt areas on STG, and STP (Falchier et al.
2002). The peripheral visual field representation of area V2 also receives feedback inputs from the caudal STG/auditory belt region (Rockland and Ojima 2003). A preference for vocal sounds, relative to other sounds, was found in the nonprimary visual cortex using functional MRI (fMRI) in monkeys (Petkov et al. 2008). In contrast to studies of visual responses in the auditory cortex, few studies have recorded auditory responses in visual cortex during the performance of a task. Wang et al. (2008) recorded V1 single-unit firing while monkeys performed a visual detection task. Concurrent presentation of auditory and visual stimuli not only shortened saccadic reaction time, but also increased the neuronal response magnitude and reduced response latency. This effect was greatest when the intensity of the visual stimuli was low to moderate, and it disappeared when the luminance of the visual stimuli was intense. When monkeys were not performing a task, no auditory effect was observed in V1 (see Section 5.6.1). In a series of studies from our laboratory, a selective attention task was employed to determine whether attention to auditory stimuli influenced neuronal activity in V1 (Lakatos et al. 2008, 2009; Mehta et al. 2000a, 2000b). In these studies, tones and flashes were presented alternately and monkeys had to monitor a series of either visual or auditory stimuli while ignoring the other modality. The visual response was stronger when monkeys tracked the visual series than when they tracked the auditory series. In the attend-auditory condition, a phase reset of ongoing neuronal oscillations appeared to occur earlier than the visual response (Lakatos et al. 2009). This effect disappeared when the same stimuli were ignored. Thus, auditory influences on V1 were observed only when auditory stimuli were attended. This contrasts with the findings of Wang et al. (2008), in which sound affected V1 activity in monkeys performing a visual task. As we propose later, the control of attention likely has a major role in the manifestation of auditory effects in V1 (see Section 5.6.2).
5.2.7 Subcortical Regions
The basal ganglia are composed of several nuclei, each having a distinct function, such as motor planning and execution, habit learning, and motivation. Several studies show auditory, visual, and bimodally responsive neurons in basal ganglia nuclei. Even though multisensory responses could be observed under passive conditions (Santos-Benitez et al. 1995), many studies showed that these responses were related to reinforcement (Wilson and Rolls 1990) or sensorimotor association (Aosaki et al. 1995; Hikosaka et al. 1989; Kimura 1992). Although it is well known that the SC is a control station for orienting movements (Wurtz and Albano 1980), its multisensory properties have been a hotbed of research for decades in monkeys (Allon and Wollberg 1978; Cynader and Berman 1972; Updyke 1974; Wallace et al. 1996) and other animal models (Meredith and Stein 1983; Meredith et al. 1987; Rauschecker and Harris 1989; Stein et al. 2001, 2002). Neurons in the monkey SC adhere to well-established principles of multisensory integration such as spatial contiguity and inverse effectiveness (for review, see Stein and Stanford 2008), whether the animals are engaged in tasks (Frens and Van Opstal 1998) or under anesthesia (Wallace et al. 1996). In the SC of awake animals, AV integration depended on task conditions, namely whether the animals fixated on visible or memory-guided spots during AV stimulation (Bell et al. 2003). The presence of a visual fixation spot decreased unimodal responses and nearly suppressed response enhancement by AV stimuli. Bell et al. (2003) attributed the weaker AV integration during visually guided fixation to fixation-mediated inhibition in the SC. This is consistent with the fact that, whereas activity in the SC is coupled to eye movements, fixation requires the monkey to refrain from gaze shifts. Although the inferior colliculus (IC) has generally been assumed to be a passive station for primarily auditory information and immune to nonauditory or cognitive influences, recent AV studies challenge this view. Neuronal activity in the IC has been shown to be influenced by eye position (Groh et al. 2001), saccades, and visual stimuli (Porter et al. 2007), suggesting that the IC may be
influenced by covert orienting of concurrent visual events. This covert orienting may contribute to the visual influence observed on portions of human auditory brainstem responses that are roughly localized to the IC (Musacchia et al. 2006). Studies of thalamic projections to the primary auditory cortex show that multisensory connections are present in centers previously thought to be “unisensory” (de la Mothe et al. 2006b; Hackett et al. 2007; Jones 1998). Multiple auditory cortices also receive divergent afferents originating from common thalamic nuclei (Cappe et al. 2009; Jones 1998). In addition, the connections between thalamic nuclei and cortices are largely reciprocal. Even though the functions of those thalamic nuclei have to be clarified, they may contribute to multisensory responsiveness in cerebral cortices. Bimodal responsiveness was shown in a few thalamic nuclei (Matsumoto et al. 2001; Tanibuchi and Goldman-Rakic 2003).
5.3 FUNCTIONAL SIGNIFICANCE OF MULTISENSORY INTERACTIONS It was shown in monkeys that, under certain circumstances, audition influences vision (Wang et al. 2008), vision influences audition (Woods and Recanzone 2004), or the two senses influence each other (Cappe et al. 2010). For AV integration of any form, auditory and visual information has to converge. As described in the previous section, most brain regions have the potential to support that interaction (for review, see Ghazanfar and Schroeder 2006; Musacchia and Schroeder 2009), but the importance of that potential can only be determined by assessing the functional role that each region plays in helping to achieve perceptual integration of sight and sound. This can be achieved by observing the behavioral effects of cortical lesions or electrical stimulation in different areas and by simultaneously measuring behavioral performance and neural activity in normal functioning and impaired populations.
5.3.1 Influences on Unimodal Perception
Neural activity in a unimodal area is thought to give rise to sensations only in the preferential modality of that area. It is not surprising, therefore, that lesions in these areas extinguish only sensations of the "primary" modality. For example, STG lesions impair auditory memory retention but leave visual memory retention intact (Colombo et al. 1996). One exception to this rule lies in cases of acquired cross-modal activity, such as auditory responses in the occipital cortex of blind people (Théoret et al. 2004). Despite this reorganization, direct cortical stimulation of the visual cortex in blind people elicits photic sensations of simple patterns (such as letters) (Dobelle et al. 1974). Similar sensations of phosphenes can also be induced in sighted individuals using transcranial magnetic stimulation (TMS) (Bolognini et al. 2010; Ramos-Estebanez et al. 2007; Romei et al. 2007, 2009). But does such stimulation also induce auditory sensations? Our opinion is that auditory activity in the visual cortex does not induce visual sensations, and visual activity in the auditory cortex does not induce auditory sensations, although this may depend on the subject's experience with the stimuli (Meyer et al. 2010). In humans, cross-modal attention is known to influence the activity of sensory cortices during cross-modal stimulus presentation; for example, visual attention gates visual modulation of auditory cortex (Ciaramitaro et al. 2007; Lehman et al. 2006; Nager et al. 2006; Teder-Sälejärvi et al. 1999). In particular, the functional role of visual information in speech perception and the underlying auditory cortical modulation is well documented (Besle et al. 2009; van Attenveldt et al. 2009; Schroeder et al. 2008). The findings described below also suggest that the functional role of cross-modal activation in early sensory cortices is likely the modulation of primitive (low-level) sensory perception/detection.
5.3.1.1 Influence on Temporal Dynamics of Visual Processing
In the sensory system, more intense stimuli generally produce higher neuronal firing rates, faster response onset latencies, and stronger sensations. AV interactions often have a facilitative
effect on the neural response, either through increased firing rate or faster response (for review, see Stein and Stanford 2008), suggesting that AV stimuli should increase the acuity of the behavioral sensation in some fashion. In humans, AV stimuli increase reaction time speed during target detection (Diederich and Colonius 2004; Giard and Peronnet 1999; Molholm et al. 2002, 2007) and improve temporal order judgments (Hairston et al. 2006; Santangelo and Spence 2009). In the monkey, Wang et al. (2008) showed electrophysiological results consistent with this notion. During a visual localization task, the effect of AV enhancement in V1 occurred as shorter response latency. Interestingly, no appreciable enhancement of visual response was elicited by auditory stimuli when monkeys were not engaged in tasks. The auditory stimuli by themselves did not evoke firing response in V1. This suggests that auditory influence on V1 activity is a subthreshold phenomenon. Suprathreshold response in V1 begins at about 25 to 30 ms poststimulation (Chen et al. 2007; Musacchia and Schroeder 2009). To achieve auditory influences on visual responses, auditory responses must arrive within a short temporal window, a few milliseconds before visual input arrives (Lakatos et al. 2007; Schroeder et al. 2008). Auditory responses in the auditory system generally begin much earlier than visual responses in V1. For some natural events such as speech, visible signals lead the following sounds (Chandrasekaran et al. 2009; for review, see Musacchia and Schroeder 2009). For these events, precedence of visual input, relative to auditory input, is likely a requirement for very early AV interaction in early sensory interactions. 5.3.1.2 Sound Localization The ventriloquist aftereffect observed by Woods and Recanzone (2004) involves the alteration of auditory spatial perception by vision. This phenomenon implies the recruitment of structures whose auditory response depends on or encodes sound location. Several brain structures have sensitivity to spatial location of sound in monkeys. Those include IC (Groh et al. 2001), SC (Wallace et al. 1996), ventral division of the medial geniculate body (Starr and Don 1972), caudal areas of auditory cortex (Recanzone et al. 2000; Tian et al. 2001), PPC (Cohen 2009), and PFC (Artchakov et al. 2007; Kikuchi-Yorioka and Sawaguchi 2000). Woods and Recanzone (2004) used two tasks to test for bimodal interaction during sound localization: one for training to induce the ventriloquist aftereffect and another to test spatial sound lateralization. Monkeys maintained fixation except when making a saccade to the target sound location in the latter test task. The location of the LED light on which monkeys fixated during training task differed between sessions and affected the sound localization in the subsequent sound localization test tasks. Monkey’s “sound mislocalization” was predicted by the deviation of the LED position during the training task from the true center position on which the monkey fixated during the test task. Because monkeys always fixated on the LED, the retinotopic locus of the LED was identical across the tasks. However, there was a small difference in gaze direction that played a key role in causing “mislocalization” by presumably inducing plastic change in proprioceptive alignment of gaze position to sensory LED position. 
An additional key to that study was that, even though the LED positions were not identical between tasks, they were so close to each other that the monkeys presumably treated the slightly different fixation points as the same and did not notice the differences in gaze direction. Therefore, it can be inferred that plasticity of visual spatial localization affected auditory spatial localization. Although the precise substrate for the ventriloquist aftereffect in the macaque has not been established, several structures are candidates: IC (Groh et al. 2001), SC (Jay and Sparks 1984), AC (Werner-Reiss et al. 2006), and LIP and MIP (Mullette-Gilman et al. 2005). However, in all of these structures except the SC, the observed effects varied among simple gain modulation without alteration of the spatial receptive field (head-centered coordinates), systematic changes that followed gaze direction (eye-centered coordinates), and other, more complex changes. Plastic changes in either coordinate frame, or in both, could presumably contribute to inducing the ventriloquist aftereffect.
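The distinction between head-centered and eye-centered coding drawn above amounts to asking whether a neuron's spatial tuning stays anchored to the head or shifts with gaze. The toy model below (hypothetical Gaussian tuning with arbitrary parameters) illustrates the two predictions that such studies compare against measured tuning curves.

```python
import numpy as np

def response(target_head_deg, gaze_deg, frame, preferred_deg=10.0, width_deg=20.0):
    # Gaussian spatial tuning of a model neuron to a sound at a given head-centered
    # azimuth, under a head-centered or an eye-centered reference frame.
    if frame == "head-centered":
        x = target_head_deg                # tuning ignores gaze direction
    elif frame == "eye-centered":
        x = target_head_deg - gaze_deg     # tuning shifts with the eyes
    else:
        raise ValueError(frame)
    return np.exp(-0.5 * ((x - preferred_deg) / width_deg) ** 2)

targets = np.linspace(-40.0, 40.0, 81)
for gaze in (-10.0, 0.0, 10.0):
    head = response(targets, gaze, "head-centered")
    eye = response(targets, gaze, "eye-centered")
    # An eye-centered tuning curve slides along the head-centered axis as gaze changes;
    # a head-centered one does not. Gain-modulated or intermediate frames fall in between.
    print(gaze, targets[np.argmax(head)], targets[np.argmax(eye)])
```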
Fixation during head restraint does not allow any eye movement. During fixation, subjects can nevertheless direct visual attention to locations away from the fixated spot (covert attention) or listen carefully. Neuronal correlates of such processes have been seen in PFC (Artchakov et al. 2007; Kikuchi-Yorioka and Sawaguchi 2000) and PPC (Andersen et al. 1997). Meanwhile, subjects must continuously issue oculomotor commands to maintain a steady eye position. Therefore, a signal that conveys the fixated location and differentiates the center from deviant positions should be present. A possible correlate of such a signal, a change in spontaneous activity dependent on gaze direction, was described in AC, whereas it was not observed in IC (Werner-Reiss et al. 2006). Even though the source of the eye position signal to AC is unknown, this finding suggests AC as one of the candidate sites for inducing the ventriloquist aftereffect. It is worth mentioning that, despite its name, the ventriloquist aftereffect is quite different from the ventriloquist effect. The ventriloquist effect arises when auditory and visual signals stem from a shared vicinity, and it requires neither fixation on a visual spot nor a steady eye position signal. In contrast, the ventriloquist aftereffect concerns the spatial coding of purely auditory events. Hence, the study of this phenomenon may help clarify which type of neuronal coding is the main strategy for cortical encoding of sound location.
5.3.2 AV Recognition Identifying a previously known AV object, such as a speaker’s face and voice, requires AV integration, discrimination, and retention. This process likely relies on accurate encoding of complex stimulus features in sensory cortices and more complex multiplexing in higher-order multisensory association cortices. Multisensory cortices in the “what” pathway probably function to unite these sensory attributes. In humans, audiovisual integration plays an important role in person recognition (Campanella and Belin 2007). Several studies have shown that unimodal memory retrieval of multisensory experiences activated unisensory cortices, presumably because of multisensory association (Wheeler et al. 2000; Nyberg et al. 2000; Murray et al. 2004, 2005; von Kriegstein and Giraud 2006) and such memory depended on meaningfulness of combined signals (Lehmann and Murray 2005). Differential responses to vocal sounds were observed in PFC (Gifford et al. 2005; Romanski et al. 2005), STG (Rauschecker et al. 1995; Russ et al. 2008), and AC (Ghazanfar et al. 2005). Differential responses to faces were found in PFC (Rolls et al. 2006), temporal lobe cortices (Eifuku et al. 2004), and amygdala (Kuraoka and Nakamura 2007). Some of these structures may possess selectivity to both vocal sounds and faces. Recognition of a previously learned object suggests that this process relies in part on working and long-term memory centers. The fact that the identification of correspondence between vocal sound and face is better when the individuals are socially familiar (Martinez and Matsuzawa 2009) supports this notion. PFC and MTL are also involved in the association of simple auditory and visual stimuli as shown by delayed match to sample task studies (Colombo and Gross 1994; Fuster et al. 2000; Gibson and Maunsell 1997). Lesions in MTL (Murray and Gaffan 1994) or PFC (Gaffan and Harrison 1991) impaired performance in tasks requiring memory and AV association. These findings implicate PFC, STG, and MTL in AV recognition.
5.4 PRINCIPLES OF MULTISENSORY INTERACTION
Relationships between multisensory responses and stimulus parameters, derived primarily from single-unit studies in the cat SC, are summarized in three principles of multisensory interaction: the inverse effectiveness, temporal, and spatial principles (Stein and Meredith 1993). These organizing principles have been shown to be preserved with other sensory combinations (e.g., auditory–somatosensory; Lakatos et al. 2007) and in humans (Stevenson and James 2009); however, systematic examination of these principles for AV integration in the monkey cerebral cortex is limited to the auditory cortex.
5.4.1 Inverse Effectiveness The inverse effectiveness principle of multisensory interaction states that the interaction of weaker unimodal inputs results in larger gain of multisensory response. In the case of audition, the response to a softer sound should be enhanced more by visual input, relative to a louder sound. In the case of vision, the response to a dimmer object should be enhanced more by sounds relative to a brighter object. Cappe et al. (2010) showed a behavioral correlate to inverse effectiveness in monkeys. Manual reaction times to soft sounds were slower relative to loud sounds, and only the reaction time to soft sound was shortened by simultaneous visual stimuli. Responses to AV stimuli were also more accurate than responses to sounds alone at the lowest sound intensities. The same group also showed that the effect of sound on saccades as well as V1 neuronal response latencies is larger in the case of less salient visual stimuli (Wang et al. 2008). fMRI studies show that degraded auditory and visual stimuli both evoke weaker BOLD signal responses in the macaque AC, relative to intact stimuli (Kayser et al. 2007). When those degraded stimuli were presented simultaneously, enhancement of BOLD signal responses was larger than simultaneous intact stimuli. Even though they did not test the combination of degraded and intact stimuli, the results suggest synergistic inverse effectiveness between modalities. Electrophysiologically, Ghazanfar et al. (2005) showed that weaker LFP responses to vocal sounds were enhanced more by concurrently viewing a movie clip of a vocalizing monkey, relative to stronger responses. Another study showed that responses to vocal stimuli were modulated by movie stimuli differentially depending on loudness: responses to the loud vocal stimuli were suppressed when the movie was added, whereas the responses to the soft sounds were enhanced (Kayser et al. 2008). These studies are compatible with the idea that weak responses are enhanced by AV integration. Additionally, a recent study reported a small but significant increase in the information capacity of auditory cortical activity (Kayser et al. 2010). Thus, visual stimuli may not only enhance responses but also deploy more cortical neurons in computational analysis of auditory signals, creating redundancy in processed information to secure the perception.
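The gain referred to in this section is commonly quantified as a multisensory enhancement index, expressing the bimodal response relative to the best unimodal response (Stein and Meredith 1993). A brief illustration with hypothetical spike counts:

```python
def enhancement_index(resp_av, resp_a, resp_v):
    # Multisensory enhancement (%) relative to the strongest unimodal response,
    # in the form widely used since the cat superior colliculus work.
    best_unimodal = max(resp_a, resp_v)
    return 100.0 * (resp_av - best_unimodal) / best_unimodal

# Hypothetical firing rates illustrating inverse effectiveness: weakly driven
# conditions show a proportionally larger bimodal gain than strongly driven ones.
print(enhancement_index(resp_av=6.0, resp_a=3.0, resp_v=2.0))    # weak inputs  -> 100.0
print(enhancement_index(resp_av=55.0, resp_a=50.0, resp_v=20.0)) # strong inputs -> 10.0
```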
5.4.2 Temporal Contiguity

The temporal principle of multisensory processing (Stein and Meredith 1993) predicts that integration effects will be greatest when the neuronal responses evoked by stimuli of the two modalities fall within a small temporal window. Quite a few studies have investigated the spatial and temporal contiguity principles of AV integration in nonhuman primates. Overall, results in the monkey SC and A1 conform to the principle of temporal contiguity and describe a range of enhancement and suppression effects. In the SC, Wallace et al. (1996) showed that visual stimuli preceding auditory stimuli tend to produce more interaction. This condition corresponds to the natural order of physical events in everyday stimuli, where the visual stimulus precedes the accompanying auditory one. Ghazanfar et al. (2005) described neural responses in A1 and lateral belt areas to the presentation of conspecific vocal sounds, with and without accompanying movies, at different SOAs. In this region, bimodal stimulation can elicit suppression or enhancement, depending on the neural population. Results showed that the proportion of sites exhibiting bimodal enhancement depended on the SOA: SOAs longer than 100 ms produced enhancement in fewer regions of AC. When the auditory response was suppressed by a movie, the proportion of suppressed locations peaked at SOAs shorter than 80 ms and longer than 100 ms, interestingly sparing the peak timing of visually evoked LFPs. Kayser et al. (2008) tested responses in A1 and belt areas to systematic combinations of noise bursts and flashes in 20 ms steps. Bimodal suppression was only observed when the flash preceded the noise by 20 to 80 ms. For the natural AV stimuli, bimodal enhancement was observed in some populations
of auditory cortex at an SOA of 0 ms, and this enhancement was abolished by introducing a perceivable delay between the stimuli (160 ms). These results suggest that AV interaction in AC can appear either as enhancement (when the auditory and visual stimuli are nearly synchronized or separated by less than a 100 ms delay) or as suppression (at delays longer than 100 ms). Interpretations of these data should nonetheless be approached with some caution. In the first study, the effect of AV interaction was attributed to the interaction between movements of the mouth and the following vocal sound (Ghazanfar et al. 2005). However, because the mouth movement started immediately after the abrupt appearance of the first movie frame, the sudden change in the screen image could capture visual attention. In other studies, an abrupt visual change was shown to elicit a brief freeze of gaze position in monkeys (Cui et al. 2009) and in humans (e.g., Engbert and Kliegl 2003). Therefore, the onset of the movie itself could evoke transient activity. This raises the possibility that the observed effects were related simply to a visual response or to a transient change in covert visual attention. Because LFPs capture the response of a large population of neurons, such activity generated in non-AC structures may be superimposed on the recorded signals. Further studies are necessary to dissociate the AV interaction into mouth movement-related and other components.
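The sketch below simply encodes the summary sentence above for AC, returning the sign of interaction expected at a given audiovisual delay. It is a crude mnemonic for the pattern reported in these particular studies, with the 100 ms boundary taken from the text, not a general model of the temporal binding window.

```python
def expected_ac_interaction(av_delay_ms: float) -> str:
    """Sign of AV interaction in auditory cortex expected from the studies above.

    av_delay_ms is the absolute delay between the auditory and visual stimuli.
    The 100 ms boundary follows the summary given in the text; individual
    studies report finer structure (e.g., suppression at some short SOAs).
    """
    return "enhancement" if abs(av_delay_ms) < 100 else "suppression"

for delay in (0, 60, 160):
    print(f"{delay:>3} ms delay -> {expected_ac_interaction(delay)}")
```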
5.4.3 Spatial Contiguity

The spatial principle of multisensory integration states that integration is greatest when the loci of events of different modalities fall within the receptive fields of the neurons and when those receptive fields of different modalities overlap with each other. Although there are few data on this topic for AV integration in monkey cortex, we can speculate how it operates based on anatomical and electrophysiological findings. Anatomical studies predict that peripheral representations of visual stimuli should be more susceptible to auditory influences. The representation of the visual periphery is retinotopically organized in visual cortex and is interconnected with caudal auditory cortices (Falchier et al. 2002, 2010; Rockland and Ojima 2003). In accordance with this prediction, Wang et al. (2008) observed auditory influences on V1 responses to visual stimuli presented more peripherally than 10°, although central vision was not tested. Similarly, in humans, auditory activation of visual cortex subserving the peripheral visual fields has been shown (Cate et al. 2009). However, many human studies used central and parafoveal stimuli, for which anatomical substrates or other physiological mechanisms remain to be identified. Other studies used different types of visual stimuli to examine auditory cortical responses. Flashes (e.g., Kayser et al. 2008; Lakatos et al. 2009) excite a wide area of the retinotopic map. Images and movies were overlaid around a central fixation point (Ghazanfar et al. 2005, 2008); in the latter case, visual stimulation did not extend into peripheral visual space. In addition, when monkey faces are used, the subjects tend to look at the mouth and eyes, close to the center of the face (Ghazanfar et al. 2006). These findings suggest that visual influences may have different sources depending on the stimulus preferences of each visually responsive area. For example, cortices along the STS possess face preference, large receptive fields, and position invariance of object selectivity. Therefore, facial influences on AC may originate from the STS, as proposed by recent studies (Ghazanfar et al. 2008; Kayser and Logothetis 2009; see below). Such speculation could be tested further by comparing the effect of a vocalization movie on AC across face positions relative to gaze, taking into account the differences in receptive field size among visually responsive cortices. In PPC, common spatial tuning to visual and auditory stimuli has been observed (Mazzoni et al. 1996; Schlack et al. 2005). Even though PPC responses to simultaneous AV stimuli have not been investigated, it is likely that integration there depends on spatial congruency between modalities. Further studies are needed to verify this.
5.5 MECHANISMS AND DYNAMICS OF MULTISENSORY INTERACTION

Traditionally, multisensory integration is indexed at the neuronal level by a change in the averaged magnitude of evoked activity relative to the sum of the unimodal responses. This type of effect has most often been studied in the classical higher-order multisensory regions of the temporal, parietal, and frontal cortices, and generally manifests as a simple enhancement of the excitatory response beginning at the initial input stage in layer 4, as reviewed by Schroeder and Foxe (2002). Recent studies have shown that cross-modal influences on traditionally unisensory cortices can also occur via modulation of ongoing oscillatory activity in the supragranular layers, which in turn modulates the probability that neurons will fire in response to the dominant (driving) auditory input (Lakatos et al. 2007; Schroeder and Lakatos 2009). Modulatory rather than driving multisensory influences have likewise been found in single-unit studies (Allman and Meredith 2007; Allman et al. 2008; Dehner et al. 2004; Meredith et al. 2009). This more recently described mechanism is the focus of the discussion here.
5.5.1 Phase Reset: Mechanisms

Somatosensory stimuli evoke a modulatory response in the supragranular layers of A1, with an onset time even faster than the auditory response (Lakatos et al. 2007). When paired with synchronized auditory stimuli, this faster somatosensory activation influenced the forthcoming auditory response. However, the somatosensory input did not evoke a single rapid bolus of afferent activity, as a click does, which elevates signal power across a broad frequency range at once. Instead, the somatosensory effect appeared as a modulation, through phase reset, of the dominant neuronal oscillations observed in the CSD. In other words, the somatosensory stimulus shifted the randomly fluctuating excitability of auditory neuronal ensembles into a particular excitability state (represented by the oscillatory phase), thereby determining the effect of the auditory input. The modulatory effect differs across somatosensory–auditory SOAs depending on how a given SOA relates to the periods of delta, theta, and gamma oscillations; that is, facilitation is maximal at SOAs corresponding to full gamma, theta, and delta cycles, and these peaks in the function are separated by "suppressive" troughs, particularly at SOAs corresponding to one-half of a theta cycle and one-half of a delta cycle. In contrast with somatosensory activation of A1, visual responses are relatively slow even within the visual system (Chen et al. 2007; Musacchia and Schroeder 2009; Schmolesky et al. 1998). In both A1 and V1, visually driven activity arrives later than auditorily driven activity (Lakatos et al. 2009). Therefore, for visual signals to coincide with, or reach AC earlier than, the auditory signal, the visual stimulus has to occur earlier than the auditory stimulus, which is the case for many natural forms of AV stimulation, particularly speech (Chandrasekaran et al. 2009). Cross-modal auditory modulation of V1 activity and visual modulation of A1 activity were observed in monkeys performing an intermodal selective attention task, in which auditory and visual stimuli were presented alternately at a rate in the delta frequency range (Lakatos et al. 2009). Just as in the case of somatosensory modulation of A1 activity, cross-modal responses occurred as a modulatory phase reset of ongoing oscillatory activity in the supragranular layers, without a significant change in neuronal firing, while those stimuli were attended. Supragranular layers receive corticocortical and nonspecific thalamocortical inputs, whereas granular layers receive sensory-specific thalamocortical inputs. A modulatory phase reset in the supragranular layers without any change in neuronal firing in the granular, or even supragranular, layers suggests that cross-modal activation occurs as a transient, subthreshold change in supragranular cellular excitability. This is consistent with the fact that cross-modal firing responses have not been reported for primary sensory cortices in many studies that relied on action potentials as the sole dependent measure. The manifestation of multiple poststimulus time windows of excitability is consistent with the nested hierarchical structure of the frequency bands of ongoing neuronal activity (Lakatos et al. 2005).
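To illustrate the SOA dependence described above, the sketch below treats a cross-modal input as resetting each band to its high-excitability phase and sums a cosine over nominal delta, theta, and gamma frequencies. The band frequencies and the additive combination are assumptions for illustration, not parameters from Lakatos et al. (2007).

```python
import numpy as np

# Toy illustration of SOA-dependent facilitation under a phase-reset account.
# Assumption: a cross-modal input resets each band to its high-excitability
# phase, so excitability at a later SOA follows a cosine of that band's cycle.
# Band center frequencies below are nominal choices, not values from the text.
BANDS_HZ = {"delta": 1.5, "theta": 7.0, "gamma": 35.0}

def excitability_modulation(soa_ms: float) -> float:
    """Summed cosine excitability across bands at a given SOA (ms).

    Peaks fall at SOAs equal to full cycles of each band and troughs at
    half cycles, mirroring the facilitation/suppression pattern described
    for somatosensory-auditory SOAs.
    """
    soa_s = soa_ms / 1000.0
    return sum(np.cos(2 * np.pi * f * soa_s) for f in BANDS_HZ.values())

if __name__ == "__main__":
    # Full gamma, half theta, full theta, half delta, and full delta cycles.
    for soa in (0, 1000 / 35, 1000 / 14, 1000 / 7, 1000 / 3, 1000 / 1.5):
        print(f"SOA = {soa:6.1f} ms -> modulation = {excitability_modulation(soa):+.2f}")
```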
Cross-modal responses during an intermodal selective attention task were observed in response to unimodal stimuli (Lakatos et al. 2008, 2009). What would be the effect of a phase reset when auditory and visual stimuli are presented simultaneously? Wang et al. (2008) analyzed neuronal firing responses to light, with or without paired auditory noise stimuli, using single-unit recordings in V1. When stimuli were presented passively, the firing rate in a population of V1 neurons increased and remained high for 500 ms. During visual detection tasks, V1 population responses to a visual target without sound appeared as a double-peaked temporal pattern. The timing of each peak after response onset was in the range of the cycle lengths of the gamma or theta frequency bands. In response to AV stimuli, an additional peak appeared in the temporal firing pattern at a latency near the duration of a full delta cycle. Although the translation of firing activity into underlying membrane potential is not straightforward, these activity measures are roughly monotonically related to each other (e.g., Anderson et al. 2000). Thus, the oscillatory pattern of neuronal firing suggests oscillatory modulation of neuronal excitability by the cross-modal stimulus.
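The reasoning above maps inter-peak intervals in the firing pattern onto oscillation cycle lengths. A small helper along these lines is sketched below; the band boundaries are conventional assumptions (definitions vary across studies), and the example intervals are hypothetical rather than values from Wang et al. (2008).

```python
# Which oscillatory band is a given inter-peak interval compatible with?
# Band limits are conventional assumptions, not values from the studies above.
BAND_LIMITS_HZ = {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "gamma": (30.0, 80.0)}

def compatible_bands(interval_ms):
    """Return the bands whose full-cycle period range contains the interval."""
    matches = []
    for band, (f_lo, f_hi) in BAND_LIMITS_HZ.items():
        period_lo_ms = 1000.0 / f_hi   # shortest period in the band
        period_hi_ms = 1000.0 / f_lo   # longest period in the band
        if period_lo_ms <= interval_ms <= period_hi_ms:
            matches.append(band)
    return matches

# Hypothetical inter-peak intervals (ms) for illustration only.
for interval in (25.0, 150.0, 500.0):
    print(interval, "ms ->", compatible_bands(interval) or ["none"])
```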
5.5.2 Phase Reset: Dependence on Types of Stimuli

How would phase reset work in response to stimuli with complex temporal envelopes? Sounds and movies of vocalizations are popular stimuli in studies of AV integration in auditory cortical areas and STP in nonhuman primates. Because vocalization begins with visible facial movement before any sound is generated, phase reset by the visible movement pattern is in a position to affect the processing of the following sound. Kayser et al. (2008) showed changes in LFP frequency bands (around and below 10 Hz) consistent with this prediction; that is, they observed phase reset and increases in excitability at the time the response to the auditory component of complex AV stimuli started in A1. When phase reset occurred, it was accompanied by enhanced firing responses. The frequency bands in which visual inputs produced phase reset differed between Kayser et al. (2008) and Lakatos et al. (2009): the latter found that cross-modal phase reset in A1 and V1 occurred in the theta (below 10 Hz) and gamma (above 25 Hz) bands, leaving the 10 to 25 Hz range unaffected, whereas Kayser et al. observed phase reset by visual input alone across the 5 to 25 Hz range. The differences between the results of these studies are likely attributable to differences in the visual stimuli. Lakatos et al. (2009) did not examine whether phase reset of ongoing oscillatory activity in the theta and gamma bands contributed to AV integration, because their task did not present auditory and visual stimuli simultaneously. Kayser et al. (2008) showed that enhanced neuronal firing responses to AV stimuli, compared with auditory stimuli, correlated with the occurrence of phase reset around 10 Hz, underscoring the importance of reset in that band for AV response enhancement. The difference in the frequency bands of visually induced phase reset between the Lakatos et al. and Kayser et al. studies also suggests that the frequency of oscillation influenced by cross-modal inputs depends on the conditions of attention and stimulation. Is phase reset a phenomenon beyond primary sensory cortices? This question remains open. At the least, STP clearly receives feedforward excitatory input from several modalities (Schroeder and Foxe 2002). The contribution of oscillatory phase reset in STP and other higher-order multisensory areas has not been examined in detail, although the suspicion is that phase reset may have more to do with attentional modulation than with multisensory representation.
5.6 IMPORTANCE OF SALIENCE IN LOW-LEVEL MULTISENSORY INTERACTIONS

Variations in AV integration effects according to saliency and attentional conditions are so pervasive that some have begun to wonder whether attention is a prerequisite for integration (Navarra et al. 2010).
However, AV integration has been observed in many higher cortical areas even when subjects were only required to maintain visual fixation, without further task demands (PFC, Sugihara et al. 2006; STP, Barraclough et al. 2005; AC, Ghazanfar et al. 2005; Kayser et al. 2008). Does this mean that audiovisual interactions happen automatically? The answer may depend on the level of the system being studied, as well as on the behavioral state, as discussed below.
5.6.1 Role of (Top-Down) Attention

There is strong evidence that top-down attention is required for AV integration to take place in primary sensory cortices. Using an intermodal selective attention task, Lakatos et al. (2008, 2009) showed that the manifestation of visual influences in A1 and auditory influences in V1 depended on attention: if a stimulus was ignored, its cross-modal influence could not be detected. The selective role of sensory attention illustrated above contrasts with findings showing that attention to either modality can elicit AV effects. Wang et al. (2008) showed that neurons in V1 responded to auditory targets only when monkeys performed a purely visual localization task. Similarly, in humans, task-irrelevant sounds promoted the detection of phosphenes induced by TMS over visual cortex during a task that required only visual attention (Romei et al. 2007, 2009). Thus, tasks requiring either auditory (Lakatos et al. 2009) or visual (Romei et al. 2007, 2009; Wang et al. 2008) attention rendered auditory influences observable in V1. This apparent disagreement is most likely due to differences in the role of the unattended sensory stimuli during those tasks. In the visual localization task (Wang et al. 2008), monkeys needed to react rapidly to localize visual targets. Task-irrelevant auditory stimuli occurred in half of the trials and were always delivered temporally congruent with the visual targets and at a fixed central location. In this task, the status of the sound is key. Auditory stimuli, when delivered, were always informative and thus could act as an instruction, much like the verbal instructions given to subjects performing visual localization in Posner's classic study (Posner et al. 1980). Therefore, it is possible that monkeys paid attention to these informative auditory stimuli, in addition to the visual stimuli, to perform the visual localization task. In a similar vein, responses to visual events in the auditory discrimination task of Brosch et al. (2005) may reflect their use as an informative cross-modal cue for performing the task, although again, the effects of overtraining must also be considered. In the intermodal attention task (Lakatos et al. 2008, 2009), subjects did not have to spread their spatial attention across different locations, because the visual and auditory stimuli were spatially congruent. However, those stimuli were temporally incongruent, being divided into two asynchronous streams. Furthermore, while monkeys monitored the sequence of one modality, deviants also appeared in the other sequence, and monkeys had to refrain from responding to them. The easiest way to perform such a task would be to plug one's ears while watching and to close one's eyes while listening. Prevented from using these strategies, all the monkeys could actually do to perform the task was to pay attention to the cued modality while ignoring the other stream at the same time. Although it may be impossible to determine what monkeys are actually attending to during any given task, it can be argued, based on the observation of auditory influences on visual responses in V1 (Wang et al. 2008), that monkeys do not ignore informative sounds. Further studies are needed to determine how attentional conditions influence AV integration. It would be interesting to see whether an auditory influence could be observed in a visual localization task, as in the study of Wang et al.
(2008), but with auditory stimuli that are incongruent with the visual stimuli both spatially and temporally, thereby acting as distracters. Auditory attention has also been suggested to play a role in evoking auditory responses in LIP (Linden et al. 1999) and PFC (Vaadia et al. 1986). Further clarification of the role of attention in higher associative areas, such as the PFC, is very important because many models assume that those cortices impose attentional control over lower cortices.
5.6.2 Attention or Saliency of Stimuli

Degrees of attentional focus and ranges of stimulus saliency surely have differential effects on AV integration. It is difficult to argue that monkeys monitor AV stimuli during simple tasks such as fixation, because the monkeys will receive the reward regardless of what happens during stimulus presentation. However, monkeys are certainly alert under such conditions. Even though the mandated level of attention differs from active monitoring, such weak attention, or the lack of competing stimulation, may be enough to induce audiovisual integration. Besides attentional requirements, there are differences in stimulus saliency between simple stimuli, such as flashes and tones, and complex stimuli, such as faces. It is well known that meaningful visual stimuli attract attention in a behaviorally observable manner: the eyes and mouths of vocalizing individuals draw a subject's gaze (Ghazanfar et al. 2006). Thus, it is possible that highly salient stimuli passively induce AV effects in the absence of explicit requirements to attend. Certain forms of AV effects in adult animals occur only after training (Grunewald et al. 1999; Woods and Recanzone 2004). In that sense, the perception of vocalizations has already been acquired through life-long training in monkeys. We may suppose that AV integration is essential for the acquisition of communication skills in nonhuman primates. Once trained, AV integration may become "prepotent," requiring less attention, and may proceed "effortlessly."
5.7 CONCLUSIONS, UNRESOLVED ISSUES, AND QUESTIONS FOR FUTURE STUDIES

Compared with human studies, behavioral studies of AV integration in nonhuman primates are still relatively rare. The ability to record behavior and local neural activity simultaneously has helped to reconcile the multisensory findings in humans and to expand our understanding of how AV integration occurs in the nervous system. Below, we list several issues to be addressed in the future.
5.7.1 Complex AV Interactions

Tasks requiring linguistic ability may be out of reach for experiments involving nonhuman primates; however, visual tasks of high complexity have been used in previous studies. Considering that many AV effects in humans were observed with purely visual tasks, it may be possible to train monkeys to perform complex visual tasks and then study the effect of auditory stimulation on visual performance.
5.7.2 Anatomical Substrates of AV Interaction

The anatomical substrates of cross-modal inputs to primary sensory cortices (de la Mothe et al. 2006b; Cappe and Barone 2005; Cappe et al. 2009; Falchier et al. 2002, 2010; Hackett et al. 2007; Rockland and Ojima 2003; Smiley et al. 2007) provide the basis for models of the routes for AV integration. These data show that two types of corticocortical inputs (feedback and lateral connections), as well as thalamocortical and subcortical inputs from nonspecific and multisensory thalamic nuclei, are potential pathways mediating early multisensory convergence and integration. The challenge is to discriminate the influence of each of these pathways during a behavioral task. It is probable that the weighting of these different pathways is defined by the sensory context as well as by the nature of the task objective.
5.7.3 Implication of Motor Systems in Modulation of Reaction Time

Brain structures showing AV responses include parts of not only sensory but also motor systems. Facilitated reaction times in both saccadic and manual responses raise the issue of whether
enhancement occurs only in sensory systems or elsewhere as well. Because Miller et al. (2001) showed that motor cortical activation triggered by sensory stimuli reflects sensory signals that have already been integrated by the stage of primary motor cortex, it is possible that activation of PPC, PFC, and particularly PM areas or the SC is facilitated by redundant sensory inputs. These possibilities have not yet been fully discerned. The possibility of additional sources of facilitated reaction time was also suggested by the findings of Wang et al. (2008). When intense visual stimuli were presented, additional auditory stimuli did not affect the visual response in V1, but they did influence saccadic reaction time. This suggests either that the visual response is facilitated somewhere in the visual system outside of V1 or that auditory stimuli directly affect motor responses.
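The behavioral benchmark behind this line of reasoning is the race model inequality of Miller (1982), which asks whether redundant-target reaction times are faster than any race between independent unimodal channels could produce. A minimal sketch is given below; the reaction times are simulated for illustration and are not data from the studies discussed here.

```python
import numpy as np

# Miller's (1982) race model inequality:
#   P(RT <= t | AV) <= P(RT <= t | A) + P(RT <= t | V)  for every t.
# Violations indicate coactivation rather than a simple race between
# independent unimodal channels.

def ecdf(samples, t_grid):
    """Empirical cumulative distribution of reaction times evaluated on a grid."""
    samples = np.sort(np.asarray(samples))
    return np.searchsorted(samples, t_grid, side="right") / samples.size

rng = np.random.default_rng(0)
rt_a = rng.normal(260, 30, 500)   # auditory-only RTs (ms), simulated
rt_v = rng.normal(280, 30, 500)   # visual-only RTs (ms), simulated
rt_av = rng.normal(225, 25, 500)  # audiovisual RTs (ms), simulated

t_grid = np.arange(150, 400, 5)
violation = ecdf(rt_av, t_grid) - (ecdf(rt_a, t_grid) + ecdf(rt_v, t_grid))
print("max race-model violation:", violation.max())  # > 0 suggests coactivation
```

A positive maximum violation is the usual evidence for coactivation, which is what motivates asking where in the sensorimotor pathway the unimodal signals converge.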
5.7.4 Facilitation or Information?

In general, larger neuronal responses can be beneficial for faster reactions to, and better discrimination of, events because they have faster onset latencies and better signal-to-noise ratios. Which coding strategy, or strategies, neurons adopt as they respond to stimuli remains to be discerned. For example, visual localization tasks require not only fast reaction times but also good discrimination of the visual target location. Visual influences on ongoing oscillations through phase reset mechanisms, and the consequences of such modulation for response magnitude, have been shown by several groups. Additionally, Kayser et al. (2010) have shown that visual influences can tune the auditory response by increasing its signal-to-noise ratio and thereby its information capacity. Because it is not known which aspect of the neuronal response the brain utilizes, it is desirable to compare mechanisms of modulation with behavioral responses.
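One way to make "information capacity" concrete is to estimate the mutual information between stimulus identity and binned responses under auditory-only versus audiovisual conditions. The sketch below does this with simulated spike counts in which the visual input is simply assumed to reduce response variability; none of the numbers are taken from Kayser et al. (2010).

```python
import numpy as np

def mutual_information(stimuli, responses):
    """Plug-in estimate of I(stimulus; response) in bits from paired samples."""
    joint = {}
    for s, r in zip(stimuli, responses):
        joint[(s, r)] = joint.get((s, r), 0) + 1
    n = len(stimuli)
    p_s = {s: stimuli.count(s) / n for s in set(stimuli)}
    p_r = {r: responses.count(r) / n for r in set(responses)}
    return sum((c / n) * np.log2((c / n) / (p_s[s] * p_r[r]))
               for (s, r), c in joint.items())

rng = np.random.default_rng(1)
stims = [s for s in ("coo", "grunt") for _ in range(200)]
rates = {"coo": 8, "grunt": 14}          # hypothetical mean spike counts
# Assumption: higher trial-to-trial noise in the auditory-only condition.
resp_a = [int(rng.normal(rates[s], 4)) // 4 for s in stims]   # coarse bins
resp_av = [int(rng.normal(rates[s], 2)) // 4 for s in stims]

print("I(stim; response), A only:", round(mutual_information(stims, resp_a), 3), "bits")
print("I(stim; response), AV    :", round(mutual_information(stims, resp_av), 3), "bits")
```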
5.7.5 Inverse Effectiveness and Temporal Interaction

Inverse effectiveness states that multisensory integration is most effective when weak stimuli are presented. Most electrophysiological studies of AV integration in monkey auditory cortex, however, have used loud sounds, whereas low stimulus intensity can degrade the temporal response pattern of sensory neurons. Such an effect would be more prominent for complex stimuli, such as vocal sounds, because smaller peaks in the temporal envelope (e.g., the first envelope peak of a macaque grunt call) may be missed in auditory encoding. The condition of weak sound is relevant to Sumby and Pollack's (1954) classic observation of inverse effectiveness in human speech perception. It is thus important to investigate how AV integration works under degraded conditions. Degraded stimuli might also reveal a more central role of attention, because weaker stimuli require more attention in order to be discerned. In addition, the altered timing of response peaks to weak vocal sounds may interact differently with the excitability phases of ongoing oscillations, leading to different patterns of enhancement.
5.7.6 What Drives and What Is Driven by Oscillations?

Recent studies of AV integration in AC and STP stress the importance of oscillatory neuronal activity. Oscillations in field potentials and CSD reflect rhythmic net excitability fluctuations of the local neuronal ensemble in sensory cortical areas. Although numerous hypotheses are available, the role of oscillatory modulation in other structures is unknown. Endogenous attention may also be reflected in ongoing activity by top-down modulation. Its interaction with bottom-up sensory activation can contribute to and be influenced by oscillatory dynamics. This is an extremely fruitful area for future studies.
5.7.7 Role of Attention

Although some multisensory studies in monkeys did control for attention, most were done without specific control of attention. The former studies provide ample evidence for a
definitive role of sensory attention in AV integration. To get a clear picture of the role attention plays in multisensory interactions, more studies are needed in which attention, even unimodal attention, is controlled through behavioral tasks and stimuli. It will also be important to investigate attentional load, because differences in selective attention may emerge only under high-load conditions: under high attentional load in the attended modality, subjects may try, consciously or unconsciously, to ignore stimuli of the irrelevant modalities.
ACKNOWLEDGMENT

This work was supported by grant nos. K01MH082415, R21DC10415, and R01MH61989.
REFERENCES Aggleton, J.P., and M. Mishkin. 1990. Visual impairments in macaques following inferior temporal lesions are exacerbated selectively by additional damage to superior temporal sulcus. Behavioural Brain Research 39:262–274. Allman, B.L., L.P. Keniston, and M.A. Meredith. 2008. Subthreshold auditory inputs to extrastriate visual neurons are responsive to parametric changes in stimulus quality: Sensory-specific versus non-specific coding. Brain Research 1242:95–101. Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in “unimodal” neurons: Cross-modal subthreshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–549. Allon, N., and Z. Wollberg. 1978. Responses of cells in the superior colliculus of the squirrel monkey to auditory stimuli. Brain Research 159:321–330. Andersen, R.A., L.H. Snyder, D.C. Bradley, and J. Xing. 1997. Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience 20:303–330. Anderson, J., I. Lampl, I. Reichova, M. Carandini, and D. Ferster. 2000. Stimulus dependence of two-state fluctuations of membrane potential in cat visual cortex. Nature Neuroscience 3:617–621. Anderson, K.C., and R.M. Siegel. 1999. Optic flow selectivity in the anterior superior temporal polysensory area, STPa, of the behaving monkey. Journal of Neuroscience 19:2681–2691. Anderson, K.C., and R.M. Siegel. 2005. Three-dimensional structure-from-motion selectivity in the anterior superior temporal polysensory area STPs of the behaving monkey. Cerebral Cortex 15:1299–1307. Aosaki, T., M. Kimura, and A.M. Graybiel. 1995. Temporal and spatial characteristics of tonically active neurons of the primate’s striatum. Journal of Neurophysiology 73:1234–1252. Aou, S., Y. Oomura, H. Nishino, et al. 1983. Functional heterogeneity of single neuronal activity in the monkey dorsolateral prefrontal cortex. Brain Research 260:121–124. Artchakov, D., D. Tikhonravov, V. Vuontela, I. Linnankoski, A. Korvenoja, and S. Carlson. 2007. Processing of auditory and visual location information in the monkey prefrontal cortex. Experimental Brain Research 180:469–479. Azuma, M., and H. Suzuki. 1984. Properties and distribution of auditory neurons in the dorsolateral prefrontal cortex of the alert monkey. Brain Research 298:343–346. Baizer, J.S., L.G. Ungerleider, and R. Desimone. 1991. Organization of visual inputs to the inferior temporal and posterior parietal cortex in macaques. Journal of Neuroscience 11:168–190. Baizer, J.S., R. Desimone, and L.G. Ungerleider. 1993. Comparison of subcortical connections of inferior temporal and posterior parietal cortex in monkeys. Visual Neuroscience 10:59–72. Barbas, H., H. Ghashghaei, S.M. Dombrowski, and N.L. Rempel-Clower. 1999. Medial prefrontal cortices are unified by common connections with superior temporal cortices and distinguished by input from memory-related areas in the rhesus monkey. Journal of Comparative Neurology 410:343–367. Barbas, H., and M.M. Mesulam. 1981. Organization of afferent input to subdivisions of area 8 in the rhesus monkey. Journal of Comparative Neurology 200:407–431. Barnes, C.L., and D.N. Pandya. 1992. Efferent cortical connections of multimodal cortex of the superior temporal sulcus in the rhesus monkey. Journal of Comparative Neurology 318:222–244. Barraclough, N.E., D. Xiao, C.I. Baker, M.W. Oram, and D.I. Perrett. 2005. 
Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive Neuroscience 17:377–391.
Baylis, G.C., E.T. Rolls, and C.M. Leonard. 1987. Functional subdivisions of the temporal lobe neocortex. Journal of Neuroscience 7:330–342. Bell, A.H., B.D. Corneil, D.P. Munoz, and M.A. Meredith. 2003. Engagement of visual fixation suppresses sensory responsiveness and multisensory integration in the primate superior colliculus. European Journal of Neuroscience 18:2867–2873. Benevento, L.A., J. Fallon, B.J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental Neurology 57:849–872. Besle, J., Bertrand, O., and Giard, M.H. 2009. Electrophysiological (EEG, sEEG, MEG) evidence for multiple audiovisual interactions in the human auditory cortex. Hearing Research 258:143–151. Blatt, G.J., D.N. Pandya, and D.L. Rosene. 2003. Parcellation of cortical afferents to three distinct sectors in the parahippocampal gyrus of the rhesus monkey: An anatomical and neurophysiological study. Journal of Comparative Neurology 466:161–179. Bologninia, N., I. Senna, A. Maravita, A. Pascual-Leone, and L.B. Merabet. 2010. Auditory enhancement of visual phosphene perception: The effect of temporal and spatial factors and of stimulus intensity. Neuroscience Letters 477:109–114. Bon, L., and C. Lucchetti. 2006. Auditory environmental cells and visual fixation effect in area 8B of macaque monkey. Experimental Brain Research 168:441–449. Born, R.T., and D.C. Bradley. 2005. Structure and function of visual area MT. Annual Review of Neuroscience 28:157–189. Brosch, M., E. Selezneva, and H. Scheich. 2005. Nonauditory events of a behavioral procedure activate auditory cortex of highly trained monkeys. Journal of Neuroscience. 25:6797–6806. Brothers, L., B. Ring, and A. Kling. 1990. Response of neurons in the macaque amygdala to complex social stimuli. Behavioural Brain Research 41:199–213. Bruce, C.J., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology 46:369–384. Bruce, C.J., R. Desimone, and C.G. Gross. 1986. Both striate cortex and superior colliculus contributes to visual properties of neurons in superior temporal polysensory area of macaque monkey. Journal of Neurophysiology 55:1057–1075. Burton, H., and E.G. Jones. 1976. The posterior thalamic region and its cortical projection in new world and old world monkeys. Journal of Comparative Neurology 168:249–302. Carmichael, S.T., and J.L. Price. 1995. Sensory and premotor connections of the orbital and medial prefrontal cortex of macaque monkeys. Journal of Comparative Neurology 363:642–664. Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies. Cerebral Cortex 11:1110–1123. Calvert, G.A., and R. Campbell. 2003. Reading speech from still and moving faces: The neural substrates of visible speech. Journal of Cognitive Neuroscience 15:57–70. Campanella, S., and P. Belin. 2007. Integrating face and voice in person perception. Trends in Cognitive Sciences 11:535–543. Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. European Journal of Neuroscience 22:2886–2902. Cappe, C., A. Morel, P. Barone, and E. Rouiller. 2009. The thalamocortical projection systems in primate: An anatomical support for multisensory and sensorimotor interplay. Cerebral Cortex 19:2025–2037. Cappe, C., M.M. Murray, P. 
Barone, and E.M. Rouiller. 2010. Multisensory facilitation of behavior in monkeys: Effects of stimulus intensity. Journal of Cognitive Neuroscience 22:2850–2863. Cate, A.D., T.J. Herron, E.W. Yund, et al. 2009. Auditory attention activates peripheral visual cortex. PLoS ONE 4:e4645. Cavada, C., and P.S. Goldman-Rakic. 1989a. Posterior parietal cortex in rhesus monkey: I. Parcellation of areas based on distinctive limbic and sensory corticocortical connections. Journal of Comparative Neurology 287:393–421. Cavada, C., and P.S. Goldman-Rakic. 1989b. Posterior parietal cortex in rhesus monkey: II. Evidence for segregated corticocortical networks linking sensory and limbic areas with the frontal lobe. Journal of Comparative Neurology 287:422–445. Cavada, C., T. Company, J. Tejedor, R.J. Cruz-Rizzolo, and F. Reinoso-Suarez. 2000. The anatomical connections of the macaque monkey orbitofrontal cortex. A review. Cerebral Cortex 10:220–242. Chakladar, S., N.K. Logothetis, and C.I. Petkov. 2008. Morphing rhesus monkey vocalizations. Journal of Neuroscience Methods 170:45–55.
Chandrasekaran, C., and A.A. Ghazanfar. 2009. Different neural frequency bands integrate faces and voices differently in the superior temporal sulcus. Journal of Neurophysiology 101:773–788. Chandrasekaran, C., A. Trubanova, S. Stillittano, A. Caplier, and A.A. Ghazanfar. 2009. The natural statistics of audiovisual speech. PLoS Computational Biology 5:e1000436. Chen, C.M., P. Lakatos, A.S. Shah, et al. 2007. Functional anatomy and interaction of fast and slow visual pathways in macaque monkeys. Cerebral Cortex 17:1561–1569. Cheney, D.L., and Seyfarth, R.M. 1990. How Monkeys See the World. Chicago: Univ. of Chicago Press. Ciaramitaro, V.M., G.T. Buracas, and G.M. Boynton. 2007. Spatial and crossmodal attention alter responses to unattended sensory information in early visual and auditory human cortex. Journal of Neurophysiology 98:2399–2413. Clower, D.M., R.A. West, J.C. Lynch, and P.L. Strick. 2001. The inferior parietal lobule is the target of output from the superior colliculus, hippocampus, and cerebellum. Journal of Neuroscience. 21:6283–6291. Cohen, Y.E. 2009. Multimodal activity in the parietal cortex. Hearing Research 258:100–105. Cohen, Y.E., and R.A. Andersen. 2000. Reaches to sounds encoded in an eye-centered reference frame. Neuron 27:647–652. Cohen, Y.E., and R.A. Andersen. 2002. A common reference frame for movement plans in the posterior parietal cortex. Nature Reviews. Neuroscience 3:553–562. Cohen, Y.E., A.P. Batista, and R.A. Andersen. 2002. Comparison of neural activity preceding reaches to auditory and visual stimuli in the parietal reach region. Neuroreport 13:891–894. Cohen, Y.E., I.S. Cohen, and G.W. Gifford III. 2004. Modulation of LIP activity by predictive auditory and visual cues. Cerebral Cortex 14:1287–1301. Cohen, Y.E., B.E. Russ, S.J. Davis, A.E. Baker, A.L. Ackelson, and R. Nitecki. 2009. A functional role for the ventrolateral prefrontal cortex in non-spatial auditory cognition. Proceedings of the National Academy of Sciences of the United States of America 106:20045–20050. Colombo, M., and C.G. Gross. 1994. Responses of inferior temporal cortex and hippocampal neurons during delayed matching to sample in monkeys (Macaca fascicularis). Behavioral Neuroscience 108:443–455. Colombo, M., H.R. Rodman, and C.G. Gross. 1996. The effects of superior temporal cortex lesions on the processing and retention of auditory information in monkeys (Cebus apella). Journal of Neuroscience. 16:4501–4517. Cooke, D.F., and M.S.A. Graziano. 2004a. Super-flinchers and nerves of steel: Defensive movements altered by chemical manipulation of a cortical motor area. Neuron 43:585–593. Cooke, D.F., and M.S.A. Graziano. 2004b. Sensorimotor integration in the precentral gyrus: Polysensory neurons and defensive movements. Journal of Neurophysiology 91:1648–1660. Cui, Q.N., L. Bachus, E. Knoth, W.E. O’Neill, and G.D. Paige. 2008. Eye position and cross-sensory learning both contribute to prism adaptation of auditory space. Progress in Brain Research 171:265–270. Cui, J., M. Wilke, N.K. Logothetis, D.A. Leopold, and H. Liang. 2009. Visibility states modulate microsaccade rate and direction. Vision Research 49:228–236. Cusick, C.G., B. Seltzer, M. Cola, and E. Griggs. 1995. Chemoarchitectonics and corticocortical terminations within the superior temporal sulcus of the rhesus monkey: Evidence for subdivisions of superior temporal polysensory cortex. Journal of Comparative Neurology 360:513–535. Cynader, M., and N. Berman. 1972. Receptive field organization of monkey superior colliculus. 
Journal of Neurophysiology 35:187–201. Dahl, C.D., N.K. Logothetis, and C. Kayser. 2009. Spatial organization of multisensory responses in temporal association cortex. Journal of Neuroscience. 29:11924–11932. de la Mothe, L.A., S. Blumell, Y. Kajikawa, and T.A. Hackett. 2006a. Cortical connections of the auditory cortex in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496:27–71. de la Mothe, L.A., S. Blumell, Y. Kajikawa, and T.A. Hackett. 2006b. Thalamic connections of the auditory cortex in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496:72–96. De Souza, W.C., S. Eifuku, R. Tamura, H. Nishijo, and T. Ono. 2005. Differential characteristics of face neuron responses within the anterior superior temporal sulcus of macaques. Journal of Neurophysiology 94:1251–1566. Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multisensory convergence. Cerebral Cortex 14:387–403. Desimone, R., and C.G. Gross. 1979. Visual areas in the temporal cortex of the macaque. Brain Research 178:363–380.
Diederich, A., and H. Colonius. 2004. Modeling the time course of multisensory interaction in manual and saccadic responses. In Handbook of Multisensory Processes, ed. G. Calvert, C. Spence, and B.E. Stein, 373–394. Cambridge, MA: MIT Press. Disbrow, E., E. Litinas, G.H. Recanzone, J. Padberg, and L. Krubitzer. 2003. Cortical connections of the second somatosensory area and the parietal ventral area in macaque monkeys. Journal of Comparative Neurology 462:382–399. Dobelle, W.H., M.G. Mladejovsky, and J.P. Girvin. 1974. Artificial vision for the blind: Electrical stimulation of visual cortex offers hope for a functional prosthesis. Science 183:440–444. Duffy, C.J., and R.H. Wurtz. 1991. Sensitivity of MST neurons to optic flow stimuli: I. A continuum of response selectivity to large-field stimuli. Journal of Neurophysiology 65:1329–1345. Eaccott, M.J., C.A. Heywood, C.G. Gross, and A. Cowey. 1993. Visual discrimination impairments following lesions of the superior temporal sulcus are not specific for facial stimuli. Neuropsychologia 31:609–619. Eifuku, S., W.C. De Souza, R. Tamura, H. Nishijo, and T. Ono. 2004. Neuronal correlates of face identification in the monkey anterior temporal cortical areas. Journal of Neurophysiology 91:358–371. Engbert, R., and R. Kliegl. 2003. Microsaccades uncover the orientation of covert attention. Vision Research 43:1035–1045. Evans, T.A., S. Howell, and G.C. Westergaard. 2005. Auditory–visual cross-modal perception of communicative stimuli in tufted capuchin monkeys (Cebus apella). Journal of Experimental Psychology. Animal Behavior Processes 31:399–406. Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience. 22:5749–5759. Falchier, A., C.E. Schroeder, T.A. Hackett, et al. 2010. Projection from visual areas V2 and prostriata to caudal auditory cortex in the monkey. Cerebral Cortex 20:1529–1538. Felleman, D.J., and J.H. Kaas. 1984. Receptive field properties of neurons in middle temporal visual area (MT) of owl monkeys. Journal of Neurophysiology 52:488–513. Fogassi, L., V. Gallese, L. Fadiga, F. Luppino, M. Matelli, and G. Rizzolatti. 1996. Coding of peripersonal space in inferior premotor cortex (area F4). Journal of Neurophysiology 76:141–157. Frens, M.A., and A.J. Van Opstal. 1998. Visual–auditory interactions modulate saccade-related activity in monkey superior colliculus. Brain Research Bulletin 46:211–224. Frens, M.A., A.J. Van Opstal, and R.F. Van der Willigen. 1995. Spatial and temporal factors determine auditory– `visual interactions in human saccadic eye movements. Perception & Psychophysics 57:802–816. Fu, K.G., T.A. Johnston, A.S. Shah, et al. 2003. Auditory cortical neurons respond to somatosensory stimulation. Journal of Neuroscience. 23:7510–7515. Fu, K.G., A.S. Shah, M.N. O’Connell, et al. 2004. Timing and laminar profile of eye-position effects on auditory responses in primate auditory cortex. Journal of Neurophysiology 92:3522–3531. Fuster, J.M., M. Bodner, and J.K. Kroger. 2000. Cross-modal and cross-temporal association in neurons of frontal cortex. Nature 405:347–351. Gaffan, D., and S. Harrison. 1991. Auditory–visual associations, hemispheric specialization and temporal– frontal interaction in the rhesus monkey. Brain 114:2133–2144. Ghazanfar, A.A., and N.K. Logothetis. 2003. Facial expressions linked to monkey calls. Nature 423:934–934. Ghazanfar, A.A., and L.R. Santos. 2004. 
Primate brains in the wild: The sensory bases for social interactions. Nature Reviews. Neuroscience 5:603–616. Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences 10:278–285. Ghazanfar, A.A., J.G. Neuhoff, and N.K. Logothetis. 2002. Auditory looming perception in rhesus monkeys. Proceedings of the National Academy of Sciences of the United States of America 99:15755–15757. Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience. 25:5004–5012. Ghazanfar, A.A., K. Nielsen, and N.K. Logothetis. 2006. Eye movements of monkey observers viewing vocalizing conspecifics. Cognition 101:515–529. Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of Neuroscience. 28:4457–4469. Giard, M.H., and F. Peronnet. 1999. Auditory–visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience 11:473–490. Gibson, J.R., and J.H.R. Maunsell. 1997. Sensory modality specificity of neural activity related to memory in visual cortex. Journal of Neurophysiology 78:1263–1275.
Gifford III, G.W., and Y.E. Cohen. 2005. Spatial and non-spatial auditory processing in the lateral intraparietal area. Experimental Brain Research 162:509–512. Gifford III, G.W., K.A. MacLean, M.D. Hauser, and Y.E. Cohen. 2005. The neurophysiology of functionally meaningful categories: Macaque ventrolateral prefrontal cortex plays a critical role in spontaneous categorization of species-specific vocalizations. Journal of Cognitive Neuroscience 17:1471–1482. Goldman-Rakic, P.S., A.R. Cools, and K. Srivastava. 1996. The prefrontal landscape: Implications of functional architecture for understanding human mentation and the central executive. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 351:1445–1453. Goodale, M.A., and A.D. Milner. 1992. Separate visual pathways for perception and action. Trends in Neurosciences 15:20–25. Graziano, M.S.A., and S. Gandhi. 2000. Location of the polysensory zone in the precentral gyrus of anesthetized monkeys. Experimental Brain Research 135:259–266. Graziano, M.S.A., X.T. Hu, and C.G. Gross. 1997. Visuospatial properties of ventral premotor cortex. Journal of Neurophysiology 77:2268–2292. Graziano, M.S.A., L.A.J. Reiss, and C.G. Gross. 1999. A neuronal representation of the location of nearby sounds. Nature 397:428–430. Graziano, M.S.A., G.S. Yap, and C.G. Gross. 1994. Coding of visual space by premotor neurons. Science 266:1054–1057. Green, K.P., P.K. Kuhl, A.N. Meltzoff, and E.B. Stevens. 1991. Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect. Perception & Psychophysics 50:524–536. Groh, J.M., A.S. Trause, A.M. Underhill, K.R. Clark, and S. Inati. 2001. Eye position influences auditory responses in primate inferior colliculus. Neuron 29:509–518. Grunewald, A., J.F. Linden, and R.A. Andersen. 1999. Responses to auditory stimuli in macaque lateral intraparietal area I. Effects of training. Journal of Neurophysiology 82:330–342. Gu, Y., D.E. Angelaki, and G.C. DeAngelis. 2008. Neural correlates of multisensory cue integration in macaque MSTd. Nature Neuroscience 11:1201–1210. Hackett, T.A. 2002. The comparative anatomy of the primate auditory cortex. In: Primate Audition: Ethology and Neurobiology, ed. Asif A. Ghazanfar, 199–226. Boca Raton, FL: CRC. Hackett, T.A., L.A. de la Mothe, I. Ulbert, G. Karmos, J.F. Smiley, and C.E. Schroeder. 2007. Multisensory convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane. Journal of Comparative Neurology 502:894–923. Hackett, T.A., T.M. Preuss, and J.H. Kaas. 2001. Architectonic identification of the core region in auditory cortex of macaques, chimpanzees, and humans. Journal of Comparative Neurology 441:197–222. Hackett, T.A, I. Stepniewska, and J.H. Kaas. 1999. Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Research 817:45–58. Hairston, W.D., D.A. Hodges, J.H. Burdette, and M.T. Wallace. 2006. Auditory enhancement of visual temporal order judgment. Neuroreport 17:791–795. Hikosaka, K., E. Iwai, H. Saito, and K. Tanaka. 1988. Polysensory properties of neurons in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. Journal of Neurophysiology 60:1615–1637. Hikosaka, O., M. Sakamoto, and S. Usui. 1989. Functional properties of monkey caudate neurons: II. Visual and auditory responses. Journal of Neurophysiology 61:799–813. Hoffman, K.L., A.A. Ghazanfar, I. Gauthier, and N.K. Logothetis. 2008. 
Category-specific responses to faces and objects in primate auditory cortex. Frontiers in Systems Neuroscience 1:2. Hoffman, K.L., K.M. Gothard, M.C. Schmid, and N.K. Logothetis. 2007. Facial-expression and gaze-selective responses in the monkey amygdala. Current Biology 17:766–772. Ito, S. 1982. Prefrontal activity of macaque monkeys during auditory and visual reaction time tasks. Brain Research 247:39–47. Iversen, S.D., and M. Mishkin. 1973. Comparison of superior temporal and inferior prefrontal lesions on auditory and non-auditory task in rhesus monkeys. Brain Research 55:355–367. Izumi, A., and S. Kojima. 2004. Matching vocalizations to vocalizing faces in chimpanzee (Pan troglodytes). Animal Cognition 7:179–184. Jääskeläinen, I.P., J. Ahveninen, J.W. Belliveau, T. Raij, and M. Sams. 2007. Short-term plasticity in auditory cognition. Trends in Neurosciences 30:653–661. Jay, M.F., and D.L. Sparks. 1984. Auditory receptive fields in primate superior colliculus shift with changes in eye position. Nature 309:345–347. Jones, E.G. 1998. Viewpoint: The core and matrix of thalamic organization. Neuoroscience 85:331–345.
Jordan, K.E., E.M. Brannon, N.K. Logothetis, and A.A. Ghazanfar. 2005. Monkeys match the number of voices they hear to the number of faces they see. Current Biology 15:1034–1038. Joseph, J.P., and P. Barone. 1987. Prefrontal unit activity during a delayed oculomotor task in the monkey. Experimental Brain Research 67:460–468. Kaas, J.H., and T.A. Hackett. 2000. Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences of the United States of America 97:11793–11799. Kajikawa, Y., C.E. Schroeder. 2008. Face–voice integration and vocalization processing in the monkey. Abstracts Society for Neuroscience 852.22. Kayser, C., and N.K. Logothetis. 2009. Directed interactions between auditory and superior temporal cortices and their role in sensory integration. Frontiers in Integrative Neuroscience 3:7. Kayser, C.I., C.I. Petkov, M. Augath, and N.K. Logothetis. 2005. Integration of touch and sound in auditory cortex. Neuron 48:373–384. Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2007. Functional imaging reveals visual modulation of specific fields in auditory cortex. Journal of Neuroscience 27:1824–1835. Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18:1560–1574. Kayser, C., and N.K. Logothetis. 2009. Directed interactions between auditory and superior temporal cortices and their role in sensory integration. Frontiers in Integrative Neuroscience 3:7. Kayser, C., N.K. Logothetis, and S. Panzeri. 2010. Visual enhancement of the information representation in auditory cortex. Current Biology 20:19–24. Keysers, C., E. Kohler, M.A. Umilta, L. Nanetti, L. Fogassi, and V. Gallese. 2003. Audiovisual mirror neurons and action recognition. Experimental Brain Research 153:628–636. Kikuchi-Yorioka, Y., and T. Sawaguchi. 2000. Parallel visuospatial and audiospatial working memory processes in the monkey dorsolateral prefrontal cortex. Nature Neuroscience 3:1075–1076. Kimura, M. 1992. Behavioral modulation of sensory responses of primate putamen neurons. Brain Research 578:204–214. Knudsen, E.I., and P.F. Knudsen. 1989. Vision calibrates sound localization in developing barn owls. Journal of Neuroscience 9:3306–3313. Kohler, E., C. Keysers, M.A. Umilta, L. Fogassi, V. Gallese, and G. Rizzolatti. 2002. Hearing sounds, understanding actions: Action representation in mirror neurons. Science 297:846–848. Kojima, S., A. Izumi, and M. Ceugniet. 2003. Identification of vocalizers by pant hoots, pant grants and screams in a chimpanzee. Primates 44:225–230. Kondo, H., K.S. Saleem, and J.L. Price. 2003. Differential connections of the temporal pole with the orbital and medial prefrontal networks in macaque monkeys. Journal of Comparative Neurology 465:499–523. Kosmal, A., M. Malinowska, and D.M. Kowalska. 1997. Thalamic and amygdaloid connections of the auditory association cortex of the superior temporal gyrus in rhesus monkey (Macaca mulatta). Acta Neurobiologiae Experimentalis 57:165–188. Kubota, K., M. Tonoike, and A. Mikami. 1980. Neuronal activity in the monkey dorsolateral prefrontal cortex during a discrimination task with delay. Brain Research 183:29–42. Kuraoka, K., and K. Nakamura. 2007. Responses of single neurons in monkey amygdala to facial and vocal emotions. Journal of Neurophysiology 97:1379–1387. Lakatos, P., C.-M. Chen, M. O’Connell, A. Mills, and C.E. Schroeder. 2007. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53:279–292. 
Lakatos, P., G., Karmos, A.D. Mehta, I. Ulbert, and C.E. Schroeder. 2008. Entrainment of neural oscillations as a mechanism of attentional selection. Science 320:110–113. Lakatos, P., M.N. O’Connell, A. Barczak, A. Mills, D.C. Javitt, and C.E. Schroeder. 2009. The leading sense: Supramodal control of neurophysiological context by attention. Neuron 64:419–430. Lakatos, P., A.S. Shaw, K.H. Knuth, I. Ulbert, G. Karmos, and C.E. Schroeder. 2005. An oscillatory hierarchy controlling neuronal excitability and stimulu processing in the auditory cortex. Journal of Neurophysiology 94:1904–1911. Lehmann, C., M. Herdener, F. Esposito, et al. 2006. Differential patterns of multisensory interactions in core and belt areas of human auditory cortex. Neuroimage 31:294–300. Lehmann, S., and M.M. Murray. 2005. The role of multisensory memories in unisensory object discrimination. Brain Research. Cognitive Brain Research 24:326–334. Leonard, C.M., E.T. Rolls. F.A. Wilson and G.C. Baylis. 1985. Neurons in the amygdala of the monkey with responses selective for faces. Behavioural Brain Research 15:159–176. Levy, R., and P.S. Goldman-Rakic. 2000. Segregation of working memory functions within the dorsolateral prefrontal cortex. Experimental Brain Research 133:23–32.
Lewis, J.W., and D.C. Van Essen. 2000. Corticocortical connections of visual, sensorimotor, and multi modal pro cessing areas in the parietal lobe of the macaque monkey. Journal of Comparative Neurology 428:112–137. Linden, J.F., A. Grunewald, and R.A. Andersen. 1999. Responses to auditory stimuli in macaque lateral intraparietal area: II. Behavioral modulation. Journal of Neurophysiology 82:343–358. Maier, J.X., J.G. Neuhoff, N.K. Logothetis, and A.A. Ghazanfar. 2004. Multisensory integration of looming signals by rhesus monkeys. Neuron 43:177–181. Maier, J.X., C. Chandrasekaran, and A.A. Ghazanfar. 2008. Integration of bimodal looming signals through neuronal coherence in the temporal lobe. Current Biology 18:963–968. Martinez, L., and T. Matsuzawa. 2009. Auditory–visual intermodal matching based on individual recognition in a chimpanzee (Pan troglodytes). Animal Cognition 12:S71–S85. Matsumoto, N., T. Minamimoto, A.M. Graybiel, and M. Kimura. 2001. Neurons in the thalamic CM-Pf complex supply striatal neurons with information about behaviorally significant sensory events. Journal of Neurophysiology 85:960–976. Mazzoni, P., R.P. Bracewell, S. Barash, and R.A. Andersen. 1996. Spatially tuned auditory responses in area LIP of macaques performing delayed memory saccades to acoustic targets. Journal of Neurophysiology 75:1233–1241. McDonald, J.J., W.A. Teder-Sälejärvi, F. Di Russo, and S.A. Hillyard. 2003. Neural substrates of perceptual enhancement by cross-modal spatial attention. Journal of Cognitive Neuroscience 15:10–19. McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:746–748. McNaughton, B.L., F.P. Battagllia, O. Jensen, E.I. Moser, and M.B. Moser. 2006. Path integration and the neural basis of the ‘cognitive map.’ Nature Reviews. Neuroscience 7:663–678. Mehta, A.D., U. Ulbert, and C.E. Schroeder. 2000a. Intermodal selective attention in monkeys: I. Distribution and timing of effects across visual areas. Cerebral Cortex 10:343–358. Mehta, A.D., U. Ulbert, and C.E. Schroeder. 2000b. Intermodal selective attention in monkeys: II. Physiological mechanisms of modulation. Cerebral Cortex 10:359–370. Meredith, M.A., B.L. Allman, L.P. Keniston, and H.R. Clemo. 2009. Auditory influences on non-auditory cortices. Hearing Research 258:64–71. Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior colliculus neurons: I. Temporal factors. Journal of Neuroscience 7:3215–3229. Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus. Science 221:389–391. Meyer K., J.T. Kaplan, R. Essec, C. Webber, H. Damasio, and A. Damasio. 2010. Predicting visual stimuli on the basis of activity in auditory cortices. Nature Neuroscience 13:667–668. Miller, J.O. 1982. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology 14:247–279 Miller, J., R. Ulrich, and Y. Lanarre. 2001. Locus of the redundant-signals effect in bimodal divided attention: A neurophysiological analysis. Perception & Psychophysics 63:555–562. Mohedano-Moriano, A., P. Pro-Sistiaga, M.M. Arroyo-Jimenez, et al. 2007. Topographical and laminar distribution of cortical input to the monkey entorhinal cortex. Journal of Anatomy 211:250–260. Mohedano-Moriano, A., A. Martinez-Marcos, P. Pro-Sistiaga, et al. 2008. Convergence of unimodal and polymodal sensory input to the entorhinal cortex in the fascicularis monkey. Neuroscience 151:255–271. Molholm, S., W. Ritter, M.M. Murray, D.C. 
Javitt, C.E. Schroeder, and J.J. Foxe. 2002. Multisensory auditory– visual interactions during early sensory processing in humans: A high-density electrical mapping study. Brain Research. Cognitive Brain Research 14, 115–128. Molholm, S., A. Martinez, M. Shpaner, and J.J. Foxe. 2007. Object-based attention is multisensory: Co-activation of an object’s representations in ignored sensory modalities. European Journal of Neuroscience 26: 499–509. Mullette-Gilman, O.A., Y.E. Cohen, and J.M. Groh. 2005. Eye-centered, head-centered, and complex coding of visual and auditory targets in the intraparietal sulcus. Journal of Neurophysiology 94:2331–2352. Mullette-Gilman, O.A., Y.E. Cohen, and J.M. Groh. 2009. Motor-related signals in the intraparietal cortex encode locations in a hybrid, rather than eye-centered reference frame. Cerebral Cortex 19:1761–1775. Murata, A., L. Fadiga, L. Fogassi, V. Gallese, V. Raos, and G. Rizzolatti. 1997. Object representation in the ventral premotor cortex (area F5) of the monkey. Journal of Neurophysiology 78:2226–2230. Murray, E.A., and D. Gaffan. 1994. Removal of the amygdala plus subjacent cortex disrupts the retention of both intramodal and crossmodal associative memories in monkeys. Behavioral Neuroscience 108:494–500. Murray, E.A., and B.J. Richmond. 2001. Role of perirhinal cortex in object perception, memory, and associations Current Opinion in Neurobiology 11:188–193.
Murray, M.M., C.M. Michel, R.G. de Peralta, et al. 2004. Rapid discrimination of visual and multisensory memories revealed by electrical neuroimaging. Neuroimage 21:125–135. Murray, M.M., J.J. Foxe, and G.R. Wylie. 2005. The brain uses single-trial multisensory memories to discriminate without awareness. Neuroimage 27:473–478. Musacchia, G., M. Sams, T. Nicol, and N. Kraus. 2006. Seeing speech affects acoustic information processing in the human brainstem. Experimental Brain Research 168:1–10. Musacchia, G., and C.E. Schroeder. 2009. Neuronal mechanisms, response dynamics and perceptual functions of multisensory interactions in auditory cortex. Hearing Research 258:72–79. Nager, W., K. Estorf, and T.F. Münte. 2006. Crossmodal attention effects on brain responses to different stimulus classes. BMC Neuroscience 7:31. Navarra, J., A. Alsius, S. Soto-Faraco, and C. Spence. 2010. Assessing the role of attention in the audiovisual integration of speech. Information Fusion 11:4–11. Neal, J.W., R.C. Pearson, and T.P. Powell. 1990. The connections of area PG, 7a, with cortex in the parietal, occipital and temporal lobes of the monkey. Brain Research 532:249–264. Nelissen, K., W. Vanduffel, and G.A. Orban. 2006. Charting the lower superior temporal region, a new motionsensitive region in monkey superior temporal sulcus. Journal of Neuroscience 26:5929–5947. Newman, J.D., and D.F. Lindsley. 1976. Single unit analysis of auditory processing in squirrel monkey frontal cortex. Experimental Brain Research 25:169–181. Nishijo, H., T. Ono, and H. Nishino. 1988a. Topographic distribution of modality-specific amygdalar neurons in alert monkey. Journal of Neuroscience 8:3556–3569. Nishijo, H., T. Ono, and H. Nishino. 1988b. Single neuron responses in amygdala of alert monkey during complex sensory stimulation with affective significance. Journal of Neuroscience 8:3570–3583. Nyberg, L., R. Habib, A.R. McIntosh, and E. Tulving. 2000. Reactivation of encoding-related brain activity during memory retrieval. Proceedings of the National Academy of Sciences of the United States of America 97:11120–11124. Ono, T., K. Nakamura, H. Nishijo, and S. Eifuku. 1993. Monkey hippocampal neurons related to spatial and nonspatial functions. Journal of Neurophysiology 70:1516–1529. Oram, M.W., and D.I. Perrett. 1996. Integration of form and motion in the anterior superior temporal polysensory area (STPa) of the macaque monkey. Journal of Neurophysiology 76:109–129. Oram, M.W., D.I. Perrett, and J.K. Hietanen. 1993. Directional tuning of motion-sensitive cells in the anterior superior temporal polysensory area of the macaque. Experimental Brain Research 97:274–294. Padberg, J., B. Seltzer, and C.G. Cusick. 2003. Architectonics and cortical connections of the upper bank of the superior temporal sulcus in the rhesus monkey: An analysis in the tangential plane. Journal of Comparative Neurology 467:418–434. Padberg, J., E. Disbrow, and L. Krubitzer. 2005. The organization and connections of anterior and posterior parietal cortex in titi monkeys: Do new world monkeys have an area 2? Cerebral Cortex 15:1938–1963. Parr, L.A., E. Hecht, S.K. Barks, T.M. Preuss, and J.R. Votaw. 2009. Face processing in the chimpanzee brain. Current Biology 19:50–53. Partan, S.R. 2002. Single and multichannel signal composition: Facial expressions and vocalizations of rhesus macaques (Macaca mulatta). Behavior 139:993–1027. Perrett, D.I., E.T. Rolls, and W. Caan. 1982. Visual neurones responsive to faces in the monkey temporal cortex. 
Experimental Brain Research 47:329–342. Perrott, D.R., K. Saberi, K. Brown, and T.Z. Strybel. 1990. Auditory psychomotor coordination and visual search performance. Perception & Psychophysics 48:214–226. Petkov, C.I., C. Kayser, T. Steudel, K. Whittingstall, M. Augath, and N.K. Logothetis. 2008. A voice region in the monkey brain. Nature Neuroscience 11:367–374. Petrides, M., and D.N. Pandya. 2002. Comparative cytoarchitectonic analysis of the human and the macaque ventrolateral prefrontal cortex and corticocortical connection patterns in the monkey. European Journal of Neuroscience 16:291–310. Petrides, M., and D.N. Pandya. 2009. Distinct parietal and temporal pathways to the homologues of Broca’s area in the monkey. PLoS Biology 7:e1000170. Phelps, E.A., and J.E. LeDoux. 2005. Contributions of the amygdala to emotion processing: From animal models to human behavior. Neuron 48:175–187. Pinsk, M.A., K. DeSimone, T. Moore, C.G. Gross, and S. Kastner. 2005. Representations of faces and body parts in macaque temporal cortex: A functional MRI study. Proceedings of the National Academy of Sciences of the United States of America 102:6996–7001. Poremba, A., R.C. Saunders, A.M. Crane, M. Cook, L. Sokoloff, and M. Mishkin. 2003. Functional mapping of the primate auditory system. Science 299:568–572.
Porter, K.K., R.R. Metzger, and J.M. Groh. 2007. Visual- and saccade-related signals in the primate inferior colliculus. Proceedings of the National Academy of Sciences of the United States of America 104:17855–17860. Posner, M.I., C.R.R. Snyder, and D.J. Davidson. 1980. Attention and the detection of signals. Journal of Experimental Psychology. General 109:160–174. Raab, D.H. 1962. Statistical facilitation of simple reaction times. Transactions of the New York Academy of Sciences 24:574–590. Rahne, T., and M. Böckmann-Barthel. 2009. Visual cues release the temporal coherence of auditory objects in auditory scene analysis. Brain Research 1300:125–134. Ramos-Estebanez, C., L.B. Merabet, K. Machii, et al. 2007. Visual phosphene perception modulated by subthreshold crossmodal sensory stimulation. Journal of Neuroscience 27:4178–4181. Rao, S.C., G. Rainer, and E.K. Miller. 1997. Integration of what and where in the primate prefrontal cortex. Science 276:821–824. Rauschecker, J.P., and B. Tian. 2000. Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences of the United States of America 97:11800–11806. Rauschecker, J.P., B. Tian, and M. Hauser. 1995. Processing of complex sounds in the macaque nonprimary auditory cortex. Science 268:111–114. Rauschecker, J.P., and L.R. Harris. 1989. Auditory and visual neurons in the cat’s superior colliculus selective for the direction of apparent motion stimuli. Brain Research 490:56–63. Recanzone, G.H., D.C. Guard, M.L. Phan, and T.K. Su. 2000. Correlation between the activity of single auditory cortical neurons and sound-localization behavior in the macaque monkey. Journal of Neurophysiology 83:2723–2739. Ringo, J.L., and S.G. O’Neill. 1993. Indirect inputs to ventral temporal cortex of monkey: The influence on unit activity of alerting auditory input, interhemispheric subcortical visual input, reward, and the behavioral response. Journal of Neurophysiology 70:2215–2225. Rizzolatti, G., and L. Craighero. 2004. The mirror-neuron system. Annual Review of Neuroscience 27:169–192. Rizzolatti, G., L. Fadiga, V. Gallese, and L. Fogassi. 1996. Premotor cortex and the recognition of motor actions. Brain Research. Cognitive Brain Research 3:131–141. Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology 50:19–26. Rolls, E.T., H.D. Critchley, A.S. Browning, and K. Inoue. 2006. Face-selective and auditory neurons in the primate orbitofrontal cortex. Experimental Brain Research 170:74–87. Romanski, L.M., B.B. Averbeck, and M. Diltz. 2005. Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. Journal of Neurophysiology 93:734–747. Romanski, L.M., J.F. Bates, and P.S. Goldman-Rakic. 1999a. Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology 403:141–157. Romanski, L.M., and P.S. Goldman-Rakic. 2002. An auditory domain in primate prefrontal cortex. Nature Neuroscience 5:15–16. Romanski, L.M., B. Tian, J. Fritz, M. Mishkin, P.S. Goldman-Rakic, and J.P. Rauschecker. 1999b. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience 2:1131–1136. Romei, V., M.M. Murray, L.B. Merabet, and G. Thut. 2007. Occipital transcranial magnetic stimulation has opposing effects on visual and auditory stimulus detection: Implications for multisensory interactions. 
Journal of Neuroscience 27:11465–11472. Romei, V., M.M. Murray, C. Cappe, and G. Thut. 2009. Preperceptual and stimulus-selective enhancement of low-level human visual cortex excitability by sounds. Current Biology 19:1799–1805. Russ, B.E., A.L. Ackelson, A.E. Baker, and Y.E. Cohen. 2008. Coding of auditory-stimulus identity in the auditory non-spatial processing stream. Journal of Neurophysiology 99:87–95. Saleem, K.S., W. Suzuki, K. Tanaka, and T. Hashikawa. 2000. Connections between anterior inferotemporal cortex and superior temporal sulcus regions in the macaque monkey. Journal of Neuroscience 20:5083–5101. Saleem, K.S., H. Kondo, and J.L. Price. 2008. Complementary circuits connecting the orbital and medial prefrontal networks with the temporal, insular, and opercular cortex in the macaque monkey. Journal of Comparative Neurology 506:659–693. Sams, M., R. Aulanko, M. Hämäläinen, et al. 1991. Seeing speech: Visual information from lip movements modifies activity in the human auditory cortex. Neuroscience Letters 127:141–145. Santangelo V., and C. Spence. 2009. Crossmodal exogenous orienting improves the accuracy of temporal order judgments. Experimental Brain Research 194:577–586.
Santos-Benitez, H., C.M. Magarinos-Ascone, and E. Garcia-Austt. 1995. Nucleus basalis of Meynert cell responses in awake monkeys. Brain Research Bulletin 37:507–511. Schiff, W., J.A. Caviness, and J.J. Gibson. 1962. Persistent fear responses in rhesus monkeys to the optical stimulus of “looming.” Science 136:982–983. Schlack, A., S.J. Sterbing-D’Angelo, K. Hartung, K.-P. Hoffmann, and F. Bremmer. 2005. Multisensory space representations in the macaque ventral intraparietal area. Journal of Neuroscience 25:4616–4625. Schmolesky, M.T., Y. Wang, D.P. Hanes, et al. 1998. Signal timing across the macaque visual system. Journal of Neurophysiology 79:3272–3278. Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Brain Research. Cognitive Brain Research 14:187–198. Schroeder, C.E., and J.J. Foxe. 2005. Multisensory contributions to low-level, ‘unisensory’ processing. Current Opinion in Neurobiology 15:454–458. Schroeder, C.E., and P. Lakatos. 2009. Low-frequency neuronal oscillations as instruments of sensory selection. Trends in Neurosciences 32:9–18. Schroeder, C.E., P. Lakatos, Y. Kajikawa, S. Partan, and A. Puce. 2008. Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences 12:106–113. Schroeder, C.E., R.W. Lindsley, C. Specht, A. Marcovici, J.F. Smilery, and D.C. Javitt. 2001. Somatosensory input to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85:1322–1327. Seltzer, B., M.G. Cola, C. Gutierrez, M. Massee, C. Weldon, and C.G. Cusick. 1996. Overlapping and nonoverlapping cortical projections to cortex of the superior temporal sulcus in the rhesus monkey: Double anterograde tracer studies. Journal of Comparative Neurology 370:173–190. Seltzer, B., and D.N. Pandya. 1978. Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Research 149:1–24. Seltzer, B., and D.N. Pandya. 1989. Frontal lobe connections of the superior temporal sulcus in the rhesus monkey. Journal of Comparative Neurology 281:97–113. Seltzer, B., and D.N. Pandya. 1994. Parietal, temporal, and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology 343:445–463. Sherman, S.M., and R.W. Guillery. 2002. The role of the thalamus in the flow of information to the cortex. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 357:1695–1708. Sliwa, J., J.-R. Duhamel, O. Paxsalis, and S.C. Wirth. 2009. Cross-modal recognition of identity in rhesus monkeys for familiar conspecifics and humans. Abstracts Society for Neuroscience 684.14. Smiley, J.F., T.A. Hackett, I. Ulbert, et al. 2007. Multisensory convergence in auditory cortex, I. Cortical connections of the caudal superior temporal plane in macaque monkeys. Journal of Comparative Neurology 502:894–923. Soto-Faraco, S., and A. Alsius. 2009. Deconstructing the McGurk–MacDonald illusion. Journal of Experimental Psychology. Human Perception and Performance 35:580–587. Squire, L.R., C.E.L. Stark, and R.E. Clark. 2004. The medial temporal lobe. Annual Review of Neuroscience 27:279–306. Starr, A., and M. Don. 1972. Responses of squirrel monkey (Samiri sciureus) medial geniculate units to binaural click stimuli. Journal of Neurophysiology 35:501–517. Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press. 
Stein, B.E., W. Jiang, M.T. Wallace, and T.R. Stanford. 2001. Nonvisual influences on visual-information processing in the superior colliculus. Progress in Brain Research 134:143–156. Stein, B.E., M.W. Wallace, T.R. Stanford, and W. Jiang. 2002. Cortex governs multisensory integration in the midbrain. Neuroscientist 8:306–314. Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews. Neuroscience 9:255–266. Stevenson, R.A., and T.W. James. 2009. Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. Neuroimage 44:1210–1223. Stricane, B., R.A. Andersen, and P. Mazzoni. 1996. Eye-centered, head-centered, and intermediate coding of remembered sound locations in area LIP. Journal of Neurophysiology 76:2071–2076. Sugihara, T., M.D. Diltz, B.B. Averbeck, and L.M. Romanski. 2006. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26:11138–11147. Sumby, W.H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America 26:212–215.
Suzuki, W.A., and D.G. Amaral. 1994. Perirhinal and parahippocampal cortices of the macaque monkey: Cortical afferents. Journal of Comparative Neurology 350:497–533. Talsma, D., D. Senkowski, and M.G. Woldorff. 2009. Intermodal attention affects the processing of the temporal alignment of audiovisual stimuli. Experimental Brain Research 198:313–328. Tamura, R., T. Ono, M. Fukuda, and K. Nakamura. 1992. Spatial responsiveness of monkey hippocampal neurons to various visual and auditory stimuli. Hippocampus 2:307–322. Tanaka, K., K. Hikosaka, H. Saito, M. Yukie, Y. Fukada, and E. Iwai. 1986. Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey. Journal of Neuroscience 6:134–144. Tanibuchi I., and P.S. Goldman-Rakic. 2003. Dissociation of spatial-, object-, and sound-coding neurons in the mediodorsal nucleus of the primate thalamus. Journal of Neurophysiology 89:1067–1077. Teder-Sälejärvi, W.A., T.F. Münte, F. Sperlich, and S.A. Hillyard. 1999. Intra-modal and cross-modal spatial attention to auditory and visual stimuli. An event-related brain potential study. Brain Research. Cognitive Brain Research 8:327–343. Théoret, H., L. Merabet, and A. Pascual-Leone. 2004. Behavioral and neuroplastic changes in the blind: Evidence for functionally relevant cross-modal interactions. Journal of Physiology, Paris 98:221–233. Tian, B., D. Reser, A. Durham, A. Kustov, and J.P. Rauschecker. 2001. Functional specialization in rhesus monkey auditory cortex. Science 292:290–293. Tsao, D.Y., W.A. Freiwald, R.B.H. Tootell, and M.S. Livingstone. 2006. A cortical region consisting entirely of face-selective cells. Science 311:670–674. Tsao, D.Y., S. Moeller, and W.A. Freiwald. 2008a. Comparing face patch systems in macaques and humans. Proceedings of the National Academy of Sciences of the United States of America 105:19514–19519. Tsao, D.Y., N. Schweers, S. Moeller, and W.A. Freiwald. 2008b. Patches of face-selective cortex in the macaque frontal lobe. Nature Neuroscience 11:877–879. Turner, B.H., M. Mishkin, and M. Knapp. 1980. Organization of the amygdalopetal projections from modalityspecific cortical association areas in the monkey. Journal of Comparative Neurology 191:515–543. Ungerleider, L.G., and M. Mishkin. 1982. Two cortical visual systems. In Analysis of Visual Behavior, ed. D.J. Ingle, M.A. Goodale, and R.J.W. Mansfield, 549–586. Cambridge: MIT Press. Ungerleider, L.G., S.M. Courtney, and J.V. Haxby. 1998. A neural system for human vision working memory. Proceedings of the National Academy of Sciences of the United States of America 95:883–890. Updyke, B.V. 1974. Characteristics of unit responses in superior colliculus of the cebus monkey. Journal of Neurophysiology 37:896–909. Vaadia, E., D.A. Benson, R.D. Hienz, and M.H. Goldstein Jr. 1986. Unit study of monkey frontal cortex: Active localization of auditory and of visual stimuli. Journal of Neurophysiology 56:934–952. van Attenveldt, N., A. Roebroeck, and R. Goebel. 2009. Interaction of speech and script in human auditory cortex: Insights from neuro-imaging and effective connectivity. Hearing Research 258:152–164. Vatakis, A., A.A. Ghazanfar, and C. Spence. 2008. Facilitation of multisensory integration by the “unity effect” reveals that speech is special. Journal of Vision 8(9):14. von Kriegstein, K., and A.-L. Giraud. 2006. Implicit multisensory associations influence voice recognition. PLoS Biology 4:e326. Wallace, M.T., L.K. Wilkinson, and B.E. Stein. 1996. 
Representation and integration of multiple sensory inputs in primate superior colliculus. Journal of Neurophysiology 76:1246–1266. Wang, Y., S. Celebrini, Y. Trotter, and P. Barone. 2008. Visuo-auditory interactions in the primary visual cortex of the behaving monkey: Electrophysiological evidence. BMC Neuroscience 9:79. Watanabe, M. 1992. Frontal units of the monkey coding the associative significance of visual and auditory stimuli. Experimental Brain Research 89:233–247. Watanabe, J., and E. Iwai. 1991. Neuronal activity in visual, auditory and polysensory areas in the monkey temporal cortex during visual fixation task. Brain Research Bulletin 26:583–592. Welch, R., and D. Warren. 1986. Intersensory interactions. In Handbook of Perception and Human Performance, ed. K.R. Boff, L. Kaufman, and J.P. Thomas, 21–36. New York: Wiley. Werner-Reiss, U., K.A. Kelly, A.S. Trause, A.M. Underhill, and J.M. Groh. 2006. Eye position affects activity in primary auditory cortex of primates. Current Biology 13:554–562. Wheeler, M.E., S.E. Petersen, and R.L. Buckner. 2000. Memory’s echo: Vivid remembering reactivates sensory-specific cortex. Proceedings of the National Academy of Sciences of the United States of America 97:11125–11129. Wilson, F.A.W., and E.T. Rolls. 1990. Neuronal responses related to reinforcement in the primate basal forebrain. Brain Research 509:213–231.
Wilson, F.A.W., S.P.O. Scalaidhe, and P.S. Goldman-Rakic. 1993. Dissociation of object and spatial processing in primate prefrontal cortex. Science 260:1955–1958. Wollberg, Z., and J. Sela. 1980. Frontal cortex of the awake squirrel monkey: Responses of single cells to visual and auditory stimuli. Brain Research 198:216–220. Woods, T.M., and G.H. Recanzone. 2004. Visually induced plasticity of auditory spatial perception in macaques. Current Biology 14:1559–1564. Wurtz, R.H., and J.E. Albano. 1980. Visual–motor function of the primate superior colliculus. Annual Review of Neuroscience 3:189–226. Yeterian, E.H., and D.N. Pandya. 1989. Thalamic connections of the cortex of the superior temporal sulcus in the rhesus monkey. Journal of Comparative Neurology 282:80–97. Zangenehpour, S., A.A. Ghazanfar, D.J. Lewkowicz, and R.J. Zatorre. 2009. Heterochrony and cross-species intersensory matching by infant vervet monkeys. PLoS ONE 4:e4302.
6 Multisensory Influences on Auditory Processing: Perspectives from fMRI and Electrophysiology
Christoph Kayser, Christopher I. Petkov, Ryan Remedios, and Nikos K. Logothetis
CONTENTS
6.1 Introduction
6.2 The Where and How of Sensory Integration
6.3 Using Functional Imaging to Localize Multisensory Influences in Auditory Cortex
6.4 Multisensory Influences along the Auditory Processing Stream
6.5 Multisensory Influences and Individual Neurons
6.6 Multisensory Influences and Processing of Communication Signals
6.7 Conclusions
References
6.1 INTRODUCTION Traditionally, perception has been described as a modular function, with the different sensory modalities operating as independent and separated processes. Following this view, sensory integration supposedly occurs only after sufficient unisensory processing and only in higher association cortices (Jones and Powell 1970; Ghazanfar and Schroeder 2006). Studies in the past decade, however, promote a different view, and demonstrate that the different modalities interact at early stages of processing (Kayser and Logothetis 2007; Schroeder and Foxe 2005; Foxe and Schroeder 2005). A good model for this early integration hypothesis has been the auditory cortex, where multisensory influences from vision and touch have been reported using a number of methods and experimental paradigms (Kayser et al. 2009c; Schroeder et al. 2003; Foxe and Schroeder 2005). In fact, anatomical afferents are available to provide information about nonacoustic stimuli (Rockland and Ojima 2003; Cappe and Barone 2005; Falchier et al. 2002), and neuronal responses showing cross-modal influences have been described in detail (Lakatos et al. 2007; Kayser et al. 2008, 2009a; Bizley et al. 2006). These novel insights, together with the traditional notion that multisensory processes are more prominent in higher association regions, suggest that sensory integration is a rather distributed process that emerges over several stages. Of particular interest in the context of sensory integration are stimuli with special behavioral significance, such as sights and sounds related to communication (Campanella and Belin 2007; Petrini et al. 2009; Ghazanfar and Logothetis 2003; von Kriegstein and Giraud 2006; von Kriegstein et al. 2006). Indeed, a famous scenario used to exemplify sensory integration—the cocktail party—concerns exactly this: when in a loud and noisy environment, we can better understand a person talking to us when we observe the movements of his/her lips at the same time (Sumby and Pollack 1954; Ross et al. 2007).
In this situation, the visual information about lip movements enhances the (perceived) speech signal, hence providing an example of how visual information can enhance auditory perception. However, as for many psychophysical phenomena, the exact neural substrate mediating the sensory integration underlying this behavioral benefit remains elusive. In this review, we discuss some of the findings on early multisensory influences on auditory processing, and provide evidence that sensory integration is distributed across several processing stages. In particular, we discuss some of the methodological aspects relevant for studies seeking to localize and characterize multisensory influences, and emphasize some of the recent results pertaining to speech and voice integration.
6.2 THE WHERE AND HOW OF SENSORY INTEGRATION To understand how the processing of acoustic information benefits from the stimulation of other modalities, we need to investigate "where" along auditory pathways influences from other modalities occur, and "how" they affect the neural representation of the sensory environment. Notably, the questions of "where" and "how" address different scales and levels of organization. Probing the "where" question requires the observation of sensory responses at many stages of processing, and hence a large spatial field of view. This is, for example, provided by functional imaging, which can assess signals related to neural activity in multiple brain regions at the same time. Probing the "how" question, in contrast, requires an investigation of the detailed neural representation of sensory information in localized regions of the brain. Given our current understanding of neural information processing, this level is best addressed by electrophysiological recordings that assess the responses of individual neurons, or small populations thereof, at the same time (Donoghue 2008; Kayser et al. 2009b; Quian Quiroga 2009). These two approaches, functional imaging (especially the functional magnetic resonance imaging (fMRI) blood oxygenation level-dependent (BOLD) signal) and electrophysiology, complement each other not only with regard to the sampled spatiotemporal dimensions, but also with regard to the kind of neural activity that is seen by the method. Although electrophysiological methods sample neural responses at the timescale of individual action potentials (millisecond precision) and the spatial scale of micrometers, functional imaging reports an aggregate signal derived from (subthreshold) responses of millions of neurons sampled over several hundreds of micrometers and hundreds of milliseconds (Logothetis 2002, 2008; Lauritzen 2005). In fact, because the fMRI-BOLD signal is only indirectly related to neuronal activity, it is difficult, at least at the moment, to make detailed inferences about neuronal responses from imaging data (Leopold 2009). As a result, both methods provide complementary evidence on sensory integration. In addition to defining methods needed to localize and describe sensory interactions, operational criteria are required to define what kind of response properties are considered multisensory influences. At the level of neurons, many criteria have been derived from seminal work on the superior colliculus by Stein and Meredith (1993). Considering an auditory neuron, as an example, visual influences would be assumed if the response to a bimodal (audiovisual) stimulus differs significantly from the unimodal (auditory) response. Although this criterion can be easily implemented as a statistical test to search for multisensory influences, it is, by itself, not enough to warrant the conclusion that an observed process merits the label "sensory integration." At the level of behavior, sensory integration is usually assumed if the bimodal sensory stimulus leads to a behavioral gain compared with the unimodal stimulus (Ernst and Bülthoff 2004). Typical behavioral gains are faster responses, higher detection rates, or improved stimulus discriminability. Often, these behavioral gains are highest when individual unimodal stimuli are least effective in eliciting responses, a phenomenon known as the principle of inverse effectiveness.
In addition, different unimodal stimuli are only integrated when they are perceived to originate from the same source, i.e., when they are coincident in space and time. Together, these two principles provide additional criteria to decide whether a particular neuronal process might be related to sensory integration (Stein 1998, 2008).
This statistical criterion, in conjunction with the verification of these principles, has become the standard approach to detect neural processes related to sensory integration. In addition, recent work has introduced more elaborate concepts derived from information theory and stimulus decoding. Such methods can be used to investigate whether neurons indeed become more informative about the sensory stimuli, and whether they allow better stimulus discrimination in multisensory compared to unisensory conditions (Bizley et al. 2006; Bizley and King 2008; Kayser et al. 2009a).
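To make these operational criteria concrete, the following Python sketch tests a single unit for a multisensory influence by comparing the bimodal against the strongest unimodal response and computes a commonly used enhancement index; comparing that index across weak and strong stimuli would probe inverse effectiveness. The spike counts, trial numbers, and test choices below are hypothetical illustrations and not the analysis code of any study cited here.

```python
import numpy as np
from scipy import stats

def enhancement_index(av, a, v):
    """Percent change of the mean bimodal response relative to the best unimodal response."""
    best_uni = max(a, v)
    return 100.0 * (av - best_uni) / best_uni

# Hypothetical trial-wise spike counts for one auditory-cortex unit.
rng = np.random.default_rng(0)
aud = rng.poisson(12, size=40)      # auditory-alone trials
vis = rng.poisson(3, size=40)       # visual-alone trials
audvis = rng.poisson(16, size=40)   # audiovisual trials

# Statistical criterion: the bimodal response differs from the best unimodal response.
best_uni = aud if aud.mean() >= vis.mean() else vis
stat, p = stats.mannwhitneyu(audvis, best_uni, alternative="two-sided")
idx = enhancement_index(audvis.mean(), aud.mean(), vis.mean())
print(f"multisensory influence: p = {p:.3g}, enhancement = {idx:.1f}%")

# Inverse effectiveness would predict that the same index, computed separately for
# weak and strong acoustic stimuli, is larger when the acoustic stimulus is weak.
```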
6.3 USING FUNCTIONAL IMAGING TO LOCALIZE MULTISENSORY INFLUENCES IN AUDITORY CORTEX Functional imaging is by far the most popular method to study the cortical basis of sensory integration, and many studies report multisensory interactions between auditory, visual, and somatosensory stimulation in association cortices of the temporal and frontal lobes (Calvert 2001). In addition, a number of studies reported that visual or somatosensory stimuli activate regions in close proximity to the auditory cortex or enhance responses to acoustic stimuli in these regions (Calvert and Campbell 2003; Calvert et al. 1997, 1999; Pekkola et al. 2005; Lehmann et al. 2006; van Atteveldt et al. 2004; Schurmann et al. 2006; Bernstein et al. 2002; Foxe et al. 2002; Martuzzi et al. 2006; van Wassenhove et al. 2005). Together, these studies promoted the notion of early multisensory interactions in the auditory cortex. However, the localization of multisensory influences is only as good as the localization of those structures relative to which the multisensory influences are defined. To localize multisensory effects to the auditory core (primary) or belt (secondary) fields, one needs to be confident about the location of these auditory structures in the respective subjects. Yet, this can be a problem given the small scale and variable position of auditory fields in individual subjects (Kaas and Hackett 2000; Hackett et al. 1998; Fullerton and Pandya 2007; Clarke and Rivier 1998; Chiry et al. 2003). One way to overcome this would be to first localize individual areas in each subject and to analyze functional data within these regions of interest. Visual studies often follow this strategy by mapping visual areas using retinotopically organized stimuli, which exploit the well-known functional organization of the visual cortex (Engel et al. 1994; Warnking et al. 2002). Auditory studies, in principle, could exploit a similar organization of auditory cortex, known as tonotopy, to define individual auditory fields (Rauschecker 1998; Rauschecker et al. 1995; Merzenich and Brugge 1973). In fact, electrophysiological studies have demonstrated that several auditory fields contain an ordered representation of sound frequency, with neurons preferring similar sound frequencies appearing in clusters and forming continuous bands encompassing the entire range from low to high frequencies (Merzenich and Brugge 1973; Morel et al. 1993; Kosaki et al. 1997; Recanzone et al. 2000). In addition, neurons in the auditory core and belt show differences in their preferences to narrow and broadband sounds, providing a second feature to distinguish several auditory fields (Rauschecker 1998; Rauschecker et al. 1997) (Figure 6.1a). Yet, although these properties in principle provide characteristics to differentiate individual auditory fields, this has proven surprisingly challenging in human fMRI studies (Wessinger et al. 2001; Formisano et al. 2003; Talavage et al. 2004). To sidestep these difficulties, we exploited high-resolution imaging facilities in combination with a model system for which there exists considerably more prior knowledge about the organization of the auditory cortex: the macaque monkey. This model system allows imaging voxel sizes on the order of 0.5 × 0.5 mm, whereas conventional human fMRI studies operate on a resolution of 3 × 3 mm (Logothetis et al. 1999). 
Much of the evidence about the anatomical and functional structure of the auditory cortex originates from this model system, providing important a priori information about the expected organization (Kaas and Hackett 2000; Hackett et al. 1998; Rauschecker and Tian 2004; Recanzone et al. 2000). Combining this a priori knowledge with high-resolution imaging systems as well as optimized data acquisition for auditory paradigms, we were able to obtain a tonotopic functional parcellation in individual animals (Petkov et al. 2006, 2009). By comparing the activation to stimulation with sounds of different frequency compositions, we obtained a smoothed
frequency preference map which allowed us to determine the anterior–posterior borders of potential fields. In addition, the preference for sounds of different bandwidths often allowed a segregation of core and belt fields, hence providing borders in medial–lateral directions. When combined with the known organization of auditory cortex, the evidence from these activation patterns allowed a more complete parcellation into distinct core and belt fields, and provided constraints for the localization of the parabelt regions (Figure 6.1b). This functional localization procedure for auditory fields now serves as a routine tool to delineate auditory structures in experiments involving auditory cortex.

FIGURE 6.1 (See color insert.) Mapping individual auditory fields using fMRI. (a) Schematic of organization of monkey auditory cortex. Three primary auditory fields (core region) are surrounded by secondary fields (belt region) as well as higher association areas (parabelt). Electrophysiological studies have shown that several of these fields contain an ordered representation of sound frequency (tonotopic map, indicated on left), and that core and belt fields prefer narrow- and broadband sounds, respectively. These two functional properties can be exploited to map the layout of these auditory fields in individual subjects using functional imaging. (b) Single-slice fMRI data showing frequency-selective BOLD responses to low and high tones (left panel) and a complete (smoothed) frequency map obtained from stimulation using six frequency bands (right panel). Combining the frequency map with an estimate of core region and anatomical landmarks to delineate the parabelt results in a full parcellation of auditory cortex in individual subjects. This parcellation is indicated in the left panel as white dashed lines and is shown in full in panel a.
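The logic of this tonotopic localization can be illustrated with a minimal sketch: given response amplitudes per voxel for a set of tone bands, the best-frequency map and a narrowband-versus-broadband contrast follow directly. All array names, sizes, and data below are hypothetical and do not reproduce the actual analysis of Petkov et al. (2006, 2009).

```python
import numpy as np

# Hypothetical response amplitudes (e.g., GLM betas): one value per voxel and tone band,
# with bands ordered from low to high center frequency.
rng = np.random.default_rng(0)
betas_tones = rng.random((5000, 6))            # (n_voxels, n_bands)
betas_noise = rng.random(5000)                 # responses to broadband noise
center_freqs_khz = np.array([0.5, 1, 2, 4, 8, 16])

# Best frequency per voxel: the tone band evoking the largest response.
best_freq = center_freqs_khz[betas_tones.argmax(axis=1)]

# Bandwidth preference: voxels responding more to tones than to noise are candidate
# core voxels; the reverse pattern is more typical of belt fields.
narrowband_pref = betas_tones.max(axis=1) - betas_noise

# Smoothing best_freq along the cortical surface yields a frequency-preference map;
# its low-high-low reversals mark anterior-posterior field borders (cf. Figure 6.1b).
```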
6.4 MULTISENSORY INFLUENCES ALONG THE AUDITORY PROCESSING STREAM In search of a better localization of the multisensory influences in the auditory cortex reported by human imaging studies, we combined the above localization technique with audiovisual and audio-tactile stimulation paradigms (Kayser et al. 2005, 2007). To localize multisensory influences, we searched for regions (voxels) in which responses to acoustic stimuli were significantly enhanced when a visual stimulus was presented at the same time. Because functional imaging poses particular constraints on statistical contrasts (Laurienti et al. 2005), we used a conservative formulation of this criterion in which multisensory influences are defined as significant superadditive effects, i.e., the response in the bimodal condition is required to be significantly stronger than the sum of the two unisensory responses: AV > (A + V). In our experiments, we employed naturalistic stimuli in order to activate those regions especially involved in the processing of everyday scenarios. These stimuli included scenes of conspecific animals vocalizing as well as scenes showing other animals in their natural settings. In concordance with previous reports, we found that visual stimuli indeed influence fMRI responses to acoustic stimuli within the classical auditory cortex. These visual influences were strongest in the caudal portions of the auditory cortex, especially in the caudomedial and caudolateral belt, portions of the medial belt, and the caudal parabelt (Figure 6.2a and b). These multisensory
FIGURE 6.2 (See color insert.) Imaging multisensory influences in monkey auditory cortex. (a) Data from an experiment with audiovisual stimulation. Sensory activation to auditory (left) and visual (right) stimuli are shown on single image slices (red to yellow voxels). An outline of auditory fields is indicated (white lines). Time course illustrates multisensory enhancement during combined audiovisual stimulation (data from one session, averaged over 36 repeats of the stimulus). For details, see Kayser, C. et al. (2007). (b) Schematic of auditory fields exhibiting significant visual influences. Visual influences (shown in blue) were most prominent in caudal fields, and effects in A1 were only observed in alert animals. (From Kayser, C. et al., J. Neurosci., 27, 1824–1835, 2007. With permission.) (c) Three-dimensional rendering of a segment of a monkey brain. Different structures investigated in fMRI experiments are color coded and comprise classical auditory cortex (core and belt) as well as auditory association cortex (parabelt) and general association cortex (STS). Please note that this figure serves as an illustration only, and individual structures have been sketched based on approximate anatomical location, not on functional criteria. (d) Strength of visual influence along auditory hierarchy. The graph displays contribution of responses to (unimodal) visual stimuli to total fMRI-BOLD activation obtained during auditory, visual, and audiovisual stimulation. This was computed as a fraction (in percentage) of BOLD response to visual stimulation relative to sum of BOLD responses to all three conditions. Visual contribution increases from lower to higher areas.
interactions in secondary and higher auditory regions occurred reliably in both anesthetized and alert animals. In addition, we found multisensory interactions in the core region A1, but only in the alert animal, indicating that these early interactions could be dependent on the vigilance of the animal, perhaps involving cognitive or top-down influences. To rule out nonspecific modulatory projections as the source of these effects, we tested two functional criteria of sensory integration: the
principles of temporal coincidence and inverse effectiveness. We found both criteria to be obeyed, and multisensory influences were stronger when sensory stimuli were in temporal coincidence and when unisensory stimuli were less effective in eliciting BOLD responses. Overall, these findings not only confirm previous results from human imaging, but also localize multisensory influences mostly to secondary fields and demonstrate a clear spatial organization, with caudal regions being most susceptible to multisensory inputs (Kayser et al. 2009c). In addition to providing a good localization of cross-modal influences (the "where" question), functional imaging can also shed light on the relative influence of visual stimuli on auditory processing at several processing stages. Because fMRI allows measuring responses at many locations at the same time, we were able to quantify visual influences along multiple stages in the caudal auditory network (Figure 6.2c). Using the above-mentioned localization technique in conjunction with anatomical landmarks, we defined several regions of interest outside the classical auditory cortex: these comprised the caudal parabelt, the superior temporal gyrus, as well as the upper bank of the STS (uSTS). The uSTS is a well-known multisensory area where neuronal responses as well as fMRI activations to stimulation of several modalities have been described (Benevento et al. 1977; Bruce et al. 1981; Beauchamp et al. 2004, 2008; Dahl et al. 2009). As a result, one should expect a corresponding increase in visual influence when proceeding from the auditory core to the uSTS. This was indeed the case, as shown in Figure 6.2d: visual influences were relatively small in auditory core and belt fields, as described above. In the parabelt/STG region, an auditory association cortex, visual influences already contributed a considerable proportion to the total activation, and were stronger still in the uSTS. As a rule of thumb, it seemed that the contribution of visual stimuli to the total measured activation roughly doubled from stage to stage along this hierarchy. Although human functional imaging has described multisensory influences at different stages of auditory processing, and in a number of behavioral contexts, imaging studies with the animal model localized these influences to the identified areas. These results promote a model in which multisensory influences already exist at early processing stages and progressively increase in higher areas. This suggests that sensory integration is a distributed process involving several processing stages to varying degrees, in opposition to the traditional idea of a modular organization of sensory processing into independent unisensory modules.
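A minimal sketch of the voxel-level criterion used above (hypothetical per-run response estimates, not the actual imaging pipeline) shows how the conservative superadditivity contrast AV > (A + V) and the visual-contribution fraction of Figure 6.2d can be computed.

```python
import numpy as np
from scipy import stats

# Hypothetical percent-signal-change estimates: one value per voxel and per run.
rng = np.random.default_rng(0)
n_vox, n_runs = 2000, 12
A, V, AV = rng.random((3, n_vox, n_runs))

# Conservative superadditivity contrast, AV > (A + V), tested across runs per voxel.
diff = AV - (A + V)
t, p = stats.ttest_1samp(diff, popmean=0.0, axis=1)
superadditive = (t > 0) & (p / 2 < 0.01)        # one-sided test at p < 0.01
print(f"{superadditive.sum()} of {n_vox} voxels pass AV > A + V")

# Visual contribution (cf. Figure 6.2d): fraction of the summed activation across the
# three conditions that is attributable to the unimodal visual response.
vis_fraction = 100.0 * V.mean(axis=1) / (A.mean(axis=1) + V.mean(axis=1) + AV.mean(axis=1))
```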
6.5 MULTISENSORY INFLUENCES AND INDIVIDUAL NEURONS Having localized multisensory influences to particular auditory fields, the obvious question arises of whether and how nonauditory inputs improve the processing of acoustic information. As noted above, this “how” question is ideally investigated using electrophysiological methods, for two reasons. First, the imaging signal reflects neuronal activity only indirectly and does not permit definite conclusions about the underlying neuronal processes (Logothetis 2008; Kayser et al. 2009c; Laurienti et al. 2005). And second, electrophysiology can directly address those parameters that are believed to be relevant for neural information processing, such as the spike count of individual neurons, temporal patterns of action potentials, or the synchronous firing of several neurons (Kayser et al. 2009b). Several electrophysiological studies have characterized multisensory influences in the auditory cortex. Especially at the level of subthreshold activity, as defined by field potentials and current source densities, strong visual or somatosensory influences were reported (Ghazanfar et al. 2005, 2008; Lakatos et al. 2007; Schroeder and Foxe 2002; Schroeder et al. 2001, 2003). These multisensory influences were widespread, in that they occurred at the vast majority of recording sites in each of these studies. In addition, these multisensory influences were not restricted to secondary areas but also occurred in regions functionally and anatomically characterized as primary auditory cortex (Kayser et al. 2008; Lakatos et al. 2007). Given that field potentials are especially sensitive to synaptic activity in the vicinity of the electrode (Mitzdorf 1985; Juergens et al. 1999; Logothetis
2002), these observations demonstrate that multisensory input to the auditory cortex occurs at the synaptic level. These results provide a direct neural basis for the multisensory influences seen in imaging studies, but do not yet reveal whether the neural information representation benefits from the multisensory input. Other studies provide evidence for multisensory influences on the firing of individual neurons in the auditory cortex. For example, measurements in ferret auditory cortex revealed that 15% of the neurons in core fields are sensitive to nonauditory inputs such as flashes of light (Bizley et al. 2006; and see Cappe et al. 2007 for similar results in monkeys). We investigated such visual influences in the macaque and found that a similar proportion (12%) of neurons in the auditory core revealed multisensory interactions in their firing rates. Of these, nearly 4% responded to both acoustic and visual stimuli when presented individually, and hence constitute bimodal neurons. The remaining 8% responded to unimodal sounds but did not respond to unimodal visual stimuli; however, their responses were enhanced (or reduced) by the simultaneous presentation of both stimuli. This response pattern does not conform to the traditional notion of bimodal neurons but represents a kind of multisensory influence typically called subthreshold response modulation (Dehner et al. 2004). Similar subthreshold response modulations have been observed in a number of cortical areas (Allman et al. 2008a, 2008b; Allman and Meredith 2007; Meredith and Allman 2009), and suggest that multisensory influences can fall along a continuum, ranging from true unimodal neurons to the classical bimodal neuron that exhibits suprathreshold responses to stimuli in several modalities. Notably, the fraction of neurons with significant multisensory influences in the auditory cortex was considerably smaller than the fraction of sites showing similar response properties in the local field potential (LFP), or the spatial area covered by the voxels showing multisensory responses in the imaging data. Hence, although visual input seems to be widely present at the subthreshold level, only a minority of neurons actually exhibit significant changes of their firing rates. This suggests that the effect of visual stimulation on auditory information coding in early auditory cortex is weaker than one would estimate from the strong multisensory influences reported in imaging studies. When testing the principles of temporal coincidence and inverse effectiveness for these auditory cortex neurons, we found both to be obeyed: the relative timing of auditory and visual stimuli was as important in shaping the multisensory influence as was the efficacy of the acoustic stimulus (Kayser et al. 2008). Similar constraints of spatiotemporal stimulus alignment on audiovisual response modulations in the auditory cortex have been observed in other studies as well (Bizley et al. 2006). Additional experiments using either semantically congruent or incongruent audiovisual stimuli revealed that visual influences in the auditory cortex also show specificity to more complex stimulus attributes. For example, neurons integrating information about audiovisual communication signals revealed reduced visual modulation when the acoustic communication call was paired with a moving disk instead of the movie displaying the conspecific animal (Ghazanfar et al. 2008).
A recent study also revealed that pairing a natural sound with a mismatching movie abolishes multisensory benefits for acoustic information representations (Kayser et al. 2009a). Altogether, this suggests that visual influences in the primary and secondary auditory fields indeed provide functionally specific visual information. Given that imaging studies reveal an increase of multisensory influence in higher auditory regions, one should expect a concomitant increase in the proportion of multisensory neurons. Indeed, when probing neurons in a classical association cortex, such as the STS, much stronger multisensory influences are visible in the neurons' firing. Using the same stimuli and statistical criteria, a recent study revealed a rather homogeneous population of unimodal and bimodal neurons in the upper bank STS (Dahl et al. 2009): about half the neurons responded significantly to both sensory modalities, whereas 28% of the neurons preferred the visual and 19% preferred the auditory modality. Importantly, this study revealed a more complex interplay of auditory and visual information representations in this region. In addition, detailed electrophysiological mappings
demonstrated that a spatial organization of neurons according to their modality preferences exists in the STS: neurons preferring the same modality (auditory or visual) co-occurred in close spatial proximity or occurred intermingled with bimodal neurons, whereas neurons preferring different modalities occurred only spatially separated. This organization at the scale of individual neurons led to extended patches of same-modality preference when analyzed at the scale of millimeters, revealing large-scale regions that preferentially respond to the same modality. These results lend support to the notion that topographical organizations might serve as a general principle of integrating information within and across the sensory modalities (Beauchamp et al. 2004; Wallace et al. 2004). These insights from studies of multisensory integration at the neuronal level are in concordance with the notion that sensory integration is a distributed hierarchical process that extends over several processing stages. Given the difficulty in characterizing and interpreting the detailed effect of multisensory influences at a single processing stage, a comparative approach might prove useful: comparing multisensory influences at different stages using the same stimuli might not only help in understanding the contribution of individual stages to the process of sensory integration, but also clarify the exact benefit a particular region derives from receiving multisensory input.
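The unit classification used in this section (bimodal versus subthreshold-modulated responses) can be summarized in a short sketch; the spike counts, thresholds, and category labels below are hypothetical and only illustrate the logic of the statistical criteria described above.

```python
import numpy as np
from scipy import stats

def classify_unit(spont, aud, vis, audvis, alpha=0.05):
    """Classify a unit from trial-wise spike counts (hypothetical format and thresholds)."""
    drives_a = stats.mannwhitneyu(aud, spont, alternative="two-sided").pvalue < alpha
    drives_v = stats.mannwhitneyu(vis, spont, alternative="two-sided").pvalue < alpha
    av_differs = stats.mannwhitneyu(audvis, aud, alternative="two-sided").pvalue < alpha
    if drives_a and drives_v:
        return "bimodal"                      # suprathreshold responses to both modalities
    if drives_a and av_differs:
        return "subthreshold-modulated"       # no visual response alone, but AV differs from A
    if drives_a:
        return "unimodal auditory"
    return "other / unresponsive"

rng = np.random.default_rng(1)
print(classify_unit(rng.poisson(2, 30), rng.poisson(10, 30),
                    rng.poisson(2, 30), rng.poisson(14, 30)))
```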
6.6 MULTISENSORY INFLUENCES AND PROCESSING OF COMMUNICATION SIGNALS The above findings clearly reveal that the processing of auditory information is modulated by visual (or somatosensory) information already at processing stages in or close to the primary auditory cortex. Notably, these cross-modal influences were seen not only in the context of naturalistic stimuli, but also for very simple and artificial stimuli. For example, visual influences on neuronal firing rates occurred when using flashes of light, short noise bursts, or very rapid somatosensory stimulation (Bizley et al. 2006; Lakatos et al. 2007; Kayser et al. 2008; Cappe et al. 2007; Bizley and King 2008). This suggests that multisensory influences in early auditory fields are not specialized for natural stimuli such as communication sounds, but rather reflect a more general process that is sensitive to basic stimulus attributes such as relative timing, relative position, or other semantic attributes. To those especially interested in the neural basis of communication or speech, this poses the immediate question of where in the brain multisensory influences are specialized for such stimuli and mediate the well-known behavioral benefits of integration. As seen above, the cocktail party effect—the integration of face and voice information—serves as one of the key examples to illustrate the importance of audiovisual integration; the underlying neural substrate, however, remains elusive. One approach to elucidate this could be to focus on those cortical regions in which neural processes directly related to the processing of communication sounds have been reported. Besides the classical speech areas of the human brain, a number of other areas have been implicated in the nonhuman primate: response preferences to conspecific vocalizations have been reported in the lateral belt (Tian et al. 2001), in the insula cortex (Remedios et al. 2009), in a voice area on the anterior temporal plane (Petkov et al. 2008), and in the ventrolateral prefrontal cortex (Cohen et al. 2007; Romanski et al. 2005). Notably, several of these stages have not only been investigated in the context of purely auditory processing but have also been assessed for audiovisual integration. The lateral belt is one of the regions classically implicated in an auditory "what" pathway concerned with the processing of acoustic object information (Romanski et al. 1999; Rauschecker and Tian 2000). The process of object segmentation or identification could well benefit from input from other modalities. Indeed, studies have reported that audiovisual interactions in the lateral belt are widespread at the level of LFPs and include about 40% of the recorded units (Ghazanfar et al. 2005).
In fact, the multisensory influences in this region were found to depend on stimulus parameters such as the face–voice onset asynchrony or the match of visual and acoustic vocalizations, suggesting a good degree of specificity of the visual input. At the other end of this pathway, in the ventrolateral prefrontal cortex, 46% of the neurons were found to reflect audiovisual components of vocalization signals (Sugihara et al. 2006). Although the existence of a dedicated "what" pathway is still debated (Bizley and Walker 2009; Hall 2003; Wang 2000), these results highlight the prominence of multisensory influences in the implicated areas. In addition to these stages of the presumed "what" pathway, two other regions have recently been highlighted in the context of vocal communication sounds. Recording in the primate insula, we recently found a large cluster of neurons that respond preferentially to conspecific vocalizations, when contrasted with a large set of other natural sounds (Remedios et al. 2009) (Figure 6.3a). Many of these neurons not only responded more strongly to conspecific vocalizations, but also responded selectively to only a few examples, and their responses allowed the decoding of the identity of individual vocalizations. This suggests that the insular cortex might play an important role in the representation of vocal communication sounds. Notably, this response preference to conspecific vocalizations is also supported by functional imaging studies in animals (Figure 6.3b) and humans (Griffiths et al. 1997; Rumsey et al. 1997; Kotz et al. 2003; Meyer et al. 2002; Zatorre et al. 1994). In addition, lesions of the insula often manifest as deficits in sound or speech recognition (auditory agnosia) and speech production, confirming a central function of this structure in communication-related processes (Habib et al. 1995; Cancelliere and Kertesz 1990; Engelien et al. 1995). Notably, some of the neurons in this auditory-responsive region in the insula also show sensitivity to visual stimuli or response interactions during audiovisual stimulation (R. Remedios and C. Kayser, unpublished data). However, the vast majority of units in this structure are not affected by visual stimuli, suggesting that this region is likely not concerned with the sensory integration of information related to communication calls, but mostly processes acoustic input. Another region that has recently been implicated in the processing of communication sounds is the so-called voice region in the anterior temporal lobe. A preference for the human voice, in particular, the identity of a human speaker, has been found in the human anterior temporal lobe (Belin and Zatorre 2003; Belin et al. 2000; von Kriegstein et al. 2003), and a similar preference for conspecific vocalizations and the identity of a monkey caller has been observed in the anterior temporal lobe of the nonhuman primate (Petkov et al. 2008). For example, high-resolution functional imaging revealed several regions in the superior temporal lobe responding preferentially to the presentation of conspecific macaque vocalizations over other vocalizations and natural sounds (see the red clusters in the middle panel of Figure 6.3c), as has been seen in humans (Belin et al. 2000; von Kriegstein et al. 2003). These results can be interpreted as evidence for sensitivity to the acoustic features that distinguish the vocalizations of members of the species from other sounds.
Further experiments have shown that one of these regions, located in the anterior temporal lobe, responds more vigorously to sounds that come from different speakers but carry a constant meaning than to sounds that come from the same speaker but vary in meaning and acoustics (Belin and Zatorre 2003; von Kriegstein et al. 2003; Petkov et al. 2008). These observations support the conclusion of a high-level correspondence in the processing of species-specific vocal features and a common cross-species substrate in the brains of human and nonhuman primates. Notably, this human voice region can also be influenced by multisensory input. For instance, von Kriegstein and colleagues (2006) used face and voice stimuli to first localize the human “face”- and “voice”-selective regions, and then showed that the activity of each of these regions was modulated by multisensory input. Comparable evidence from the animal model is still unavailable; ongoing work in our laboratory is pursuing this question (Perrodin et al. 2009a, 2009b).
FIGURE 6.3 (See color insert.) Response preferences to (vocal) communication sounds. Preferences to conspecific communication sounds have been found in insula (panels a and b) and in anterior temporal lobe (panel c). In both cases, responses to conspecific communication sounds (Mvoc) have been contrasted with sounds of other animals (Asnd) and environmental sounds (Esnd). (a) Data from an electrophysiological investigation of insula neurons. (From Remedios, R. et al., J. Neurosci., 29, 1034–1045, 2009. With permission.) Upper panel displays one example neuron, showing a strong response to Mvocs. Lower panel displays normalized population response to three sound categories (mean ± SEM). (b) Example data from a single fMRI experiment showing voxels significantly preferring conspecific vocalizations over other sounds (color code) in a single slice. Such voxels were found in anterior auditory cortex (field TS2), core and lateral belt, and in insula. Bar plot displays BOLD signal change for different conditions (mean ± SEM for insula voxels). (c) Identification of a voice region in monkey brain using functional imaging. (From Petkov, C.I. et al., Nat. Neurosci., 11, 367–374, 2008. With permission.) Preferences to conspecific vocal sounds (red voxels) were found in caudal auditory cortex (as also seen in b), and on anterior temporal lobe (voice area). This location of voice area is consistent with studies on voice processing in human brain, and suggests a common basis of voice processing in human and nonhuman primates. Bar plot displays BOLD signal change in voice region for different sound conditions (mean ± SEM across experiments).
6.7 CONCLUSIONS

During everyday actions, we benefit tremendously from the combined input provided by our different sensory modalities. Although seldom experienced explicitly, only this combined sensory input makes an authentic and coherent percept of our environment possible (Adrian 1928; Stein and Meredith 1993). In fact, multisensory integration helps us to react faster or with higher precision (Calvert et al. 2004; Hershenson 1962), improves our learning capacities (Montessori 1967; Oakland et al. 1998), and sometimes even completely alters our percept (McGurk and MacDonald 1976). As a result, understanding sensory integration and its neural basis not only yields insights into brain function and perception, but could also inform improved strategies for learning and rehabilitation programs (Shams and Seitz 2008).

Evidence from functional imaging and electrophysiology demonstrates that this process of sensory integration is likely distributed across multiple processing stages. Multisensory influences are already present at early stages, such as in the primary auditory cortex, but increase along the processing hierarchy and are ubiquitous in higher association cortices. Existing data suggest that multisensory influences at early stages are sensitive to basic stimulus characteristics such as spatial location and timing, but are not specialized toward particular kinds of stimuli, such as communication signals. Whether, where, and how multisensory influences become more specialized remains to be investigated by future work. In this search, an approach comparing the multisensory influences at multiple processing stages during the same stimulation paradigm might prove especially useful. As highlighted here, such comparisons ideally rely on a combination of methods that probe neural responses at different spatiotemporal scales, such as electrophysiology and functional imaging. Clearly, much remains to be learned before we fully understand the neural basis of the behavioral gains provided by multisensory stimuli.
REFERENCES Adrian, E.D. 1928. The Basis of Sensations. New York, Norton. Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in “unimodal” neurons: Cross-modal subthreshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–9. Allman, B.L., R.E. Bittencourt-Navarrete, L.P. Keniston et al. 2008a. Do cross-modal projections always result in multisensory integration? Cerebral Cortex 18:2066–76. Allman, B.L., L.P. Keniston, and M.A. Meredith. 2008b. Subthreshold auditory inputs to extrastriate visual neurons are responsive to parametric changes in stimulus quality: Sensory-specific versus non-specific coding. Brain Research 1242:95–101. Beauchamp, M.S., B.D. Argall, J. Bodurka, J.H. Duyn, and A. Martin. 2004. Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience 7:1190–2. Beauchamp, M.S., N.E. Yasar, R.E. Frye, and T. Ro. 2008. Touch, sound and vision in human superior temporal sulcus. NeuroImage 41:1011–20. Belin, P., and R.J. Zatorre. 2003. Adaptation to speaker’s voice in right anterior temporal lobe. Neuroreport 14:2105–9. Belin, P., R.J. Zatorre, P. Lafaille, P. Ahad, and B. Pike. 2000. Voice-selective areas in human auditory cortex. Nature 403:309–12. Benevento, L.A., J. Fallon, B.J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental Neurology 57:849–72. Bernstein, L.E., E.T. Auer Jr., J.K. Moore et al. 2002. Visual speech perception without primary auditory cortex activation. Neuroreport 13:311–5. Bizley, J.K., and A.J. King. 2008. Visual–auditory spatial processing in auditory cortical neurons. Brain Research 1242:24–36. Bizley, J.K., and K.M. Walker. 2009. Distributed sensitivity to conspecific vocalizations and implications for the auditory dual stream hypothesis. Journal of Neuroscience 29:3011–3. Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2006. Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–89.
Bruce, C., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology 46:369–84. Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies. Cerebral Cortex 11:1110–23. Calvert, G.A., and R. Campbell. 2003. Reading speech from still and moving faces: The neural substrates of visible speech. Journal of Cognitive Neuroscience 15:57–70. Calvert, G.A., M.J. Brammer, E.T. Bullmore et al. 1999. Response amplification in sensory-specific cortices during crossmodal binding. Neuroreport 10:2619–23. Calvert, G., C. Spence, and B.E. Stein. 2004. The Handbook of Multisensory Processes. Cambridge: MIT Press. Calvert, G.A., E.T. Bullmore, M.J. Brammer et al. 1997. Activation of auditory cortex during silent lipreading. Science 276:593–6. Campanella, S., and P. Belin. 2007. Integrating face and voice in person perception. Trends in Cognitive Sciences 11:535–43. Cancelliere, A.E., and A. Kertesz. 1990. Lesion localization in acquired deficits of emotional expression and comprehension. Brain and Cognition 13:133–47. Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. European Journal of Neuroscience 22:2886–902. Cappe, C., G. Loquet, P. Barone, and E.M. Rouiller. 2007. Neuronal responses to visual stimuli in auditory cortical areas of monkeys performing an audio-visual detection task. European Brain and Behaviour Society. Trieste. Chiry, O., E. Tardif, P.J. Magistretti, and S. Clarke. 2003. Patterns of calcium-binding proteins support parallel and hierarchical organization of human auditory areas. European Journal of Neuroscience 17:397–410. Clarke, S., and F. Rivier. 1998. Compartments within human primary auditory cortex: Evidence from cytochrome oxidase and acetylcholinesterase staining. European Journal of Neuroscience 10:741–5. Cohen, Y.E., F. Theunissen, B.E. Russ, and P. Gill. 2007. Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. Journal of Neurophysiology 97:1470–84. Dahl, C., N. Logothetis, and C. Kayser. 2009. Spatial organization of multisensory responses in temporal association cortex. Journal of Neuroscience 29:11924–32. Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith. 2004. Cross-modal circuitry between auditory and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multisensory convergence. Cerebral Cortex 14:387–403. Engel, S.A., D.E. Rumelhart, B.A. Wandell et al. 1994. fMRI of human visual cortex. Nature 369:525. Engelien, A., D. Silbersweig, E. Stern et al. 1995. The functional anatomy of recovery from auditory agnosia. A PET study of sound categorization in a neurological patient and normal controls. Brain 118(Pt 6):1395–409. Ernst, M.O., and H.H. Bülthoff. 2004. Merging the senses into a robust percept. Trends in Cognitive Science 8:162–9. Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience 22:5749–59. Formisano, E., D.S. Kim, F. Di Salle et al. 2003. Mirror-symmetric tonotopic maps in human primary auditory cortex. Neuron 40:859–69. Foxe, J.J., and C.E. Schroeder. 2005. The case for feedforward multisensory convergence during early cortical processing. Neuroreport 16:419–23. Foxe, J.J., G.R. Wylie, A. Martinez et al. 2002. 
Auditory–somatosensory multisensory processing in auditory association cortex: An fMRI study. Journal of Neurophysiology 88:540–3. Fullerton, B.C., and D.N. Pandya. 2007. Architectonic analysis of the auditory-related areas of the superior temporal region in human brain. Journal of Comparative Neurology 504:470–98. Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of Neuroscience 28:4457–69. Ghazanfar, A.A., and N.K. Logothetis. 2003. Neuroperception: Facial expressions linked to monkey calls. Nature 423:937–8. Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences 10:278–85. Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12.
Griffiths, T.D., A. Rees, C. Witton et al. 1997. Spatial and temporal auditory processing deficits following right hemisphere infarction. A psychophysical study. Brain 120(Pt 5):785–94. Habib, M., G. Daquin, L. Milandre et al. 1995. Mutism and auditory agnosia due to bilateral insular damage— role of the insula in human communication. Neuropsychologia 33:327–39. Hackett, T.A., I. Stepniewska, and J.H. Kaas. 1998. Subdivisions of auditory cortex and ipsilateral cortical connections of the parabelt auditory cortex in macaque monkeys. Journal of Comparative Neurology 394:475–95. Hall, D.A. 2003. Auditory pathways: Are ‘what’ and ‘where’ appropriate? Current Biology 13:R406–8. Hershenson, M. 1962. Reaction time as a measure of intersensory facilitation. Journal of Experimental Psychology 63:289–93. Jones, E.G., and T.P. Powell. 1970. An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain 93:793–820. Juergens, E., A. Guettler, and R. Eckhorn. 1999. Visual stimulation elicits locked and induced gamma oscillations in monkey intracortical- and EEG-potentials, but not in human EEG. Experimental Brain Research 129:247–59. Kaas, J.H., and T.A. Hackett. 2000. Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences of the United States of America 97:11793–9. Kayser, C., and N.K. Logothetis. 2007. Do early sensory cortices integrate cross-modal information? Brain Structure and Function 212:121–32. Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2005. Integration of touch and sound in auditory cortex. Neuron 48:373–84. Kayser, C., C.I. Petkov, M. Augath, and N.K. Logothetis. 2007. Functional imaging reveals visual modulation of specific fields in auditory cortex. Journal of Neuroscience 27:1824–35. Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18:1560–74. Kayser, C., N. Logothetis, and S. Panzeri. 2009a. Visual enhancement of the information representation in auditory cortex. Current Biology (in press). Kayser, C., M.A. Montemurro, N. Logothetis, and S. Panzeri. 2009b. Spike-phase coding boosts and stabilizes the information carried by spatial and temporal spike patterns. Neuron 61:597–608. Kayser, C., C.I. Petkov, and N.K. Logothetis. 2009c. Multisensory interactions in primate auditory cortex: fMRI and electrophysiology. Hearing Research (in press). doi:10.1016/j.heares.2009.02.011. Kosaki, H., T. Hashikawa, J. He, and E.G. Jones. 1997. Tonotopic organization of auditory cortical fields delineated by parvalbumin immunoreactivity in macaque monkeys. Journal of Comparative Neurology 386:304–16. Kotz, S.A., M. Meyer, K. Alter et al. 2003. On the lateralization of emotional prosody: An event-related functional MR investigation. Brain and Language 86:366–76. Lakatos, P., C.M. Chen, M.N. O’Connell, A. Mills, and C.E. Schroeder. 2007. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53:279–92. Laurienti, P.J., T.J. Perrault, T.R. Stanford, M.T. Wallace, and B.E. Stein. 2005. On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research 166:289–97. Lauritzen, M. 2005. Reading vascular changes in brain imaging: Is dendritic calcium the key? Nature Neuroscience Reviews 6(1):77–85. Lehmann, C., M. Herdener, F. Esposito et al. 2006. 
Differential patterns of multisensory interactions in core and belt areas of human auditory cortex. NeuroImage 31:294–300. Leopold, D.A. 2009. Neuroscience: Pre-emptive blood flow. Nature 457:387–8. Logothetis, N.K. 2002. The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 357:1003–37. Logothetis, N.K. 2008. What we can do and what we cannot do with fMRI. Nature 453:869–78. Logothetis, N.K., H. Guggenberger, S. Peled, and J. Pauls. 1999. Functional imaging of the monkey brain. Nature Neuroscience 2:555–62. Martuzzi, R., M.M. Murray, C.M. Michel et al. 2006. Multisensory interactions within human primary cortices revealed by BOLD dynamics. Cerebral Cortex 17:1672–9. McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:746–8. Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex. Neuroreport 20:126–31. Merzenich, M.M., and J.F. Brugge. 1973. Representation of the cochlear partition of the superior temporal plane of the macaque monkey. Brain Research 50:275–96.
Meyer, M., K. Alter, A.D. Friederici, G. Lohmann, and D.Y. Von Cramon. 2002. FMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Human Brain Mapping 17:73–88. Mitzdorf, U. 1985. Current source-density method and application in cat cerebral cortex: Investigation of evoked potentials and EEG phenomena. Physiological Reviews 65:37–100. Montessori, M. 1967. The Absorbent Mind. New York: Henry Holt & Co. Morel, A., P.E. Garraghty, and J.H. Kaas. 1993. Tonotopic organization, architectonic fields, and connections of auditory cortex in macaque monkeys. Journal of Comparative Neurology 335:437–59. Oakland, T., J.L. Black, G. Stanford, N.L. Nussbaum, and R.R. Balise. 1998. An evaluation of the dyslexia training program: A multisensory method for promoting reading in students with reading disabilities. Journal of Learning Disabilities 31:140–7. Pekkola, J., V. Ojanen, T. Autti et al. 2005. Attention to visual speech gestures enhances hemodynamic activity in the left planum temporale. Human Brain Mapping 27:471–7. Perrodin, C., C. Kayser, N. Logothetis, and C. Petkov. 2009a. Visual influences on voice-selective neurons in the anterior superior-temporal plane. International Conference on Auditory Cortex. Madgeburg, Germany, 2009. Perrodin, C., L. Veit, C. Kayser, N.K. Logothetis, and C.I. Petkov. 2009b. Encoding properties of neurons sensitive to species-specific vocalizations in the anterior temporal lobe of primates. International Conference on Auditory Cortex. Madgeburg, Germany, 2009. Petkov, C.I., C. Kayser, M. Augath, and N.K. Logothetis. 2006. Functional imaging reveals numerous fields in the monkey auditory cortex. PLoS Biology 4:e215. Petkov, C.I., C. Kayser, T. Steudel et al. 2008. A voice region in the monkey brain. Nature Neuroscience 11:367–74. Petkov, C.I., C. Kayser, M. Augath, and N.K. Logothetis. 2009. Optimizing the imaging of the monkey auditory cortex: Sparse vs. continuous fMRI. Magnetic Resonance Imaging 27:1065–73. Petrini, K., M. Russell, and F. Pollick. 2009. When knowing can replace seeing in audiovisual integration of actions. Cognition 110:432–9. Rauschecker, J.P. 1998. Cortical processing of complex sounds. Current Opinion in Neurobiology 8:516–21. Rauschecker, J.P., and B. Tian. 2000. Mechanisms and streams for processing of what and where in auditory cortex. Proceedings of the National Academy of Sciences of the United States of America 97:11800–6. Rauschecker, J.P., and B. Tian. 2004. Processing of band-passed noise in the lateral auditory belt cortex of the rhesus monkey. Journal of Neurophysiology 91:2578–89. Rauschecker, J.P., B. Tian, and M. Hauser. 1995. Processing of complex sounds in the macaque nonprimary auditory cortex. Science 268:111–4. Rauschecker, J.P., B. Tian, T. Pons, and M. Mishkin. 1997. Serial and parallel processing in rhesus monkey auditory cortex. Journal of Comparative Neurology 382:89–103. Recanzone, G.H., D.C. Guard, and M.L. Phan. 2000. Frequency and intensity response properties of single neurons in the auditory cortex of the behaving macaque monkey. Journal of Neurophysiology 83:2315–31. Remedios, R., N.K. Logothetis, and C. Kayser. 2009. An auditory region in the primate insular cortex responding preferentially to vocal communication sounds. Journal of Neuroscience 29:1034–45. Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology 50:19–26. Romanski, L.M., B. Tian, J. Fritz et al. 1999. 
Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience 2:1131–6. Romanski, L.M., B.B. Averbeck, and M. Diltz. 2005. Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. Journal of Neurophysiology 93:734–47. Ross, L.A., D. Saint-Amour, V.M. Leavitt, D.C. Javitt, and J.J. Foxe. 2007. Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex 17: 1147–53. Rumsey, J.M., B. Horwitz, B.C. Donohue et al. 1997. Phonological and orthographic components of word recognition. A PET-rCBF study. Brain 120(Pt 5):739–59. Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Brain Research. Cognitive Brain Research 14:187–98. Schroeder, C.E., and J. Foxe. 2005. Multisensory contributions to low-level, ‘unisensory’ processing. Current Opinion in Neurobiology 15:454–8. Schroeder, C.E., R.W. Lindsley, C. Specht et al. 2001. Somatosensory input to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85:1322–7. Schroeder, C.E., J. Smiley, K.G. Fu et al. 2003. Anatomical mechanisms and functional implications of multisensory convergence in early cortical processing. International Journal of Psychophysiology 50:5–17.
Schurmann, M., G. Caetano, Y. Hlushchuk, V. Jousmaki, and R. Hari. 2006. Touch activates human auditory cortex. NeuroImage 30:1325–31. Shams, L., and A.R. Seitz. 2008. Benefits of multisensory learning. Trends in Cognitive Sciences 12:411–7. Stein, B.E. 1998. Neural mechanisms for synthesizing sensory information and producing adaptive behaviors. Experimental Brain Research 123:124–35. Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews Neuroscience 9:255–66. Stein, B.E., and M.A. Meredith. 1993. Merging of the Senses. Cambridge: MIT Press. Sugihara, T., M.D. Diltz, B.B. Averbeck, and L.M. Romanski. 2006. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26:11138–47. Sumby, W.H., and I. Polack. 1954. Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America 26:212–5. Talavage, T.M., M.I. Sereno, J.R. Melcher et al. 2004. Tonotopic organization in human auditory cortex revealed by progressions of frequency sensitivity. Journal of Neurophysiology 91:1282–96. Tian, B., D. Reser, A. Durham, A. Kustov, and J.P. Rauschecker. 2001. Functional specialization in rhesus monkey auditory cortex. Science 292:290–3. van Atteveldt, N., E. Formisano, R. Goebel, and L. Blomert. 2004. Integration of letters and speech sounds in the human brain. Neuron 43:271–82. van Wassenhove, V., K.W. Grant, and D. Poeppel. 2005. Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the United States of America 102:1181–6. von Kriegstein, K., and A.L. Giraud. 2006. Implicit multisensory associations influence voice recognition. PLoS Biol 4:e326. von Kriegstein, K., E. Eger, A. Kleinschmidt, and A.L. Giraud. 2003. Modulation of neural responses to speech by directing attention to voices or verbal content. Brain Research. Cognitive Brain Research 17:48–55. von Kriegstein, K., A. Kleinschmidt, and A.L. Giraud. 2006. Voice recognition and cross-modal responses to familiar speakers’ voices in prosopagnosia. Cerebral Cortex 16:1314–22. Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation. Proceedings of the National Academy of Sciences of the United States of America 101:2167–72. Wang, X. 2000. On cortical coding of vocal communication sounds in primates. Proceedings of the National Academy of Sciences of the United States of America 97:11843–9. Warnking, J., M. Dojat, A. Guerin-Dugue et al. 2002. fMRI retinotopic mapping—step by step. NeuroImage 17:1665–83. Wessinger, C.M., J. Vanmeter, B. Tian et al. 2001. Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging. Journal of Cognitive Neuroscience 13:1–7. Zatorre, R.J., A.C. Evans, and E. Meyer. 1994. Neural mechanisms underlying melodic perception and memory for pitch. Journal of Neuroscience 14:1908–19.
7 Multisensory Integration through Neural Coherence

Andreas K. Engel, Daniel Senkowski, and Till R. Schneider
CONTENTS
7.1 Introduction
7.2 Views on Cross-Modal Integration
7.2.1 Integration by Convergence
7.2.2 Integration through Neural Coherence
7.3 Oscillatory Activity in Cross-Modal Processing
7.3.1 Oscillations Triggered by Multisensory Stimuli
7.3.2 Effects of Cross-Modal Semantic Matching on Oscillatory Activity
7.3.3 Modulation of Cross-Modal Oscillatory Responses by Attention
7.3.4 Percept-Related Multisensory Oscillations
7.4 Functional Role of Neural Synchrony for Cross-Modal Interactions
7.5 Outlook
References
7.1 INTRODUCTION

The inputs delivered by different sensory organs provide us with complementary information about the environment. Multisensory interactions constantly occur in the brain to evaluate cross-modal matching or conflict between such signals. The outcome of these interactions is of critical importance for perception, cognitive processing, and the control of action (Meredith and Stein 1983, 1985; Stein and Meredith 1993; Macaluso and Driver 2005; Kayser and Logothetis 2007). Recent studies have revealed that a large number of cortical operations, including those carried out by primary regions, are shaped by inputs from multiple sensory modalities (Amedi et al. 2005; Ghazanfar and Schroeder 2006; Kayser and Logothetis 2007, 2009). Multisensory integration is highly automatic and can even occur when there is no meaningful relationship between the different sensory inputs and even under conditions with no perceptual awareness, as demonstrated in pioneering research on multisensory interactions in the superior colliculus of anesthetized cats (Meredith and Stein 1983, 1985; Stein and Meredith 1993; Stein et al. 2002). Clearly, these findings suggest the fundamental importance of multisensory processing for development (Sur et al. 1990; Shimojo and Shams 2001; Bavelier and Neville 2002) and normal functioning of the nervous system. In recent years, an increasing number of studies have aimed at characterizing multisensory cortical regions, revealing multisensory processing in the superior temporal sulcus, the intraparietal sulcus, frontal regions, as well as the insula and claustrum (Calvert 2001; Ghazanfar and Schroeder 2006; Kayser and Logothetis 2007). Interestingly, there is increasing evidence that neurons in areas formerly considered unimodal, such as auditory belt areas (Foxe et al. 2002; Kayser et al. 2005; Macaluso and Driver 2005; Ghazanfar and Schroeder 2006; Kayser and Logothetis 2007), can also exhibit multisensory characteristics. Furthermore, numerous subcortical structures are involved in multisensory processing. In addition to the superior colliculus (Meredith and Stein 1983, 1985),
this includes the striatum (Nagy et al. 2006), the cerebellum (Baumann and Greenlee 2007), the amygdala (Nishijo et al. 1988), and there is evidence for cross-modal interactions at the level of the thalamus (Komura et al. 2005). Whereas the ubiquity and fundamental relevance of multisensory processing have become increasingly clear, the neural mechanisms underlying multisensory interaction are much less well understood. In this chapter, we review recent studies that may cast new light on this issue. Although classical studies have postulated a feedforward convergence of unimodal signals as the primary mechanism for multisensory integration (Stein and Meredith 1993; Meredith 2002), there is now evidence that both feedback and lateral interaction may also be relevant (Driver and Spence 2000; Foxe and Schroeder 2005; Ghazanfar and Schroeder 2006; Kayser and Logothetis 2007). Beyond this changing view on the anatomical substrate, there is increasing awareness that complex dynamic interactions of cell populations, leading to coherent oscillatory firing patterns, may be crucial for mediating cross-systems integration in the brain (von der Malsburg and Schneider 1986; Singer and Gray 1995; Singer 1999; Engel et al. 1992, 2001; Varela et al. 2001; Herrmann et al. 2004a; Fries 2005). Here, we will consider the hypothesis that synchronized oscillations may also provide a potential mechanism for cross-modal integration and for the selection of information that is coherent across different sensory channels. We will (1) contrast the two different views on cross-modal integration that imply different mechanisms (feedforward convergence vs. neural coherence), (2) review recent studies on oscillatory responses and cross-modal processing, and (3) discuss functional aspects and scenarios for the involvement of neural coherence in cross-modal interaction.
7.2 VIEWS ON CROSS-MODAL INTEGRATION

7.2.1 Integration by Convergence

The classical view posits that multisensory integration occurs in a hierarchical manner by progressive convergence of pathways and, thus, that sensory signals are integrated only in higher association areas and in specialized subcortical regions (Stein and Meredith 1993; Meredith 2002). A core assumption of this approach is that the neural representation of an object is primarily reflected in a firing rate code. Multisensory integration, accordingly, is expressed by firing rate changes in neurons or neural populations receiving convergent inputs from different modalities. A frequently used approach to investigate multisensory processing at the level of single neurons is to compare the spike rate in response to multisensory stimuli with the firing rate observed when presenting the most effective of these stimuli alone (Meredith and Stein 1983, 1985; Stein and Meredith 1993; Stein et al. 2002). More recent studies have applied an approach in which the neuronal responses to multisensory inputs are directly compared with the algebraic sum of the responses to the unisensory constituents (Rowland et al. 2007; Stanford et al. 2005). In this approach, multisensory responses that are larger than the sum of the unisensory responses are referred to as superadditive, whereas multisensory responses that are smaller are classified as subadditive (see the schematic formulation at the end of this subsection). A large body of evidence demonstrates such multisensory response patterns in a wide set of brain regions (Calvert 2001; Macaluso and Driver 2005; Ghazanfar and Schroeder 2006).

However, as recognized by numerous authors in recent years, a pure convergence model would probably not suffice to account for all aspects of multisensory processing (Driver and Spence 2000; Foxe and Schroeder 2005; Ghazanfar and Schroeder 2006; Kayser and Logothetis 2007). First, strong cross-modal interactions and modulation occur in primary cortices, which is difficult to reconcile with the notion of hierarchical convergence. Second, a convergence scenario does not appear flexible enough because it does not allow for rapid recombination of cross-modal signals into completely novel percepts. Furthermore, a feedforward convergence model does not explain how low-level information about objects can remain accessible because the high-level representation is noncompositional, i.e., it does not explicitly make reference to elementary features.
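The additivity criterion used in these studies can be written out compactly. The notation below is a generic sketch of the convention rather than a formula taken verbatim from the studies cited: let $R_{AV}$ denote the response to the combined audiovisual stimulus and $R_A$, $R_V$ the responses to its unisensory constituents. Then

\[
R_{AV} > R_A + R_V \quad \text{(superadditive)}, \qquad R_{AV} < R_A + R_V \quad \text{(subadditive)}.
\]

One common way to express the older comparison with the most effective unisensory stimulus as a percentage, in the tradition of the single-neuron studies cited above, is a multisensory enhancement index relative to $R_{\max} = \max(R_A, R_V)$:

\[
\mathrm{ME} = \frac{R_{AV} - R_{\max}}{R_{\max}} \times 100\%.
\]

Positive values of ME indicate multisensory response enhancement, negative values response depression; the additivity criterion additionally distinguishes whether any enhancement exceeds the linear sum of the unisensory responses.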
7.2.2 Integration through Neural Coherence

A different account of multisensory interaction can be derived from data on the functional role of correlated neural activity, which is likely to play a key role in feature integration and response selection in various sensory modalities (von der Malsburg and Schneider 1986; Singer and Gray 1995; Singer 1999; Tallon-Baudry and Bertrand 1999; Engel et al. 1992, 2001; Herrmann et al. 2004a; Fries 2005). As shown by numerous studies in both animals and humans, synchronized oscillatory activity, in particular at frequencies in the gamma band (>30 Hz), is related to a large variety of cognitive and sensorimotor functions. The majority of these studies were conducted in the visual modality, relating gamma band coherence of neural assemblies to processes such as feature integration over short and long distances (Engel et al. 1991a, 1991b; Tallon-Baudry et al. 1996), surface segregation (Gray et al. 1989; Castelo-Branco et al. 2000), perceptual stimulus selection (Fries et al. 1997; Siegel et al. 2007), and attention (Müller et al. 2000; Fries et al. 2001; Siegel et al. 2008). Beyond the visual modality, gamma band synchrony has also been observed in the auditory (Brosch et al. 2002; Debener et al. 2003), somatosensory (Bauer et al. 2006), and olfactory systems (Bressler and Freeman 1980; Wehr and Laurent 1996). Moreover, gamma band synchrony has been implicated in processes such as sensorimotor integration (Roelfsema et al. 1997; Womelsdorf et al. 2006), movement preparation (Sanes and Donoghue 1993; Farmer 1998), and memory formation (Csicsvari et al. 2003; Gruber and Müller 2005; Herrmann et al. 2004b). Collectively, these data provide strong support for the hypothesis that synchronization of neural signals is a key mechanism for integrating and selecting information in distributed networks. This so-called “temporal correlation hypothesis” (Singer and Gray 1995; Singer 1999; Engel et al. 2001) predicts that coherence of neural signals allows highly specific patterns of effective neuronal coupling to be set up, thus enabling flexible and context-dependent binding, the selection of relevant information, and the efficient routing of signals through processing pathways (Salinas and Sejnowski 2001; Fries 2005; Womelsdorf et al. 2007).

Based on experimental evidence discussed in the subsequent sections, we suggest that the same mechanism may also serve to establish specific relationships across different modalities, allowing cross-modal interactions of sensory inputs and the preferential routing of matching cross-modal information to downstream assemblies (Senkowski et al. 2008). We would like to note that this view does not contradict the notion that cross-modal interactions have strong effects on neuronal firing rates, but it shifts emphasis to considering a richer dynamic repertoire of neural interactions and a more flexible scenario of cross-modal communication in the brain.
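The notion of “coherence” invoked here can be quantified in several standard ways; the following is a schematic illustration in generic notation, not the specific analysis of any particular study reviewed below. For two signals $x$ and $y$ with instantaneous phases $\varphi_{x,n}$ and $\varphi_{y,n}$ on trial (or time window) $n$, obtained, for example, from a wavelet or Hilbert transform, the phase-locking value

\[
\mathrm{PLV} = \left| \frac{1}{N} \sum_{n=1}^{N} e^{\,i\,[\varphi_{x,n} - \varphi_{y,n}]} \right|
\]

ranges from 0 (no consistent phase relation) to 1 (perfect phase locking) and disregards signal amplitude. The magnitude-squared spectral coherence at frequency $f$,

\[
C_{xy}(f) = \frac{|S_{xy}(f)|^{2}}{S_{xx}(f)\,S_{yy}(f)},
\]

with $S_{xy}$ the cross-spectral density and $S_{xx}$, $S_{yy}$ the auto-spectral densities, additionally weights the phase relation by amplitude. Measures of this kind underlie the phase-coherence and phase-locking analyses referred to in the studies discussed in the following sections.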
7.3 OSCILLATORY ACTIVITY IN CROSS-MODAL PROCESSING

A variety of different paradigms have been used to study the role of oscillatory responses and neural coherence during multisensory processing. Most studies have been performed in humans using electroencephalography (EEG) or magnetoencephalography (MEG), whereas only a few animal studies are available. The approaches used address different aspects of multisensory processing, including (1) bottom-up processing of multisensory information, (2) cross-modal semantic matching, (3) modulation by top-down attention, as well as (4) cross-modally induced perceptual changes. In all these approaches, specific changes in oscillatory responses or coherence of neural activity have been observed, suggesting that temporally patterned neural signals may be relevant for more than just one type of multisensory interaction.
7.3.1 Oscillations Triggered by Multisensory Stimuli

A first attempt to investigate neural synchronization of oscillatory responses in the human EEG compared phase coherence patterns across multiple pairs of electrodes during the presentation of auditory and visual object names, as well as pictures of objects (von Stein et al. 1999). Under
conditions of passive stimulation (i.e., subjects were not required to perform any task), the authors reported an increase of phase coherence in the lower beta band between temporal and parietal electrode sites. The authors therefore suggested that meaningful semantic inputs are processed in a modality-independent network of temporal and parietal areas. Additional evidence for the involvement of oscillatory beta responses in multisensory processing comes from a study in which subjects were instructed to respond to the appearance of any stimulus in a stream of semantically meaningless auditory, visual, and multisensory audiovisual stimuli (Senkowski et al. 2006). In the cross-modal condition, an enhancement was observed for evoked oscillations, i.e., early oscillatory activity that is phase-locked to stimulus onset. This integration effect, which specifically occurred in the beta band, predicted the shortening of reaction times observed for multisensory audiovisual stimuli, suggesting an involvement of beta activity in the multisensory processing of behaviorally relevant stimuli. Cross-modal effects on evoked beta responses have also been reported in a sensory gating paradigm (Kisley and Cornwell 2006), in which auditory and somatosensory stimuli were presented at short or long interstimulus intervals under conditions of passive stimulation. Higher auditory and somatosensory evoked beta responses were found when the preceding stimulus came from the other modality than when it came from the same modality, suggesting a cross-modal gating effect on the oscillatory activity in this frequency range. Further EEG investigations have focused on the examination of oscillatory activity in response to basic auditory, visual, and audiovisual stimuli during passive stimulation (Sakowitz et al. 2000, 2001, 2005). In these studies, multisensory interactions were found in evoked oscillatory responses across a wide range of frequencies and across various scalp sites, indicating an involvement of neural synchronization of cell assemblies in different frequency bands and brain regions.

Compelling evidence for an association between oscillatory responses and multisensory processing comes from a recent study on somatosensory modulation of processing in primary auditory cortex of alert monkeys (Lakatos et al. 2007). The authors investigated the effect of median nerve stimulation on auditory responses and observed a pronounced augmentation of oscillations in the delta, theta, and gamma frequency ranges. Further analysis revealed that this effect was mainly due to a phase resetting of auditory oscillations by the somatosensory inputs. Another intriguing observation in the same study was that systematic variation of the relative delay between somatosensory and auditory inputs led to multisensory response enhancements at intervals corresponding to the cycle length of gamma, theta, and delta band oscillations. In contrast, for intermediate delays, the paired stimulus response was smaller than the responses to auditory stimuli alone. Further support for phase resetting as a potential mechanism of cross-modal interaction comes from a recent study focusing on visual modulation of auditory processing in the monkey (Kayser et al. 2008). Using auditory and visual stimuli while recording in the auditory core and belt regions of awake behaving monkeys, the authors observed both enhancement and suppression of unit and field potential responses.
Importantly, visual stimuli could be shown to modulate the phase angle of auditory alpha and theta band activity. Two recent studies have addressed interactions between auditory and multisensory regions in the superior temporal sulcus in behaving monkeys. One of the studies examined the effect of audiovisual looming signals on neural oscillations in the two regions (Maier et al. 2008). The main finding of this study was enhanced gamma band coherence between the two structures for cross-modally coherent looming signals compared to unimodal or receding motion inputs. This suggests that coupling of neuronal populations between primary sensory areas and higher-order multisensory structures may be functionally relevant for the integration of audiovisual signals. In a recent study, Kayser and Logothetis (2009) have investigated directed interactions between auditory cortex and multisensory sites in the superior temporal sulcus. Their analysis, which was confined to frequencies below the gamma band, suggests that superior temporal regions provide one major source of visual influences to the auditory cortex and that the beta band is involved in directed information flow through coupled oscillations.
In line with other studies (Foxe et al. 2002; Kayser et al. 2005; Ghazanfar and Schroeder 2006; Kayser and Logothetis 2007), these data support the notion that inputs from other modalities and from multisensory association regions can shape, in a context-dependent manner, the processing of stimuli in presumed unimodal cortices. Taken together, the findings discussed above suggest that modulation of both the power and the phase of oscillatory activity could be important mechanisms of cross-modal interaction.
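Because this and the following sections repeatedly distinguish evoked (phase-locked) from induced (non-phase-locked) oscillatory activity, it may help to state the conventional definitions; this is a generic sketch rather than the specific analysis pipeline of any of the studies cited. If $X_k(f,t)$ denotes the complex time-frequency coefficient of trial $k$ (of $K$ trials) at frequency $f$ and time $t$, then

\[
P_{\mathrm{total}}(f,t) = \frac{1}{K}\sum_{k=1}^{K} |X_k(f,t)|^{2}, \qquad
P_{\mathrm{evoked}}(f,t) = \left| \frac{1}{K}\sum_{k=1}^{K} X_k(f,t) \right|^{2},
\]

where the evoked power captures only the portion of the response that is phase-locked to stimulus onset; induced power is commonly approximated as the total power minus the evoked power, or estimated from single trials after subtracting the average evoked response. The consistency of phase across trials can be summarized by the inter-trial coherence

\[
\mathrm{ITC}(f,t) = \left| \frac{1}{K}\sum_{k=1}^{K} \frac{X_k(f,t)}{|X_k(f,t)|} \right|,
\]

and an increase in ITC without a corresponding increase in total power is one signature of the phase resetting discussed above.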
7.3.2 Effects of Cross-Modal Semantic Matching on Oscillatory Activity

In addition to spatial and temporal congruency (Stein and Meredith 1993), an important factor influencing cross-modal integration is the semantic matching of information across sensory channels. A recent study has addressed this issue during audiovisual processing in an object recognition task, in which sounds of animals were presented in combination with a picture of either the same or a different animal. Larger gamma band activity (GBA) was observed for semantically congruent compared to semantically incongruent audiovisual stimuli (Yuval-Greenberg and Deouell 2007). We have recently been able to obtain similar results using a visual-to-auditory semantic priming paradigm (Schneider et al. 2008), in which we also observed stronger GBA for trials with cross-modal semantic congruence as compared to incongruent trials (Figure 7.1). Source localization using the method of “linear beamforming” revealed that the matching operation presumably reflected in the GBA involves multisensory regions in the left lateral temporal cortex (Schneider et al. 2008). In line with these results, we have recently observed an enhanced GBA for the matching of visual and auditory inputs in working memory in a visual-to-auditory object-matching paradigm (Senkowski et al. 2009).

The effect of multisensory matching of meaningful stimuli on oscillatory activity has also been the subject of studies that have used socially important stimuli such as faces and voices. One study exploited the interesting case of synchronous versus asynchronous audiovisual speech (Doesburg et al. 2007) and showed that changes in phase synchrony occur in a transiently activated gamma oscillatory network. Gamma band phase-locking values were increased for asynchronous as compared to synchronous speech between frontal and left posterior sensors, whereas gamma band amplitude showed an enhancement for synchronous compared to asynchronous speech at long latencies after stimulus onset. A more complex pattern of multisensory interactions between faces and voices of conspecifics has recently been observed in the superior temporal sulcus of macaques (Chandrasekaran and Ghazanfar 2009). Importantly, this study demonstrates that faces and voices elicit distinct bands of activity in the theta, alpha, and gamma frequency ranges in the superior temporal sulcus, and moreover, that these frequency band activities show differential patterns of cross-modal integration effects. The relationship between the early evoked auditory GBA and multisensory processing has also been investigated in an audiovisual symbol-to-sound-matching paradigm (Widmann et al. 2007). An enhanced left-frontally distributed evoked GBA and a later parietally distributed induced (i.e., non-phase-locked) GBA were found for auditory stimuli that matched the elements of a visual pattern compared to auditory inputs that did not match the visual pattern. In another study, the role of neural synchronization between visual and sensorimotor cortex was examined in a multisensory matching task in which tactile Braille stimuli and visual dot patterns had to be compared (Hummel and Gerloff 2005). Comparing trials in which subjects performed well with trials in which they performed poorly, this study revealed an enhancement of phase coherence in the alpha band between occipital and lateral central regions, whereas no significant effects could be found in other frequency bands.
In summary, the available studies suggest that cross-modal matching may be reflected in both local and long-range changes of neural coherence.
FIGURE 7.1 Enhanced gamma band activity during semantic cross-modal matching. (a) Semantically congruent and incongruent objects were presented in a cross-modal visual-to-auditory priming paradigm. (b) GBA in response to auditory target stimuli (S2) was enhanced following congruent compared to incongruent stimuli. Square in right panel indicates a time-frequency window in which GBA difference was significant. (c) Source localization of GBA (40–50 Hz) between 120 and 180 ms after auditory stimulus onset (S2) using “linear beamforming” method (threshold at z = 2.56). Differences between congruent and incongruent conditions are prominent in left middle temporal gyrus (BA 21; arrow). This suggests that enhanced GBA reflects cross-modal semantic matching processes in lateral temporal cortex. (Adapted with permission from Schneider, T.R. et al., NeuroImage, 42, 1244–1254, 2008.)

7.3.3 Modulation of Cross-Modal Oscillatory Responses by Attention

One of the key functions of attention is to enhance perceptual salience and reduce stimulus ambiguity. Behavioral, electrophysiological, and functional imaging studies have shown that attention plays
an important role in multisensory processing (Driver and Spence 2000; Macaluso et al. 2000; Foxe et al. 2005; Talsma and Woldorff 2005). The effect of spatial selective attention on GBA in a multisensory setting has recently been investigated (Senkowski et al. 2005). Subjects were presented with a stream of auditory, visual, and combined audiovisual stimuli to the left and right hemispaces and had to attend to a designated side to detect occasional target stimuli in either modality. An
enhancement of the evoked GBA was found for attended compared to unattended multisensory stimuli. In contrast, no effect of spatial attention was observed for unimodal stimuli. An additional analysis of the gamma band phase distribution suggested that attention primarily acts to enhance GBA phase-locking, compatible with the idea already discussed above that cross-modal interactions can affect the phase of neural signals. The effects of nonspatial intermodal attention and the temporal relation between auditory and visual inputs on the early evoked GBA have been investigated in another EEG study (Senkowski et al. 2007). Subjects were presented with a continuous stream of centrally presented unimodal and bimodal stimuli while they were instructed to detect an occasional auditory or visual target. Using combined auditory and visual stimuli with different onset delays revealed clear effects on the evoked GBA. Although there were no significant differences between the two attention conditions, an enhancement of the GBA was observed when auditory and visual inputs of multisensory stimuli were presented simultaneously (i.e., 0 ± 25 ms; Figure 7.2). This suggests that the integration of auditory and visual inputs, as reflected in high-frequency oscillatory activity, is sensitive to the relative onset timing of the sensory inputs.
7.3.4 Percept-Related Multisensory Oscillations

A powerful approach to study cross-modal integration is the use of physically identical multisensory events that can lead to different percepts across trials. A well-known example of this approach is the sound-induced visual flash illusion, which exploits the effect that a single flash of light accompanied by rapidly presented auditory beeps is often perceived as multiple flashes (Shams et al. 2000). This illusion allows the direct comparison of neural responses to illusory trials (i.e., when more than one flash is perceived) with nonillusory trials (i.e., when a single flash is perceived), while keeping the physical parameters of the presented stimuli constant. In an early attempt to study GBA during the sound-induced visual flash illusion, an increase in induced GBA was observed over occipital sites in an early (around 100 ms) and a late time window (around 450 ms) for illusory but not for nonillusory trials (Bhattacharya et al. 2002). Confirming these data, a more recent study has also observed enhanced induced GBA over occipital areas around 130 and 220 ms for illusory compared to nonillusory trials (Mishra et al. 2007).

Using a modified version of the McGurk effect, the link between induced GBA and illusory perception during audiovisual speech processing has been addressed in MEG investigations (Kaiser et al. 2005, 2006). In the McGurk illusion, an auditory phoneme is dubbed onto a video showing an incongruent lip movement, which often leads to an illusory auditory percept (McGurk and MacDonald 1976). Exploiting this cross-modal effect, an enhanced GBA was observed in epochs in which an illusory auditory percept was induced by a visual deviant within a continuous stream of multisensory audiovisual speech stimuli (Kaiser et al. 2005). Remarkably, the topography of this effect was comparable with the frontal topography of a GBA enhancement obtained in an auditory mismatch study (Kaiser et al. 2002), suggesting that the GBA effect in the McGurk illusion study may represent a perceived auditory pattern change caused by the visual lip movement. Moreover, across subjects, the amplitude of induced GBA over the occipital cortex and the degree of the illusory acoustic change were closely correlated, suggesting that the induced GBA in early visual areas may be directly related to the generation of the illusory auditory percept (Kaiser et al. 2006).

Further evidence for a link of gamma band oscillations to illusory cross-modal perception comes from a study on the rubber hand illusion (Kanayama et al. 2007). In this study, a rubber hand was placed atop a box in which the subject’s own hand was located. In such a setting, subjects can have the illusory impression that a tactile input presented to one of their fingers actually stimulated the rubber hand. Interestingly, there was a strong effect of cross-modal congruence of the stimulation site. Stronger induced GBA and phase synchrony between distant electrodes occurred when a visual stimulus was presented near the finger of the rubber hand that corresponded to the subject’s
finger receiving a tactile stimulus, as compared to a spatial cross-modal misalignment. This finding suggests a close relationship between multisensory tactile–visual stimulation and phase coherence in gamma band oscillations.

FIGURE 7.2 Effect of relative timing of multisensory stimuli on gamma band oscillations. (a) Horizontal gratings and sinusoidal tones were presented with different stimulus onset asynchronies. (b) GBA to auditory and visual components of multisensory audiovisual stimuli was extracted for five asynchrony ranges centered about –100, –50, 0, +50, and +100 ms delay between visual and auditory stimulus, respectively. GBA evoked with multisensory inputs was compared to GBA to unisensory control stimuli. (c) An enhancement of evoked GBA compared to unimodal input was observed when auditory and visual inputs were presented within the smallest relative onset asynchrony window (0 ± 25 ms). This shows that precision of temporal synchrony has an effect on early cross-modal processing as reflected by evoked GBA. (Adapted with permission from Senkowski, D. et al., Neuropsychologia, 45, 561–571, 2007.)

In sum, the findings discussed in this section suggest that oscillatory activity, in particular at gamma band frequencies, can reflect perceptual changes resulting from cross-modal interactions.
7.4 FUNCTIONAL ROLE OF NEURAL SYNCHRONY FOR CROSS-MODAL INTERACTIONS

The data available support the hypothesis that synchronization of oscillatory responses plays a role in multisensory processing (Senkowski et al. 2008). They consistently show that multisensory interactions are accompanied by condition-specific changes in oscillatory responses which often, albeit not always, occur in the gamma band (Sakowitz et al. 2000, 2001, 2005; Bhattacharya et al. 2002; Kaiser et al. 2005, 2006; Senkowski et al. 2005, 2006, 2007; Lakatos et al. 2007; Mishra et al. 2007; Kanayama et al. 2007; Doesburg et al. 2007; Widmann et al. 2007; Schneider et al. 2008). These effects, observed in EEG or MEG signals, likely result not only from changes in oscillatory power, but also from altered phase coherence in the underlying neuronal populations. Several of the studies reviewed above have addressed this directly, providing evidence that coherence of neural signals across cortical areas may be a crucial mechanism involved in multimodal processing (von Stein et al. 1999; Hummel and Gerloff 2005; Doesburg et al. 2007; Maier et al. 2008; Kayser and Logothetis 2009).

Theoretical arguments suggest that coherent oscillatory signals may be well suited to serve cross-modal integration. It has been argued that synchronization of neural activity may help to cope with binding problems that occur in distributed architectures (von der Malsburg and Schneider 1986; Singer and Gray 1995; Engel et al. 1992, 2001; Singer 1999). Clearly, multisensory processing poses binding problems in at least two respects (Foxe and Schroeder 2005): information must be integrated across different neural systems; moreover, real-world scenes comprise multiple objects, creating the need for segregating unrelated neural signals within processing modules while, at the same time, selectively coordinating signals across channels in the correct combination. It seems unlikely that such complex coordination could be achieved by anatomical connections alone because this would not provide sufficient flexibility to cope with a fast-changing multisensory world. In contrast, establishing relations between signals through neural coherence may provide both the required flexibility and selectivity because transient phase-locking of oscillatory signals allows for the dynamic modulation of effective connectivity between spatially distributed neuronal populations (König et al. 1995; Salinas and Sejnowski 2001; Fries 2005; Womelsdorf et al. 2007).

If neural coherence indeed supports multisensory integration, a number of scenarios seem possible regarding the interaction of “lower-order” and “higher-order” regions. The studies reviewed above demonstrate the effects of multisensory interactions on oscillatory responses at multiple levels, including primary sensory areas (Kaiser et al. 2006; Kayser et al. 2008; Lakatos et al. 2007) as well as higher-order multimodal and frontal areas (Kaiser et al. 2005; Senkowski et al. 2006), suggesting that coherent neural activity might play a role in both “early” and “late” integration of multisensory signals. However, the available data do not yet allow us to decide conclusively which interaction patterns are most plausibly involved and, likely, these will also depend on the nature of the task and the stimuli. Using the case of audiovisual interactions, a number of hypothetical scenarios are schematically depicted in Figure 7.3.
The simplest scenario predicts that during multisensory interactions, neural synchronization changes between early sensory areas. An alternative possibility is that changes in neural coherence or power occur mainly within cell assemblies of multisensory association cortices, such as superior temporal regions. More complex scenarios would result from a combination of these patterns. For instance, changes in neural synchrony among unimodal regions could also be associated with enhanced oscillatory activity in multisensory areas. This could result from reentrant bottom-up and top-down interactions between unimodal and multimodal cortices. In addition, changes in multisensory perception will often also involve frontal regions, which might exert a modulatory influence on temporal patterns in multisensory parietotemporal regions through oscillatory coupling. Most likely, at least for multisensory processing in naturalistic environments, these interactions will combine into a highly complex pattern involving frontal cortex, temporoparietal regions, unimodal cortices, and presumably also subcortical structures.
FIGURE 7.3 Scenarios for large-scale neural communication during cross-modal perception. The model proposed here is compatible with a number of different patterns of neural interactions. The figure refers to the case of audiovisual interactions. (a) Multisensory interactions by coherence change between early sensory areas. (b) Alternatively, changes in neural coherence or power might occur mainly within or between multisensory association cortices, e.g., superior temporal and parietal regions. (c) Combining both scenarios, neural synchrony among unimodal regions could also be associated with enhanced oscillatory activity in multisensory areas. (d) Multisensory perception might also involve oscillatory activity in frontal regions, which is likely to exert a modulatory influence on temporal patterns in parietal and temporal regions.
Exploiting coherent oscillations as a potential mechanism would be compatible with various modes, or outcomes, of cross-modal interaction. An important case is the integration of spatially or semantically matching cross-modal signals. Congruent multisensory information would very likely lead to coherent activation of neurons processing sensory inputs from different modalities. This, in turn, would lead to stronger activation of cells in multisensory temporal, parietal, or frontal regions that receive input from such a synchronized assembly (Figure 7.3). Thus, cross-modal coherence might provide a plausible mechanism to implement the binding of features across different sensory pathways. In addition, cross-modal integration may be considerably facilitated by top-down influences from higher-order regions (Engel et al. 2001; Herrmann et al. 2004a). During the processing of natural multimodal scenes or semantically complex cross-modal information, such top-down influences might express a dynamic “prediction” (Engel et al. 2001) about expected multisensory inputs. In the case of a match with newly arriving sensory inputs, “resonance” is likely to occur, which would augment and accelerate the processing and selection of matching multisensory information (Widmann et al. 2007; Schneider et al. 2008; Senkowski et al. 2009).
The mechanism postulated here may also account for the processing of conflicting cross-modal information. In this case, the mismatching of spatiotemporal phase patterns would presumably lead to competition between different assemblies and a winner-take-all scenario (Fries et al. 2007). Evidence from work in the visual system suggests that such a competition would lead to an augmentation of temporal coherence in the dominant assembly, but a weakening of the temporal binding in other assemblies (Roelfsema et al. 1994; Fries et al. 1997, 2001). Because synchronized signals are particularly efficient in driving downstream cell populations (König et al. 1996; Womelsdorf
et al. 2007) and in modulating synaptic weights (Markram et al. 1997; Bi and Poo 2001), such a mechanism would then lead to a selection of strongly synchronized populations and suppression of decorrelated activity.
A third case may be cross-modal modulation, i.e., the bias of a percept by concurrent input from a different sensory modality. The model suggested here predicts that the inputs from the second modality can change the temporal structure of activity patterns in the first modality. One possible mechanism for such a modulation by oscillatory inputs is suggested by studies discussed above (Lakatos et al. 2007; Kayser et al. 2008). Both “lateral” interactions between assemblies in early areas and top-down influences could lead to a shift in phase of the respective local oscillations, thus entraining the local population into a temporal pattern that may be optimally suited to enhance the effect on downstream assemblies. The prediction is that this phase resetting or phase shifting should be maximally effective for spatially, temporally, or semantically matching cross-modal information. Such a mechanism might help to explain why cross-modal context can often lead to biases in the processing of information in one particular sensory system and might contribute to understanding the nature of “early” multisensory integration (Foxe and Schroeder 2005). Because such modulatory effects might occur on a range of time scales (defined by different frequency bands in oscillatory activity), this mechanism may also account for broader temporal integration windows that have been reported for multisensory interactions (Vroomen and Keetels 2010).
Finally, our hypothesis might also help to account for key features of multisensory processing such as the superadditivity or subadditivity of responses (Stein and Meredith 1993; Meredith 2002) and the principle of “inverse effectiveness” (Kayser and Logothetis 2007). Because of nonlinear dendritic processing, appropriately timed inputs will generate a much stronger postsynaptic response in target neuronal populations than temporally uncoordinated afferent signals (König et al. 1996; Singer 1999; Fries 2005) and, therefore, matching cross-modal inputs can have an impact that differs strongly from the sum of the unimodal responses. Conversely, incongruent signals from two modalities might result in temporally desynchronized inputs and, therefore, in “multisensory depression” in downstream neural populations (Stein et al. 2002).
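To make the coherence measures invoked in this section concrete, the following sketch shows one common way a phase relation between two recording sites can be quantified from trial-based data: a phase-locking value (PLV) computed on band-limited signals. The trial layout, sampling rate, and gamma-band limits are illustrative assumptions rather than a description of any particular study reviewed above.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, low, high, fs, order=4):
    # Zero-phase Butterworth band-pass filter applied along the last axis.
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=-1)

def phase_locking_value(x, y, low, high, fs):
    # x, y: arrays of shape (n_trials, n_samples) from two sensors or sources.
    # Returns a PLV time course; values near 1 indicate a consistent phase
    # relation across trials, values near 0 indicate none.
    phx = np.angle(hilbert(bandpass(x, low, high, fs)))
    phy = np.angle(hilbert(bandpass(y, low, high, fs)))
    return np.abs(np.mean(np.exp(1j * (phx - phy)), axis=0))

# Hypothetical data: 100 trials of 1 s at 1000 Hz, gamma band 30-80 Hz.
fs, n_trials, n_samples = 1000, 100, 1000
rng = np.random.default_rng(0)
t = np.arange(n_samples) / fs
shared = np.sin(2 * np.pi * 40 * t)                   # shared 40 Hz component
x = shared + rng.standard_normal((n_trials, n_samples))
y = shared + rng.standard_normal((n_trials, n_samples))
print(phase_locking_value(x, y, 30, 80, fs).mean())   # elevated by the shared rhythm

Unlike a magnitude-squared coherence spectrum, which also weights trials by amplitude, the PLV isolates the consistency of the phase relation, which is the quantity emphasized in the account sketched above.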
7.5 OUTLOOK
Although partially supported by data, the hypothesis that neural synchrony may play a role in multisensory processing clearly requires further experimental testing. Thus far, only a relatively small number of multisensory studies have used coherence measures to explicitly address interactions across different neural systems. Very likely, substantial progress can be achieved by studies in humans if the approaches are suitable to capture dynamic cross-system interactions among specific brain regions. Such investigations may be carried out using MEG (Gross et al. 2001; Siegel et al. 2007, 2008), the combination of EEG with functional magnetic resonance imaging (Debener et al. 2006), or intracerebral multisite recordings (Lachaux et al. 2003), provided that the recordings are combined with advanced source modeling techniques (Van Veen et al. 1997) and analysis methods that quantify, e.g., directed information transfer between the activated regions (Supp et al. 2007). In addition, some of the earlier EEG studies on multisensory oscillations involving visual stimuli (e.g., Yuval-Greenberg and Deouell 2007) seem to be confounded by artifacts relating to microsaccades (Yuval-Greenberg et al. 2008), a methodological issue that needs to be clarified and can possibly be avoided by using MEG (Fries et al. 2008). To characterize the role of correlated activity in multisensory processing at the cellular level, further microelectrode studies in higher mammals will be indispensable.
The model put forward here has several implications. We believe that the study of synchronization phenomena may lead to a new view on multisensory processing that considers the dynamic interplay of neural populations as a key to cross-modal integration and stimulates the development of new research approaches and experimental strategies. Conversely, the investigation of multisensory interactions may also provide a crucial test bed for further validation of the temporal correlation
hypothesis (Engel et al. 1992; Singer and Gray 1995; Singer 1999), because task- or percept-related changes in coherence between independent neural sources have hardly been shown in humans thus far. In this context, the role of oscillations in different frequency bands is yet another unexplored issue that future studies will have to address. As discussed above, multisensory effects are often, but not exclusively, observed in higher frequency ranges, and it is unclear why gamma band oscillations figure so prominently. Finally, abnormal synchronization across sensory channels may play a role in conditions of abnormal cross-modal perception such as synesthesia (Hubbard and Ramachandran 2005) or in disorders such as schizophrenia or autism. In synesthesia, excessively strong multisensory coherence might occur, which then would not just modulate processing in unimodal regions but actually drive sensory neurons even in the absence of a proper stimulus. In contrast, abnormal weakness of cross-modal coupling might account for the impairment of multisensory integration that is observed in patients with schizophrenia (Ross et al. 2007) or autism (Iarocci and McDonald 2006). Thus, research on cross-modal binding may help to advance our understanding of brain disorders that partly result from dysfunctional integrative mechanisms (Schnitzler and Gross 2005; Uhlhaas and Singer 2006).
REFERENCES Amedi, A., K. von Kriegstein, N.M. van Atteveldt, M.S. Beauchamp, M.J. Naumer. 2005. Functional imaging of human crossmodal identification and object recognition. Experimental Brain Research 166:559–571. Bauer, M., R. Oostenveld, M. Peeters, P. Fries. 2006. Tactile spatial attention enhances gamma-band activity in somatosensory cortex and reduces low-frequency activity in parieto-occipital areas. Journal of Neuroscience 26:490–501. Baumann, O., and M.W. Greenlee. 2007. Neural correlates of coherent audiovisual motion perception. Cerebral Cortex 17:1433–1443. Bavelier, D., and H.J. Neville. 2002. Cross-modal plasticity: Where and how? Nature Reviews. Neuroscience 3:443–452. Bhattacharya, J., L. Shams, S. Shimojo. 2002. Sound-induced illusory flash perception: Role of gamma band responses. Neuroreport 13:1727–1730. Bi, G.-Q., and M.-M. Poo. 2001. Synaptic modification by correlated activity: Hebb’s postulate revisited. Annual Review of Neuroscience 24:139–166. Bressler, S.L., and W.J. Freeman. 1980. Frequency analysis of olfactory system EEG in cat, rabbit, and rat. Electroencephalography and Clinical Neurophysiology 50:19–24. Brosch, M., E. Budinger, H. Scheich. 2002. Stimulus-related gamma oscillations in primate auditory cortex. Journal of Neurophysiology 87:2715–2725. Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies. Cerebral Cortex 11:1110–1123. Castelo-Branco, M., R. Goebel, S. Neuenschwander, W. Singer. 2000. Neural synchrony correlates with surface segregation rules. Nature 405:685–689. Csicsvari, J., B. Jamieson, K.D. Wise, G. Buzsaki. 2003. Mechanisms of gamma oscillations in the hippocampus of the behaving rat. Neuron 37:311–322. Chandrasekaran, C., and A.A. Ghazanfar. 2009. Different neural frequency bands integrate faces and voices differently in the superior temporal sulcus. Journal of Neurophysiology 101:773–788. Debener, S., C.S. Herrmann, C. Kranczioch, D. Gembris, A.K. Engel. 2003. Top-down attentional processing enhances auditory evoked gamma band activity. Neuroreport 14:683–686. Debener, S., M. Ullsperger, M. Siegel, A.K. Engel. 2006. Single-trial EEG-fMRI reveals the dynamics of cognitive function. Trends in Cognitive Sciences 10:558–563. Doesburg, S.M., L.L. Emberson, A. Rahi, D. Cameron, L.M. Ward. 2007. Asynchrony from synchrony: Long-range gamma-band neural synchrony accompanies perception of audiovisual speech asynchrony. Experimental Brain Research 185:11–20. Driver, J., and C. Spence. 2000. Multisensory perception: Beyond modularity and convergence. Current Biology 10:R731–R735. Engel, A.K., P. König, W. Singer. 1991a. Direct physiological evidence for scene segmentation by temporal coding. Proceedings of the National Academy of Sciences of the United States of America 88:9136–9140.
Engel, A.K., P. König, A.K. Kreiter, Singer, W. 1991b. Interhemispheric synchronization of oscillatory neuronal responses in cat visual cortex. Science 252:1177–1179. Engel, A.K., P. König, A.K. Kreiter, T.B. Schillen, W. Singer. 1992. Temporal coding in the visual cortex: New vistas on integration in the nervous system. Trends in Neurosciences 15:218–226. Engel, A.K., P. Fries, W. Singer. 2001. Dynamic predictions: Oscillations and synchrony in top-down processing. Nature Reviews. Neuroscience 2:704–716. Farmer, S.F. 1998. Rhythmicity, synchronization and binding in human and primate motor systems. Journal of Physiology 509:3–14. Foxe, J.J., and C.E. Schroeder. 2005. The case for feedforward multisensory convergence during early cortical processing. Neuroreport 16:419–423. Foxe, J.J., G.R. Wylie, A. Martinez et al. 2002. Auditory-somatosensory multisensory processing in auditory association cortex: An fMRI study. Journal of Neurophysiology 88:540–543. Foxe, J.J., G.V. Simpson, S.P. Ahlfors, C.D. Saron. 2005. Biasing the brain’s attentional set: I. cue driven deployments of intersensory selective attention. Experimental Brain Research 166:370–392. Fries, P. 2005. A mechanism for cognitive dynamics: Neuronal communication through neuronal coherence. Trends in Cognitive Sciences 9:474–480. Fries, P., P.R. Roelfsema, A.K. Engel, P. König, W. Singer. 1997. Synchronization of oscillatory responses in visual cortex correlates with perception in interocular rivalry. Proceedings of the National Academy of Sciences of the United States of America 94:12699–12704. Fries, P., S. Neuenschwander, A.K. Engel, R. Goebel, W. Singer. 2001. Modulation of oscillatory neuronal synchronization by selective visual attention. Science 291:1560–1563. Fries, P., D. Nikolic, W. Singer. 2007. The gamma cycle. Trends in Neurosciences 30:309–316. Fries, P., R. Scheeringa, R. Oostenveld. 2008. Finding gamma. Neuron 58:303–305. Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences 10:278–285. Gray, C.M., P. König, A.K. Engel, W. Singer. 1989. Oscillatory responses in cat visual cortex exhibit intercolumnar synchronization which reflects global stimulus properties. Nature 338:334–337. Gross, J., J. Kujala, M. Hamalainen et al. 2001. Dynamic imaging of coherent sources: Studying neural interactions in the human brain. Proceedings of the National Academy of Sciences of the United States of America 98:694–699. Gruber, T., and M.M. Müller. 2005. Oscillatory brain activity dissociates between associative stimulus content in a repetition priming task in the human EEG. Cerebral Cortex 15:109–116. Herrmann, C.S., M.H. Munk, A.K. Engel. 2004a. Cognitive functions of gamma-band activity: Memory match and utilization. Trends in Cognitive Sciences 8:347–355. Herrmann, C.S., D. Lenz, S. Junge, N.A. Busch, B. Maess. 2004b. Memory-matches evoke human gammaresponses. BMC Neuroscience 5:13. Hubbard, E.M., and V.S. Ramachandran. 2005. Neurocognitive mechanisms of synesthesia. Neuron 48: 509–520. Hummel, F., and C. Gerloff. 2005. Larger interregional synchrony is associated with greater behavioral success in a complex sensory integration task in humans. Cerebral Cortex 15:670–678. Iarocci, G., and J. McDonald. 2006. Sensory integration and the perceptual experience of persons with autism. Journal of Autism and Developmental Disorders 36:77–90. Kaiser, J., W. Lutzenberger, H. Ackermann, N. Birbaumer. 2002. Dynamics of gamma-band activity induced by auditory pattern changes in humans. 
Cerebral Cortex 12:212–221. Kaiser, J., I. Hertrich, H. Ackermann, K. Mathiak, W. Lutzenberger. 2005. Hearing lips: Gamma-band activity during audiovisual speech perception. Cerebral Cortex 15:646–653. Kaiser, J., I. Hertrich, W. Ackermann, W. Lutzenberger. 2006. Gamma-band activity over early sensory areas predicts detection of changes in audiovisual speech stimuli. NeuroImage 30:1376–1382. Kanayama, N., A. Sato, H. Ohira. 2007. Crossmodal effect with rubber hand illusion and gamma-band activity. Psychophysiology 44:392–402. Kayser, C., and N.K. Logothetis. 2007. Do early sensory cortices integrate crossmodal information? Brain Structure and Function 212:121–132. Kayser, C., and N.K. Logothetis. 2009. Directed interactions between auditory and superior temporal cortices and their role in sensory integration. Frontiers in Integrative Neuroscience 3:7. Kayser, C., C.I. Petkov, M. Augath, N.K. Logothetis. 2005. Integration of touch and sound in auditory cortex. Neuron 48:373–384. Kayser, C., C.I. Petkov, N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18:1560–1574.
Kisley, M.A., and Z.M. Cornwell. 2006. Gamma and beta neural activity evoked during a sensory gating paradigm: Effects of auditory, somatosensory and cross-modal stimulation. Clinical Neurophysiology 117:2549–2563. Komura, Y., R. Tamura, T. Uwano, H. Nishijo, T. Ono. 2005. Auditory thalamus integrates visual inputs into behavioral gains. Nature Neuroscience 8:1203–1209. König, P., A.K. Engel, W. Singer. 1995. Relation between oscillatory activity and long-range synchronization in cat visual cortex. Proceedings of the National Academy of Sciences of the United States of America 92:290–294. König, P., A.K. Engel, W. Singer. 1996. Integrator or coincidence detector? The role of the cortical neuron revisited. Trends in Neurosciences 19:130–137. Lachaux, J.P., D. Rudrauf, P. Kahane. 2003. Intracranial EEG and human brain mapping. Journal of Physiology 97:613–628. Lakatos, P., C.M. Chen, M.N. O’Connell, A. Mills, C.E. Schroeder. 2007. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53:279–292. Macaluso, E., and J. Driver. 2005. Multisensory spatial interactions: A window onto functional integration in the human brain. Trends in Neurosciences 28:264–271. Macaluso, E., C.D. Frith, J. Driver. 2000. Modulation of human visual cortex by crossmodal spatial attention. Science 289:1206–1208. Maier, J.X., C. Chandrasekaran, A.A. Ghazanfar. 2008. Integration of bimodal looming signals through neuronal coherence in the temporal lobe. Current Biology 18:963–968. Markram, H., J. Lübke, M. Frotscher, B. Sakmann. 1997. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Nature 275:213–215. McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:746–748. Meredith, M.A. 2002. On the neuronal basis for multisensory convergence: A brief overview. Brain Research. Cognitive Brain Research 14:31–40. Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus. Science 221:389–391. Meredith, M.A., and B.E. Stein. 1985. Descending efferents from the superior colliculus relay integrated multisensory information. Science 227:657–659. Mishra, J., A. Martinez, T.J. Sejnowski, S.A. Hillyard. 2007. Early cross-modal interactions in auditory and visual cortex underlie a sound-induced visual illusion. Journal of Neuroscience 27:4120–4131. Müller, M.M., T. Gruber, A. Keil. 2000. Modulation of induced gamma band activity in the human EEG by attention and visual information processing. International Journal of Psychophysiology 38:283–299. Nagy, A., G. Eördegh, Z. Paroczy, Z. Markus, G. Benedek. 2006. Multisensory integration in the basal ganglia. European Journal of Neuroscience 24:917–924. Nishijo, H., T. Ono, H. Nishino. 1988. Topographic distribution of modality-specific amygdalar neurons in alert monkey. Journal of Neuroscience 8:3556–3569. Roelfsema, P.R., P. König, A.K. Engel, R. Sireteanu, W. Singer. 1994. Reduced synchronization in the visual cortex of cats with strabismic amblyopia. European Journal of Neuroscience 6:1645–1655. Roelfsema, P.R., A.K. Engel, P. König, W. Singer. 1997. Visuomotor integration is associated with zero timelag synchronization among cortical areas. Nature 385:157–161. Ross, L.A., D. Saint-Amour, V.M. Leavitt, S. Molholm, D.C. Javitt, J.J. Foxe. 2007. Impaired multisensory processing in schizophrenia: Deficits in the visual enhancement of speech comprehension under noisy environmental conditions. Schizophrenia Research 97:173–183. Rowland B.A., S. Quessy, T.R. 
Stanford, B.E. Stein. 2007. Multisensory integration shortens physiological response latencies. Journal of Neuroscience 27:5879–5884. Sakowitz, O.W., M. Schürmann, E. Basar. 2000. Oscillatory frontal theta responses are increased upon bisensory stimulation. Clinical Neurophysiology 111:884–893. Sakowitz, O.W., R.Q. Quiroga, M. Schürmann, E. Basar. 2001. Bisensory stimulation increases gamma responses over multiple cortical regions. Brain Research. Cognitive Brain Research 11:267–279. Sakowitz, O.W., R.Q. Quiroga, M. Schürmann, E. Basar. 2005. Spatio-temporal frequency characteristics of intersensory components in audiovisual evoked potentials. Brain Research. Cognitive Brain Research 23:316–326. Salinas, E., and T.J. Sejnowski. 2001. Correlated neuronal activity and the flow of neural information. Nature Reviews Neuroscience 2:539–550. Sanes, J.N., and J.P. Donoghue. 1993. Oscillations in local field potentials of the primate motor cortex during voluntary movement. Proceedings of the National Academy of Sciences of the United States of America 90:4470–4474.
Schnitzler, A., and J. Gross. 2005. Normal and pathological oscillatory communication in the brain. Nature Reviews Neuroscience 6:285–296. Schneider, T.R., S. Debener, R. Oostenveld, A.K. Engel. 2008. Enhanced EEG gamma-band activity reflects multisensory semantic matching in visual-to-auditory object priming. NeuroImage 42:1244–1254. Senkowski, D., D. Talsma, C.S. Herrmann, M.G. Woldorff. 2005. Multisensory processing and oscillatory gamma responses: Effects of spatial selective attention. Experimental Brain Research 3–4:411–426. Senkowski, D., S. Molholm, M. Gomez-Ramirez, J.J. Foxe. 2006. Oscillatory beta activity predicts response speed during a multisensory audiovisual reaction time task: A high-density electrical mapping study. Cerebral Cortex 16:1556–1565. Senkowski, D., D. Talsma, M. Grigutsch, C.S. Herrmann, M.G. Woldorff. 2007. Good times for multisensory integration: Effects of the precision of temporal synchrony as revealed by gamma-band oscillations. Neuropsychologia 45:561–571. Senkowski, D., T.R. Schneider, J.J. Foxe, A.K. Engel. 2008. Crossmodal binding through neural coherence: Implications for multisensory processing. Trends in Neurosciences 31:401–409. Senkowski, D., T.R. Schneider, R. Tandler, A.K. Engel. 2009. Gamma-band activity reflects multisensory matching in working memory. Experimental Brain Research 198:363–372. Shams, L., Y. Kamitani, S. Shimojo. 2000. Illusions. What you see is what you hear. Nature 408:788. Shimojo, S., and L. Shams. 2001. Sensory modalities are not separate modalities: Plasticity and interactions. Current Opinion in Neurobiology 11:505–509. Siegel, M., T.H. Donner, R. Oostenveld, P. Fries, A.K. Engel. 2007. High-frequency activity in human visual cortex is modulated by visual motion strength. Cerebral Cortex 17:732–741. Siegel, M., T.H. Donner, R. Oostenveld, P. Fries, A.K. Engel. 2008. Neuronal synchronization along the dorsal visual pathway reflects the focus of spatial attention. Neuron 60:709–719. Singer, W. 1999. Neuronal synchrony: A versatile code for the definition of relations? Neuron 24:49–65. Singer, W., and C.M. Gray. 1995. Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience 18:555–586. Stanford, T.R., S. Quessy, B.E. Stein. 2005. Evaluating the operations underlying multisensory integration in the cat superior colliculus. Journal of Neuroscience 25:6499–6508. Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press. Stein, B.E., M.W. Wallace, T.R. Stanford, W. Jiang. 2002. Cortex governs multisensory integration in the midbrain. Neuroscientist 8:306–314. Supp, G.G., A. Schlögl, N. Trujillo-Barreto, M.M. Müller, T. Gruber. 2007. Directed cortical information flow during human object recognition: Analyzing induced EEG gamma-band responses in brain’s source space. PLoS ONE 2:e684. Sur, M., S.L. Pallas, A.W. Roe. 1990. Cross-modal plasticity in cortical development: Differentiation and specification of sensory neocortex. Trends in Neurosciences 13:227–233. Tallon-Baudry, C., and O. Bertrand. 1999. Oscillatory gamma activity in humans and its role in object representation. Trends in Cognitive Sciences 3:151–162. Tallon-Baudry, C., O. Bertrand, C. Delpuech, J. Pernier. 1996. Stimulus specificity of phase-locked and nonphase-locked 40 Hz visual responses in human. Journal of Neuroscience 16:4240–4249. Talsma, D., and M.G. Woldorff. 2005. Selective attention and multisensory integration: Multiple phases of effects on the evoked brain activity. 
Journal of Cognitive Neuroscience 17:1098–1114. Uhlhaas, P.J., and W. Singer. 2006. Neural synchrony in brain disorders: Relevance for cognitive dysfunctions and pathophysiology. Neuron 52:155–168. Van Veen, B.D., W. van Drongelen, M. Yuchtman, A. Suzuki. 1997. Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. IEEE Transactions on Bio-Medical Engineering 44:867–880. Varela, F., J.P. Lachaux, E. Rodriguez, J. Martinerie. 2001. The brainweb: Phase synchronization and large-scale integration. Nature Reviews. Neuroscience 2:229–239. von der Malsburg, C., and W. Schneider. 1986. A neural cocktail-party processor. Biological Cybernetics 54:29–40. von Stein, A., P. Rappelsberger, J. Sarnthein, H. Petsche. 1999. Synchronization between temporal and parietal cortex during multimodal object processing in man. Cerebral Cortex 9:137–150. Vroomen, J., and M. Keetels. 2010. Perception of intersensory synchrony: A tutorial review. Attention, Perception, & Psychophysics 72:871–884. Wehr, M., and G. Laurent. 1996. Odour encoding by temporal sequences of firing in oscillating neural assemblies. Nature 384:162–166.
Widmann, A., T. Gruber, T. Kujala, M. Tervaniemi, E. Schröger. 2007. Binding symbols and sounds: Evidence from event-related oscillatory gamma-band activity. Cerebral Cortex 17:2696–2702. Womelsdorf, T., P. Fries, P.P. Mitra, R. Desimone. 2006. Gamma-band synchronization in visual cortex predicts speed of change detection. Nature 439:733–736. Womelsdorf, T., J.M. Schoffelen, R. Oostenveld et al. 2007. Modulation of neuronal interactions through neuronal synchronization. Science 316:1609–1612. Yuval-Greenberg, S., and L.Y. Deouell. 2007. What you see is not (always) what you hear: Induced gamma band responses reflect cross-modal interactions in familiar object recognition. Journal of Neuroscience 27:1090–1096. Yuval-Greenberg, S., O. Tomer, A.S. Keren, I. Nelken, L.Y. Deouell. 2008. Transient induced gamma-band response in EEG as a manifestation of miniature saccades. Neuron 58:429–441.
8
The Use of fMRI to Assess Multisensory Integration Thomas W. James and Ryan A. Stevenson
CONTENTS
8.1 Principles of Multisensory Enhancement
8.2 Superadditivity and BOLD fMRI
8.3 Problems with Additive Criterion
8.4 Inverse Effectiveness
8.5 BOLD Baseline: When Zero Is Not Zero
8.6 A Difference-of-BOLD Measure
8.7 Limitations and Future Directions
8.8 Conclusions
Acknowledgments
References
Although scientists have only recently had the tools available to noninvasively study the neural mechanisms of multisensory perceptual processes in humans (Calvert et al. 1999), the study of multisensory perception has had a long history in science (James 1890; Molyneux 1688). Before the advent of neuroimaging techniques, such as functional magnetic resonance imaging (fMRI) and high-density electrical recording, the study of neural mechanisms, using single-unit recording, was restricted to nonhuman animals such as monkeys and cats. These groundbreaking neurophysiological studies established many principles for understanding multisensory processing at the level of single neurons (Meredith and Stein 1983), and continue to improve our understanding of multisensory mechanisms at that level (Stein and Stanford 2008). It is tempting to consider that neuroimaging measurements, like blood oxygenation level– dependent (BOLD) activation measured with fMRI, are directly comparable with findings from single-unit recordings. Although several studies have established clear links between BOLD activation and neural activity (Attwell and Iadecola 2002; Logothetis and Wandell 2004; Thompson et al. 2003), there remains a fundamental difference between BOLD activation and single-unit activity: BOLD activation is measured from the vasculature supplying a heterogeneous population of neurons, whereas single-unit measures are taken from individual neurons (Scannell and Young 1999). The ramifications of this difference are not inconsequential because the principles of multisensory phenomena established using single-unit recording may not apply to population-based neuroimaging data (Calvert et al. 2000). The established principles must be tested theoretically and empirically, and where they fail, they must be replaced with new principles that are specific to the new technique.
8.1 PRINCIPLES OF MULTISENSORY ENHANCEMENT
Although the definitions of unisensory and multisensory neurons may seem intuitive, for clarity, we will define three different types of neurons that are found in multisensory brain regions. The first class of neurons is unisensory. They produce significant neural activity (measured as an increase in
spike count above spontaneous baseline) with only one modality of sensory input, and this response is not modulated by concurrent input from any other sensory modality.
The second class of neurons is bimodal (or trimodal). They produce significant neural activity with two or more modalities of unisensory input (Meredith and Stein 1983; Stein and Stanford 2008). With single-unit recording, bimodal neurons can be identified by testing their response with unisensory stimuli from two different sensory modalities. The premise is simple: if the neuron produces significant activity with both modalities, then it is bimodal. However, bimodal activation only implies a convergence of sensory inputs, not the integration of those inputs (Stein et al. 2009). Bimodal neurons can be further tested for multisensory integration by using multisensory stimuli. When tested with a multisensory stimulus, most bimodal neurons produce activity that is greater than the maximum activity produced with either unisensory stimulus alone, an effect termed multisensory enhancement. The criterion usually used to identify multisensory enhancement is called the maximum criterion or rule (AV > Max(A,V)). A minority of neurons produce activity that is lower than the maximum criterion, which is considered multisensory suppression. Whether the effect is enhancement or suppression, a change in activity of a neuron when the subject is stimulated through a second sensory channel only occurs if those sensory channels interact. Thus, multisensory enhancement and suppression are indicators that information is being integrated.
The third class of neurons is subthreshold. They have patterns of activity that look unisensory when they are tested with only unisensory stimuli, but when tested with multisensory stimuli, show multisensory enhancement (Allman and Meredith 2007; Allman et al. 2008; Meredith and Allman 2009). For example, a subthreshold neuron may produce significant activity with visual stimuli, but not with auditory stimuli. Because it does not respond significantly with both, it cannot be classified as bimodal. However, when tested with combined audiovisual stimuli, the neuron shows multisensory enhancement and thus integration. For graphical representations of each of these three classes of neurons, see Figure 8.1.
FIGURE 8.1 Activity profiles of neurons found in multisensory brain regions. Panels plot impulse counts as a function of input modality (A, V, AV) for unisensory neurons (unisensory auditory, unisensory visual), bimodal neurons (bimodal enhanced, bimodal suppressed, bimodal superadditive), and subthreshold neurons (subthreshold auditory, subthreshold visual).
A majority of bimodal and subthreshold neurons show multisensory enhancement (i.e., exceed the maximum criterion when stimulated with a multisensory stimulus); however, neurons that show multisensory enhancement can be further subdivided into those that are superadditive and those that are subadditive. Superadditive neurons show multisensory activity that exceeds a criterion that is greater than the sum of the unisensory activities (AV > Sum(A,V); Stein and Meredith 1993). In the case of subthreshold neurons, neural activity is only elicited by a single unisensory modality; therefore, the criterion for superadditivity is the same as (or very similar to) the maximum criterion. However, in the case of bimodal neurons, the criterion for superadditivity is usually much greater than the maximum criterion. Thus, superadditive bimodal neurons can show extreme levels of multisensory enhancement. Although bimodal neurons that are superadditive are, by definition, multisensory (because they must also exceed the maximum criterion), the majority of multisensory enhancing neurons are not superadditive (Alvarado et al. 2007; Perrault et al. 2003; Stanford et al. 2007). To be clear, in single-unit studies, superadditivity is not a criterion for identifying multisensory enhancement, but instead is used to classify the degree of enhancement.
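As a toy illustration of how these single-unit criteria are applied, the sketch below classifies a neuron's trial-averaged spike counts using the maximum and superadditivity criteria described above. The numbers and the helper function are hypothetical; an actual analysis would compare trial distributions statistically rather than comparing means directly.

def classify_multisensory(a_spikes, v_spikes, av_spikes):
    # Trial-averaged spike counts (above spontaneous baseline) for the
    # unisensory auditory (A), unisensory visual (V), and audiovisual (AV)
    # conditions, classified as described in the text.
    max_criterion = max(a_spikes, v_spikes)   # Max(A, V)
    sum_criterion = a_spikes + v_spikes       # Sum(A, V)
    if av_spikes > sum_criterion:
        return "superadditive enhancement"
    if av_spikes > max_criterion:
        return "subadditive enhancement"
    if av_spikes < max_criterion:
        return "multisensory suppression"
    return "no multisensory interaction"

# Hypothetical counts:
print(classify_multisensory(a_spikes=8, v_spikes=12, av_spikes=25))  # superadditive (25 > 20)
print(classify_multisensory(a_spikes=8, v_spikes=12, av_spikes=15))  # subadditive (12 < 15 < 20)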
8.2 SUPERADDITIVITY AND BOLD fMRI
BOLD activation is measured from the vasculature that supplies blood to a heterogeneous population of neurons. When modeling (either formally or informally) the underlying activity that produces BOLD activation, it is tempting to consider that all of the neurons in that population have similar response properties. However, there is little evidence to support such an idea, especially within multisensory brain regions. Neuronal populations within multisensory brain regions contain a mixture of unisensory neurons from different sensory modalities in addition to bimodal and subthreshold multisensory neurons (Allman and Meredith 2007; Allman et al. 2008; Barraclough et al. 2005; Benevento et al. 1977; Bruce et al. 1981; Hikosaka et al. 1988; Meredith 2002; Meredith and Stein 1983, 1986; Stein and Meredith 1993; Stein and Stanford 2008). It is this mixture of neurons of different classes in multisensory brain regions that necessitates the development of new criteria for assessing multisensory interactions using BOLD fMRI.
The first guideline established for studying multisensory phenomena specific to population-based BOLD fMRI measures was superadditivity (Calvert et al. 2000), which we will refer to here as the additive criterion to differentiate it from superadditivity in single units. In her original fMRI study, Calvert used audio and visual presentations of speech (talking heads) and isolated an area of the superior temporal sulcus that produced BOLD activation with a multisensory speech stimulus that was greater than the sum of the BOLD activations with the two unisensory stimuli (AV > Sum(A,V)). The use of this additive criterion was a departure from the established maximum criterion that was used in single-unit studies, but was based on two supportable premises. First, BOLD activation can be modeled as a time-invariant linear system, that is, activation produced by two stimuli presented together can be modeled by summing the activity produced by those same two stimuli presented alone (Boynton et al. 1996; Dale and Buckner 1997; Glover 1999; Heeger and Ress 2002). Second, the null hypothesis to be rejected is that the neuronal population does not contain multisensory neurons (Calvert et al. 2000, 2001; Meredith and Stein 1983). Using the additive criterion, the presence of multisensory neurons can be inferred (and the null hypothesis rejected) if activation with the multisensory stimulus exceeds the additive criterion (i.e., superadditivity).
The justification for an additive criterion as the null hypothesis is illustrated in Figure 8.2. Data in Figure 8.2 are simulated based on single-unit recording statistics taken from Laurienti et al. (2005). Importantly, the data are modeled based on a brain region that does not contain multisensory neurons. A brain region that only contains unisensory neurons is not a site of integration, and therefore represents an appropriate null hypothesis. The heights of the two left bars indicate simulated BOLD activation with unisensory auditory (A) and visual (V) stimulation. The next bar is the simulated BOLD activation with simultaneously presented auditory and visual stimuli (AV). The rightmost bar, Sum(A,V), represents the additive criterion.
FIGURE 8.2 Criteria for assessing multisensory interactions in neuronal populations. Simulated BOLD responses under the two-population null hypothesis (a region containing only A cells and V cells) are shown for A, V, and AV input alongside the Max(A,V) and Sum(A,V) criteria.
Assuming that the pools of unisensory neurons respond similarly under unisensory and multisensory stimulation (otherwise they would be classified as subthreshold neurons), the modeled AV activation is the same as the additive criterion. For comparison, we include the maximum criterion (the Max(A,V) bar), which is the criterion used in single-unit recording, and sometimes used with BOLD fMRI (Beauchamp 2005; van Atteveldt et al. 2007). The maximum criterion is clearly much more liberal than the additive criterion, and the model in Figure 8.2 shows that the use of the maximum criterion with BOLD data could produce false positives in brain regions containing only two pools of unisensory neurons and no multisensory neurons. That is, if a single voxel contained only unisensory neurons and no neurons with multisensory properties, the BOLD response would still exceed the maximum criterion. Thus, the simple model shown in Figure 8.2 demonstrates both the utility of the additive criterion for assessing multisensory interactions in populations containing a mixture of unisensory and multisensory neurons, and that the maximum criterion, which is sometimes used in place of the additive criterion, may inappropriately identify unisensory areas as multisensory.
It should be noted that the utility of the additive criterion applied to BOLD fMRI data is different conceptually from the superadditivity label used with single units. The additive criterion is used to identify multisensory interactions with BOLD activation. This is analogous to the maximum criterion being used to identify multisensory interactions in single-unit activity. Thus, superadditivity with single units is not analogous to the additive criterion with BOLD fMRI. The term superadditivity is used with single-unit recordings as a label to describe a subclass of neurons that exceed not only the maximum criterion but also the superadditivity criterion.
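The logic of this two-population null hypothesis can be written out in a few lines. In the sketch below the pool contributions are made-up numbers chosen only to show why a voxel containing no multisensory neurons can exceed the maximum criterion while never exceeding the additive criterion.

# Two-population null hypothesis (Figure 8.2): a voxel containing only
# unisensory auditory (A) and visual (V) neurons, with no multisensory cells.
# Contribution values are illustrative.
a_pool_to_A, v_pool_to_A = 0.6, 0.0    # pool contributions under auditory input
a_pool_to_V, v_pool_to_V = 0.0, 0.8    # pool contributions under visual input

bold_A = a_pool_to_A + v_pool_to_A
bold_V = a_pool_to_V + v_pool_to_V
# Under the null hypothesis, each unisensory pool responds to the AV stimulus
# exactly as it does to its own unisensory stimulus.
bold_AV = a_pool_to_A + v_pool_to_V

max_criterion = max(bold_A, bold_V)       # Max(A, V)
additive_criterion = bold_A + bold_V      # Sum(A, V)

print(bold_AV > max_criterion)       # True: the maximum criterion is exceeded
                                     # even though no multisensory neurons exist
print(bold_AV > additive_criterion)  # False: the additive criterion is not exceeded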
8.3 PROBLEMS WITH ADDITIVE CRITERION
Although the additive criterion tests a more appropriate null hypothesis than the maximum criterion, in practice, the additive criterion has had only limited success. Some early studies successfully identified brain regions that met the additive criterion (Calvert et al. 2000, 2001), but subsequent studies did not find evidence for additivity even in known multisensory brain regions (Beauchamp 2005; Beauchamp et al. 2004a, 2004b; Laurienti et al. 2005; Stevenson et al. 2007). These findings prompted researchers to suggest that the additive criterion may be too strict and thus susceptible to false negatives. As such, some suggested using the more liberal maximum criterion (Beauchamp 2005), which, as shown in Figure 8.2, is susceptible to false positives.
One possible reason for the discrepancy between theory and practice was described by Laurienti et al. (2005) and is demonstrated in Figure 8.3. The values in the bottom row of the table in Figure 8.3 are simulated BOLD activation.
FIGURE 8.3 Models of BOLD activation with multisensory stimulation. Simulated neural contributions by class (rows) for each stimulus condition and model of the AV-cell response (columns), with the resulting BOLD activation in the bottom row:

           A      V      Max    Supermax  Additive  Superadditive  Laurienti  Max(A,V)  Sum(A,V)
A cells    0.60   0.00   0.60   0.60      0.60      0.60           0.60       0.00      0.60
V cells    0.00   0.80   0.80   0.80      0.80      0.80           0.80       0.80      0.80
AV cells   0.54   0.48   0.54   0.79      1.03      1.55           0.80       0.49      1.03
BOLD       1.14   1.29   1.94   2.19      2.43      2.95           2.20       1.29      2.43
Each column in the table is a different stimulus condition, including unisensory auditory, unisensory visual, and multisensory audiovisual. The Sum(A,V) column is simply the sum of the audio and visual BOLD signals and represents the additive criterion (null hypothesis). The audiovisual stimulus conditions were simulated using five different models: the maximum model, the supermaximum model, the additive model, the superadditive model, and the Laurienti model. The first three rows of the table represent the contributions of different classes of neurons to BOLD activation, including auditory unisensory neurons (A cells), visual unisensory neurons (V cells), and audiovisual multisensory neurons (AV cells). To be clear, the BOLD value in the bottom-most row is the sum of the A, V, and AV cells' contributions. Summing these contributions is based on the assumption that voxels (or clusters of voxels) contain mixtures of unisensory and multisensory neurons, not a single class of neurons. Although the “contributions” have no units, they are simulated based on the statistics of recorded impulse counts (spike counts) from neurons in the superior colliculus, as reported by Laurienti et al. (2005). Unisensory neurons were explicitly modeled to respond under multisensory stimulation as they did under unisensory stimulation; otherwise they would be classified as subthreshold neurons, which were not considered in the models.
The five models of BOLD activation under audiovisual stimulation differed in the calculation of only one value: the contribution of the AV multisensory neurons. For the maximum model, the contribution of AV cells was calculated as the maximum of the AV cell contributions with visual and auditory unisensory stimuli. For the supermaximum model, the contribution of AV neurons was calculated as 150% of the AV cell contribution used for the maximum model. For the additive model, the contribution of AV cells was calculated as the sum of AV cell contributions with visual and auditory unisensory stimuli. For the superadditive model, the contribution of AV cells was calculated as 150% of the AV cell contribution used for the additive model. Finally, for the Laurienti model, the
contribution of the AV cells was based on the statistics of recorded impulse counts. What the table makes clear is that, based on Laurienti’s statistics, the additive criterion is too conservative, which is consistent with what has been found in practice (Beauchamp 2005; Beauchamp et al. 2004a, 2004b; Laurienti et al. 2005; Stevenson et al. 2007). Laurienti and colleagues (2005) suggest three reasons why the simulated BOLD activation may not exceed the additive criterion based on the known neurophysiology: first, the proportion of AV neurons is small compared with that of unisensory neurons; second, of those multisensory neurons, only a small proportion are superadditive; and third, superadditive neurons have low impulse counts relative to other neurons. For population-based measurements to exceed the additive criterion, the average impulse count of the pool of bimodal neurons must be significantly superadditive. The presence of superadditive neurons in the pool is not enough by itself because those superadditive responses are averaged with other subadditive, and even suppressive, responses. According to Laurienti’s statistics, the result of this averaging is a value somewhere between maximum and additive. Thus, even though the additive criterion is appropriate because it represents the correct null hypothesis, the statistical distribution of cell and impulse counts in multisensory brain regions may make it practically intractable as a criterion.
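A minimal sketch of this table logic is given below. The pool contributions are illustrative values in the spirit of Figure 8.3 rather than the published impulse-count statistics, and only three of the five models are implemented; the point is simply that the population BOLD signal exceeds the additive criterion only when the average AV-cell response is itself superadditive.

# Simulated BOLD as the sum of A-cell, V-cell, and AV-cell pool contributions
# (illustrative values, not the published Laurienti et al. statistics).
a_cells = {"A": 0.60, "V": 0.00}    # A-cell contribution under A and V input
v_cells = {"A": 0.00, "V": 0.80}    # V-cell contribution under A and V input
av_cells = {"A": 0.54, "V": 0.48}   # AV-cell contribution under A and V input

def bold(condition, av_model=None):
    # Simulated BOLD for one condition, given a model of the AV-cell response.
    if condition in ("A", "V"):
        return a_cells[condition] + v_cells[condition] + av_cells[condition]
    # AV condition: unisensory pools respond as under their own modality;
    # only the AV-cell contribution depends on the chosen model.
    av_contribution = {
        "maximum": max(av_cells.values()),
        "additive": sum(av_cells.values()),
        "superadditive": 1.5 * sum(av_cells.values()),
    }[av_model]
    return a_cells["A"] + v_cells["V"] + av_contribution

additive_criterion = round(bold("A") + bold("V"), 2)
for model in ("maximum", "additive", "superadditive"):
    av = round(bold("AV", model), 2)
    print(f"{model:13s} AV={av:.2f}  Sum(A,V)={additive_criterion:.2f}  "
          f"exceeds: {av > additive_criterion}")

Only the superadditive model exceeds the Sum(A,V) criterion here, mirroring the argument above: unless the AV-cell pool is, on average, strongly superadditive, its contribution is diluted by the unisensory pools and the additive criterion is not met.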
8.4 INVERSE EFFECTIVENESS
The Laurienti model is consistent with recent findings suggesting that the additive criterion is too conservative (Beauchamp 2005; Beauchamp et al. 2004a, 2004b; Laurienti et al. 2005; Stevenson et al. 2007); however, those recent studies used stimuli that were highly salient. Another established principle of multisensory single-unit recording is the law of inverse effectiveness. Effectiveness in this case refers to how well a stimulus drives the neurons in question. Multisensory neurons usually increase their proportional level of multisensory enhancement as the stimulus quality is degraded (Meredith and Stein 1986; Stein et al. 2008). That is, the multisensory gain increases as the “effectiveness” of the stimulus decreases. If the average level of multisensory enhancement of a pool of neurons increases when stimuli are degraded, then BOLD activation could exceed the additive criterion when degraded stimuli are used. Figure 8.4 shows this effect using the simulated data from the Laurienti model (Figure 8.3). In the high stimulus quality condition, the simulated AV activation clearly does not exceed the additive criterion, indicated as Sum(A,V).
FIGURE 8.4 Influence of inverse effectiveness on simulated multisensory BOLD activation. Stacked A-cell, V-cell, and AV-cell contributions to the simulated BOLD response are shown for the A, V, AV, and Sum(A,V) conditions under high stimulus quality (where the AV response is subadditive) and low stimulus quality (where it is superadditive).
This is because of the subadditive contribution of the multisensory neurons. On the right in Figure 8.4, a similar situation is shown, but with less effective, degraded stimuli. In general, neurons in multisensory regions decrease their impulse counts when stimuli are less salient. However, the size of the decrease is different across different classes of neurons and different stimulus conditions (Alvarado et al. 2007). In our simulation, impulse counts of unisensory neurons were reduced by 30% from the values simulated by the Laurienti model. Impulse counts of bimodal neurons were reduced by 75% under unisensory stimulus conditions, and by 50% under multisensory stimulus conditions. This difference in reduction for bimodal neurons between unisensory and multisensory stimulus conditions reflects inverse effectiveness, that is, the multisensory gain increases with decreasing stimulus effectiveness. Using these reductions in activity with stimulus degradation, BOLD activation with the AV stimulus now exceeds the additive criterion.
Admittedly, the reductions that were assigned to the different classes of neurons were chosen somewhat arbitrarily. There are certainly other combinations of reductions that would lead to AV activation that would not exceed the criterion. However, the reductions shown are based on statistics of impulse counts taken from single-unit recording data, and are consistent with the principle of inverse effectiveness reported routinely in the single-unit recording literature (Meredith and Stein 1986). Furthermore, there is empirical evidence from neuroimaging showing an increased likelihood of exceeding the additive criterion as stimulus quality is degraded (Stevenson and James 2009; Stevenson et al. 2007, 2009). Figure 8.5 compares AV activation with the additive criterion at multiple levels of stimulus quality. These are a subset of data from a study reported elsewhere (Stevenson and James 2009). Stimulus quality was degraded by parametrically varying the signal-to-noise ratio (SNR) of the stimuli until participants were able to correctly identify the stimuli at a given accuracy. This was done by embedding the audio and visual signals in constant external noise and lowering the root mean square contrast of the signals. AV activation exceeded the additive criterion at low SNR, but failed to exceed the criterion at high SNR.
Although there is significant empirical and theoretical evidence suggesting that the additive criterion is too conservative at high stimulus SNR, the data presented in Figure 8.5 suggest that the additive criterion may be a better criterion at low SNR. However, there are two possible problems with using low-SNR stimuli to assess multisensory integration with BOLD fMRI. First, based on the data in Figure 8.5, the change from failing to meet the additive criterion to exceeding the additive criterion is gradual, not a sudden jump at a particular level of SNR. Thus, the choice of SNR level(s) is extremely important for the interpretation of the results. Second, there may be problems with using the additive criterion with measurements that lack a natural zero, such as BOLD.
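The following sketch restates this inverse-effectiveness simulation in code. It applies the reductions described above (30% for unisensory cells; 75% and 50% for AV cells under unisensory and multisensory input, respectively) to illustrative contribution values, and it additionally assumes that the AV-cell pool responds subadditively (at 80% of the sum of its unisensory responses) under multisensory stimulation; it is a toy model, not the analysis behind Figure 8.4 or Figure 8.5.

def simulate(quality):
    # Return simulated (A, V, AV) BOLD activation for one stimulus-quality level.
    if quality == "high":
        uni, av_uni, av_multi = 1.0, 1.0, 1.0
    else:  # degraded stimuli: smaller reduction under AV input = inverse effectiveness
        uni, av_uni, av_multi = 0.7, 0.25, 0.5

    a_cells, v_cells = 0.60 * uni, 0.80 * uni               # unisensory pools
    av_cells_a, av_cells_v = 0.54 * av_uni, 0.48 * av_uni   # AV pool, unisensory input
    # AV-cell pool responds subadditively on average under AV stimulation
    # (80% of the sum of its unisensory responses; an assumed figure).
    av_cells_av = 0.8 * (0.54 + 0.48) * av_multi

    bold_A = a_cells + av_cells_a
    bold_V = v_cells + av_cells_v
    bold_AV = a_cells + v_cells + av_cells_av
    return bold_A, bold_V, bold_AV

for quality in ("high", "low"):
    A, V, AV = simulate(quality)
    print(f"{quality:4s} quality: AV={AV:.2f}  Sum(A,V)={A + V:.2f}  "
          f"exceeds additive criterion: {AV > A + V}")

With intact stimuli the AV response falls short of Sum(A,V), but once degradation shrinks the unisensory terms faster than the multisensory one, the same subadditive pool pushes the simulated AV response past the additive criterion.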
FIGURE 8.5 Assessing inverse effectiveness empirically with BOLD activation. BOLD responses to the AV stimulus and the Sum(A,V) criterion are plotted as a function of stimulus quality (95%, 85%, 75%, and 65% identification accuracy). These are a subset of data reported elsewhere. (From Stevenson, R.A. and James, T.W., NeuroImage, 44, 1210–23, 2009. With permission.)
8.5 BOLD BASELINE: WHEN ZERO IS NOT ZERO
It is established procedure with fMRI data to transform raw BOLD values to percentage signal change values by subtracting the mean activation for the baseline condition and dividing by the baseline. Thus, for BOLD measurements, “zero” is not absolute, but is defined as the activation produced by the baseline condition chosen by the experimenter (Binder et al. 1999; Stark et al. 2001). Statistically, this means that BOLD measurements would be considered an interval scale at best (Stevens 1946). The use of an interval scale affects the interpretation of the additive criterion because the additive criterion relies on summing two unisensory activations and comparing the sum with a single multisensory activation. Because the activation values are measured relative to an arbitrary baseline, the value of the baseline condition has a different effect on the summed unisensory activations than on the single multisensory activation. In short, the value of the baseline is subtracted from the additive criterion twice, but is subtracted from the multisensory activation only once (see Equation 8.3). The additive criterion for audiovisual stimuli is described according to the following equation:
AV > A + V
(8.1)
But Equation 8.1 is more accurately described by
(AV – baseline)/baseline > (A – baseline)/baseline + (V – baseline)/baseline
(8.2)
FIGURE 8.6 Influence of baseline activation on additive criterion. Top: raw BOLD signal for the A, V, AV, and baseline conditions in two hypothetical experiments that use different baseline conditions (Baseline 1 and Baseline 2). Bottom: the resulting percent BOLD change for A, V, AV, and Sum(A,V); the same underlying responses appear subadditive relative to Sum(A,V) in one experiment and superadditive in the other.
Equation 8.2 can be rewritten as
AV – baseline > A + V – 2 × baseline,
(8.3)
and then
AV > A + V – baseline.
(8.4)
Equation 8.4 clearly shows that the level of activation produced by the baseline condition influences the additive criterion. An increase in activation of the baseline condition causes the additive criterion to become more liberal (Figure 8.6). The fact that the additive criterion can be influenced by the activation of the experimenter-chosen baseline condition may explain why similar experiments from different laboratories produce different findings when that criterion is used (Beauchamp 2005).
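A small numeric example makes the effect described by Equation 8.4 explicit. The raw signal values and the two baseline levels below are made up; the same three responses fail the additive criterion when referenced to one baseline and exceed it when referenced to the other.

# Baseline problem (Equations 8.1-8.4): the same raw responses referenced to
# two different experimenter-chosen baselines. All values are hypothetical.
def percent_change(raw, baseline):
    return (raw - baseline) / baseline

raw_A, raw_V, raw_AV = 560.0, 570.0, 590.0

for baseline in (500.0, 545.0):
    A = percent_change(raw_A, baseline)
    V = percent_change(raw_V, baseline)
    AV = percent_change(raw_AV, baseline)
    # The baseline is effectively subtracted twice from the summed unisensory
    # terms but only once from the multisensory term, so a higher-activation
    # baseline condition makes the additive criterion easier to exceed.
    print(f"baseline={baseline:.0f}: AV={AV:.3f}  Sum(A,V)={A + V:.3f}  "
          f"exceeds additive criterion: {AV > A + V}")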
8.6 A DIFFERENCE-OF-BOLD MEASURE
We have provided a theoretical rationale for the inconsistency of the additive criterion for assessing multisensory integration using BOLD fMRI as well as a theoretical rationale for the inappropriateness of the maximum criterion as a null hypothesis for this same assessment. The maximum criterion is appropriate when used with single-unit recording data, but when used with BOLD fMRI data, which represent populations of neurons, cannot account for the contribution of unisensory neurons that are found in multisensory brain regions. Without being able to account for the heterogeneity of neuronal populations, the maximum criterion is likely to produce false positives when used with a population-based measure such as fMRI.
Although the null hypothesis tested by the additive criterion is more appropriate than the maximum criterion, the additive criterion is not without issues. First, an implicit assumption with the additive criterion is that the average multisensory neuronal response shows a pattern that is superadditive, an assumption that is clearly not substantiated empirically. Second, absolute BOLD percentage signal change measurements are measured on an interval scale. An interval scale is one with no natural zero, and on which the absolute values are not meaningful (in a statistical sense). The relative differences between absolute values, however, are meaningful, even when the absolute values are measured on an interval scale.
To specifically relate relative differences to the use of an additive criterion, imagine an experiment in which A, V, and AV were not levels of a single sensory modality factor, but instead were three separate factors, each with at least two different levels (e.g., levels of stimulus quality). Rather than analyzing the absolute BOLD values associated with each condition, a relative difference measurement could be calculated between the levels of each factor, resulting in ΔA, ΔV, and ΔAV measurements. The use of relative differences alleviates the baseline problem because the baseline activations embedded in the measurements cancel out when a difference operation is performed across levels of a factor. If we replace the absolute BOLD values in Equation 8.1 with BOLD differences, the equation becomes
ΔAV ≠ ΔA + ΔV.
(8.5)
Note that the inequality sign is different in Equation 8.5 than in Equation 8.1. Equation 8.1 is used to test the directional hypothesis that AV activation exceeds the additive criterion. Subadditivity, the hypothesis that AV activation is less than the additive criterion, is rarely, if ever, used as a criterion by itself. It has been used in combination with superadditivity, for instance, showing that a brain region exceeds the additive criterion with semantically congruent stimuli but does not exceed the additive criterion with semantically incongruent stimuli (Calvert et al. 2000). This example (using both superadditivity and subadditivity), however, is testing two directional hypotheses, rather than testing one nondirectional hypothesis. Equation 8.5 is used to test a nondirectional hypothesis,
and we suggest that it should be nondirectional for two reasons. First, the order in which the two terms are subtracted to produce each delta is arbitrary. For each delta term, if the least effective stimulus condition is subtracted from the most effective condition, then Equation 8.5 can be rewritten as ΔAV < ΔA + ΔV to test for inverse effectiveness, that is, the multisensory difference should be less than the sum of the unisensory differences. If, however, the differences were taken in the opposite direction (i.e., most effective subtracted from least effective), Equation 8.5 would need to be rewritten with the inequality in the opposite direction (i.e., ΔAV > ΔA + ΔV). Second, inverse effectiveness may not be the only meaningful effect that can be seen with difference measures, perhaps especially if the measures are used to assess function across the whole brain. This point is discussed further at the end of the chapter (Figure 8.9). Each component of Equation 8.5 can be rewritten with the baseline activation made explicit. The equation for the audio component would be
ΔA = (A1 – baseline)/baseline – (A2 – baseline)/baseline,
(8.6)
where A1 and A2 represent auditory stimulus conditions with different levels of stimulus quality. When Equation 8.5 is rewritten by substituting Equation 8.6 for each of the three stimulus conditions, all baseline variables in both the denominator and the numerator cancel out, producing the following equation:
(AV1 – AV2) ≠ (A1 – A2) + (V1 – V2).
(8.7)
The key importance of Equation 8.7 is that the baseline variable cancels out when relative differences are used instead of absolute values. Thus, the level of baseline activation has no influence on a criterion calculated from BOLD differences. The null hypothesis represented by Equation 8.5 is similar to the additive criterion in that the sum of two unisensory values is compared to a multisensory value. Those values, however, are relative differences instead of absolute BOLD percentage signal changes. If the multisensory difference is less (or greater) than the additive difference criterion, one can infer an interaction between sensory channels, most likely in the form of a third pool of multisensory neurons in addition to unisensory neurons. The rationale for using additive differences is illustrated in Figure 8.7. The simulated data for the null hypothesis reflect the contributions of neurons in a brain region that contains only unisensory auditory and visual neurons (Figure 8.7a). In the top panel, the horizontal axis represents the stimulus condition, either unisensory auditory (A) or visual (V), or multisensory audiovisual (AV). The subscripts 1 and 2 represent different levels of stimulus quality. For example, A1 is high-quality audio and A2 is low-quality audio. To relate these simulated data to the data in Figure 8.2 and the absolute additive criterion, the height of the stacked bar for AV1 is the absolute additive criterion (or null hypothesis) for the high-quality stimuli, and the height of the AV2 stacked bar is the absolute additive criterion for the low-quality stimuli. Those absolute additive criteria, however, suffer from the issues discussed above. Evaluating the absolute criterion at multiple levels of stimulus quality provides the experimenter with more information than evaluating it at only one level, but a potentially better way of assessing multisensory integration is to use a criterion based on differences between the high- and low-quality stimulus conditions. The null hypothesis for this additive differences criterion is illustrated in the bottom panel of Figure 8.7a. The horizontal axis shows the difference in auditory (ΔA), visual (ΔV), and audiovisual (ΔAV) stimuli, all calculated as differences in the heights of the stacked bars in the top panel. The additive differences criterion, labeled Sum(ΔA,ΔV), is also shown, and is the same as the difference in multisensory activation (ΔAV). Thus, for a brain region containing only two pools of unisensory neurons, the appropriate null hypothesis to be tested is provided by Equation 8.5.
FIGURE 8.7 Additive differences criterion. (a) Additive differences: two-population null hypothesis. With only auditory (A cells) and visual (V cells) pools, the multisensory difference equals the criterion: ΔAV = Sum(ΔA,ΔV). (b) Additive differences: three-population hypothesis. Adding a pool of AV cells yields ΔAV < Sum(ΔA,ΔV). Top panels plot percentage BOLD change by input condition (A1, A2, V1, V2, AV1, AV2); bottom panels plot the BOLD differences ΔA, ΔV, and ΔAV together with the criterion Sum(ΔA,ΔV).
The data in Figure 8.7b apply the additive differences criterion to the simulated BOLD activation data shown in Figure 8.4. Recall from Figure 8.4 that the average contribution of the multisensory neurons is subadditive for high-quality stimuli (A1, V1, AV1), but is superadditive with low-quality stimuli (A2, V2, AV2). In other words, the multisensory pool shows inverse effectiveness. The data in the bottom panel of Figure 8.7b are similar to the bottom panel of Figure 8.7a, but with the addition of this third pool of multisensory neurons to the population. Adding the third pool makes ΔAV (the difference in multisensory activation) significantly less than the additive differences criterion (Sum(ΔA,ΔV)), which rejects the null hypothesis of only two pools of unisensory neurons. Figure 8.8 shows the same additive differences analysis performed on the empirical data from Figure 8.5 (Stevenson and James 2009; Stevenson et al. 2009). The empirical data show the same pattern as the simulated data. With both the simulated and empirical data, ΔAV was less than Sum(ΔA,ΔV), a pattern of activation similar to inverse effectiveness seen in single units. In single-unit recording, there is a positive relation between stimulus quality and impulse count (or effectiveness). This same relation was seen between stimulus quality and BOLD activation. Although most neurons show this relation, the multisensory neurons tend to show smaller decreases (proportionately) than the unisensory neurons. Thus, as the effectiveness of the stimuli decreases, the multisensory gain increases. Decreases in stimulus quality also had a smaller effect on multisensory BOLD activation than on unisensory BOLD activation, suggesting that the results in Figure 8.8 could (but do not necessarily) reflect the influence of inversely-effective neurons. In summary, we have demonstrated some important theoretical limitations of the criteria commonly used in BOLD fMRI studies to assess multisensory integration. First, the additive criterion
FIGURE 8.8 Assessing multisensory interactions empirically with additive differences. (BOLD differences ΔAV and Sum(ΔA,ΔV) plotted by stimulus quality, binned by percent accuracy: 95–85%, 85–75%, and 75–65%.)
is susceptible to variations in baseline. Second, the additive criterion is sensitive only if the average activity profile of the multisensory neurons in the neuronal population is superadditive, which, empirically, only occurs with very low-quality stimuli. A combination of these two issues may explain the inconsistency in empirical findings using the additive criterion (Beauchamp 2005; Calvert et al. 2000; Stevenson et al. 2007). Third, the maximum criterion tests a null hypothesis that is based on a homogeneous population of only multisensory neurons. Existing single-unit recording data suggest that multisensory brain regions have heterogeneous populations containing unisensory, bimodal, and sometimes, subthreshold neurons. Thus, the null hypothesis tested with the maximum criterion is likely to produce false-positive results in unisensory brain regions.
FIGURE 8.9 A whole-brain statistical parametric map of regions demonstrating audiovisual neuronal convergence as assessed by additive differences criterion. (Accompanying schematic: possible BOLD additive-difference interactions, shown as BOLD activity for A, V, and AV at high and low stimulus quality: direct gain enhancement, ΔAV > Sum(ΔA,ΔV); direct gain suppression, ΔAV < Sum(ΔA,ΔV); indirect gain enhancement, ΔAV < Sum(ΔA,ΔV); indirect gain suppression, ΔAV > Sum(ΔA,ΔV).)
As a potential solution to these concerns, we have developed a new criterion for assessing multisensory integration using relative BOLD differences instead of absolute BOLD measurements. Relative differences are not influenced by changes in baseline, protecting the criterion from inconsistencies across studies. The null hypothesis to be tested is the sum of unisensory differences (additive differences), which is based on the assumption of a heterogeneous population of neurons. In addition to the appropriateness of the null hypothesis tested, the additive differences criterion produced positive results in known multisensory brain regions when tested empirically (Stevenson et al. 2009). Evidence for inverse effectiveness with audiovisual stimuli was found in known multisensory brain regions such as the superior temporal gyrus and inferior parietal lobule, but also in regions that have garnered less attention from the multisensory community, such as the medial frontal gyrus and parahippocampal gyrus (Figure 8.9). These results were found across different pairings of sensory modalities and with different experimental designs, suggesting that additive differences may be of general use for assessing integration across sensory channels. A number of different brain regions, such as the insula and caudate nucleus, also showed an effect that appeared to be the opposite of inverse effectiveness (Figure 8.9). BOLD activation in these brain regions showed the opposite relation with stimulus quality to that seen in sensory brain regions, that is, high-quality stimuli produced less activation than low-quality stimuli. Because of this opposite relation, we termed the effect observed in these regions indirect inverse effectiveness. More research will be needed to assess the contribution of indirect inverse effectiveness to multisensory neural processing and behavior.
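For whole-brain application, the additive differences comparison can be run voxel-wise. The sketch below is a rough outline under assumed data structures (per-subject percentage-signal-change arrays for conditions labeled A1, A2, V1, V2, AV1, AV2); it is not the authors' actual analysis pipeline, and the paired two-tailed t-test is simply one way to implement the nondirectional comparison implied by Equation 8.7.

```python
# Sketch only (assumed data shapes, toy random data): voxel-wise test of the
# additive differences criterion, ΔAV versus ΔA + ΔV, across subjects.
import numpy as np
from scipy import stats

def additive_differences_test(beta):
    """beta: dict of condition -> array (n_subjects, n_voxels) of % BOLD change."""
    d_av = beta["AV1"] - beta["AV2"]                               # multisensory difference
    d_sum = (beta["A1"] - beta["A2"]) + (beta["V1"] - beta["V2"])  # additive differences criterion
    # Nondirectional (two-tailed) paired comparison of the two quantities per voxel
    t, p = stats.ttest_rel(d_av, d_sum, axis=0)
    return t, p

# Toy example with random numbers standing in for real data (12 subjects, 1000 voxels)
rng = np.random.default_rng(0)
beta = {c: rng.normal(size=(12, 1000)) for c in ("A1", "A2", "V1", "V2", "AV1", "AV2")}
t_map, p_map = additive_differences_test(beta)
print(t_map.shape, (p_map < 0.001).sum(), "voxels below p < .001 (chance level here)")
```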
8.7 LIMITATIONS AND FUTURE DIRECTIONS All of the simulations above made the assumption that BOLD activation could be described by a time-invariant linear system. Although there is clearly evidence supporting this assumption (Boynton et al. 1996; Dale and Buckner 1997; Glover 1999; Heeger and Ress 2002), studies using serial presentation of visual stimuli suggest that nonlinearities in BOLD activation may exist when stimuli are presented closely together in time, that is, closer than a few seconds (Boynton and Finney 2003; Friston et al. 1999). Simultaneous presentation could be considered just a serial presentation with the shortest asynchrony possible. In that case, the deviations from linearity with simultaneous presentation may be substantial. A careful examination of unisensory integration and a comparison of unisensory with multisensory integration could provide valuable insights about the linearity assumption of BOLD responses. The simulations above were also based on only one class of multisensory neuron, the bimodal neurons, which respond with two or more sensory modalities. Another class of multisensory neurons has recently been discovered, which was not used in the simulations presented here. Subthreshold neurons respond to only one sensory modality when stimulated with unisensory stimuli. However, when stimulated with multisensory stimuli, these neurons show multisensory enhancement (Allman and Meredith 2007; Allman et al. 2008; Meredith and Allman 2009). Adding this class of neurons to the simulations may increase the precision of the predictions for population models with more than two populations of neurons. The goal of the simulations presented here, however, was to develop null hypotheses based on neuronal populations composed of only two unisensory pools of neurons. Rejecting the null hypothesis then implies the presence of at least one other pool of neurons besides the unisensory pools. In our simulations, we modeled that pool as bimodal; however, we could have also modeled subthreshold neurons or a combination of bimodal and subthreshold neurons. Our impression is that the addition of subthreshold neurons to the simulations would not qualitatively change the results, because subthreshold neurons are found in relatively small numbers (less than the number of subadditive bimodal neurons), and their impulse counts are low compared to other classes of neurons (Allman and Meredith 2007). The simulations above made predictions about levels of BOLD activation, but were based on principles of multisensory processing that were largely derived from spike (action potential) count data
collected using single-unit recording. BOLD activation reflects a hemodynamic response, which itself is the result of local neural activity. The exact relationship, however, between neural activity and BOLD activation is unclear. There is evidence that increased spiking produces small brief local reductions in tissue oxygenation, followed by large sustained increases in tissue oxygenation (Thompson et al. 2003). Neural spike count, however, is not the only predictor of BOLD activation levels, nor is it the best predictor. The correlation of BOLD activation with local field potentials is stronger than the correlation of BOLD with spike count (Heeger et al. 2000; Heeger and Ress 2002; Logothetis and Wandell 2004). Whereas spikes reflect the output of neurons, local field potentials are thought to reflect the postsynaptic potentials or input to neurons. This distinction between input and output and the relationship with BOLD activation raises some concerns about relating studies using BOLD fMRI to studies using single-unit recording. Of course, spike count is also highly correlated with local field potentials, suggesting that spike count, local field potentials, and BOLD activation are all interrelated and, in fact, that the correlations among them may be related to another variable that is responsible for producing all of the phenomena (Attwell and Iadecola 2002). Multisensory single-unit recordings are mostly performed in monkey and cat superior colliculus and monkey superior temporal sulcus or cat posterolateral lateral suprasylvian area (Allman and Meredith 2007; Allman et al. 2008; Barraclough et al. 2005; Benevento et al. 1977; Bruce et al. 1981; Hikosaka et al. 1988; Meredith 2002; Meredith and Stein 1983, 1986; Stein and Meredith 1993; Stein and Stanford 2008). With BOLD fMRI, whole-brain imaging is routine, which allows for exploration of the entire cortex. The principles that are derived from investigation of specific brain areas may not always apply to other areas of the brain. Thus, whole-brain investigation has the distinct promise of producing unexpected results. The unexpected results could be because of the different proportions of known classes of neurons, or the presence of other classes of multisensory neurons that have not yet been found with single-unit recording. It is possible that the indirect inverse effectiveness effect described above (Figure 8.9) may reflect the combined activity of types of multisensory neurons with response profiles that have not yet been discovered with single-unit recording.
8.8 CONCLUSIONS We must stress that each method used to investigate multisensory interactions has a unique set of limitations and assumptions, whether the method is fMRI, high-density recording, single-unit recording, behavioral reaction time, or others. Differences between methods can have a great impact on how multisensory interactions are assessed. Thus, it should not be assumed that a criterion that is empirically tested and theoretically sound when used with one method will be similarly sound when applied to another method. We have developed a method for assessing multisensory integration using BOLD fMRI that makes fewer assumptions than established methods. Because BOLD measurements have an arbitrary baseline, a criterion that is based on relative BOLD differences instead of absolute BOLD values is more interpretable and reliable. Also, the use of BOLD differences is not limited to comparing across multisensory channels, but should be equally effective when comparing across unisensory channels. Finally, it is also possible that the use of relative differences may be useful with other types of measures, such as EEG, which also use an arbitrary baseline. However, before using the additive differences criterion with other measurement methods, it should be tested both theoretically and empirically, as we have done here with BOLD fMRI.
ACKNOWLEDGMENTS This research was supported in part by the Indiana METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc., the IUB Faculty Research Support Program, and the Indiana University GPSO Research Grant. We appreciate the insights provided by Karin Harman James, Sunah Kim, and James Townsend, by other members of the Perception and Neuroimaging Laboratory, and by other members of the Indiana University Neuroimaging Group.
REFERENCES Allman, B.L., and M.A. Meredith. 2007. Multisensory processing in “unimodal” neurons: Cross-modal subthreshold auditory effects in cat extrastriate visual cortex. Journal of Neurophysiology 98:545–9. Allman, B.L., L.P. Keniston, and M.A. Meredith. 2008. Subthreshold auditory inputs to extrastriate visual neurons are responsive to parametric changes in stimulus quality: Sensory-specific versus non-specific coding. Brain Research 1242:95–101. Alvarado, J.C., J.W. Vaughan, T.R. Stanford, and B.E. Stein. 2007. Multisensory versus unisensory integration: Contrasting modes in the superior colliculus. Journal of Neurophysiology 97:3193–205. Attwell, D., and C. Iadecola. 2002. The neural basis of functional brain imaging signals. Trends in Neurosciences 25:621–5. Barraclough, N.E., D. Xiao, C.I. Baker, M.W. Oram, and D.I. Perrett. 2005. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive Neuroscience 17:377–91. Beauchamp, M.S. 2005. Statistical criteria in FMRI studies of multisensory integration. Neuroinformatics 3:93–113. Beauchamp, M.S., B.D. Argall, J. Bodurka, J.H. Duyn, and A. Martin. 2004a. Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience 7:1190–2. Beauchamp, M.S., K.E. Lee, B.D. Argall, and A. Martin. 2004b. Integration of auditory and visual information about objects in superior temporal sulcus. Neuron 41:809–23. Benevento, L.A., J. Fallon, B.J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental Neurology 57:849–72. Binder, J.R., J.A. Frost, T.A. Hammeke et al. 1999. Conceptual processing during the conscious resting state. A functional MRI study. Journal of Cognitive Neuroscience 11:80–95. Boynton, G.M., S.A. Engel, G.H. Glover, and D.J. Heeger. 1996. Linear systems analysis of functional magnetic resonance imaging in human V1. Journal of Neuroscience 16:4207–21. Boynton, G.M., and E.M. Finney. 2003. Orientation-specific adaptation in human visual cortex. The Journal of Neuroscience 23:8781–7. Bruce, C., R. Desimone, and C.G. Gross. 1981. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology 46:369–84. Calvert, G.A., M.J. Brammer, E.T. Bullmore et al. 1999. Response amplification in sensory-specific cortices during crossmodal binding. NeuroReport 10:2619–23. Calvert, G.A., R. Campbell, and M.J. Brammer. 2000. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology 10:649–57. Calvert, G.A., P.C. Hansen, S.D. Iversen, and M.J. Brammer. 2001. Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. NeuroImage 14:427–38. Dale, A.M., and R.L. Buckner. 1997. Selective averaging of rapidly presented individual trials using fMRI. Human Brain Mapping 5:329–40. Friston, K.J., E. Zarahn, O. Josephs, R.N. Henson, and A.M. Dale. 1999. Stochastic designs in event-related fMRI. NeuroImage 10:607–19. Glover, G.H. 1999. Deconvolution of impulse response in event-related BOLD fMRI. NeuroImage 9:416–29. Heeger, D.J., A.C. Huk, W.S. Geisler, and D.G. Albrecht. 2000. Spikes versus BOLD: What does neuroimaging tell us about neuronal activity? Nature Neuroscience 3:631–3. Heeger, D.J., and D. Ress. 2002. 
What does fMRI tell us about neuronal activity? Nature Reviews Neuroscience 3:142–51. Hikosaka, K., E. Iwai, H. Saito, and K. Tanaka. 1988. Polysensory properties of neurons in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. Journal of Neurophysiology 60:1615–37. James, W. 1890. The Principles of Psychology. New York: Henry Holt & Co. Laurienti, P.J., T.J. Perrault, T.R. Stanford, M.T. Wallace, and B.E. Stein. 2005. On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research 166:289–97. Logothetis, N.K., and B.A. Wandell. 2004. Interpreting the BOLD signal. Annual Review of Physiology 66:735–69. Meredith, M.A. 2002. On the neuronal basis for multisensory convergence: A brief overview. Brain Research. Cognitive Brain Research 14:31–40. Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex. NeuroReport 20:126–31.
Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus. Science 221:389–91. Meredith, M.A., and B.E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology 56:640–62. Molyneux, W. 1688. Letter to John Locke. In E.S. de Beer (ed.), The correspondence of John Locke. Oxford: Clarendon Press. Perrault Jr., T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2003. Neuron-specific response characteristics predict the magnitude of multisensory integration. Journal of Neurophysiology 90:4022–6. Scannell, J.W., and M.P. Young. 1999. Neuronal population activity and functional imaging. Proceedings of the Royal Society of London. Series B. Biological Sciences 266:875–81. Stanford, T.R., and B.E. Stein. 2007. Superadditivity in multisensory integration: Putting the computation in context. NeuroReport 18:787–92. Stark, C.E., and L.R. Squire. 2001. When zero is not zero: The problem of ambiguous baseline conditions in fMRI. Proceedings of the National Academy of Sciences of the United States of America 98:12760–6. Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: The MIT Press. Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews Neuroscience 9:255–66. Stein, B.E., T.R. Stanford, R. Ramachandran, T.J. Perrault Jr., and B.A. Rowland. 2009. Challenges in quantifying multisensory integration: Alternative criteria, models, and inverse effectiveness. Experimental Brain Research 198:113–26. Stevens, S.S. 1946. On the theory of scales of measurement. Science 103:677–80. Stevenson, R.A., and T.W. James. 2009. Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage 44:1210–23. Stevenson, R.A., M.L. Geoghegan, and T.W. James. 2007. Superadditive BOLD activation in superior temporal sulcus with threshold non-speech objects. Experimental Brain Research 179:85–95. Stevenson, R.A., S. Kim, and T.W. James. 2009. An additive-factors design to disambiguate neuronal and areal convergence: Measuring multisensory interactions between audio, visual, and haptic sensory streams using fMRI. Experimental Brain Research 198:183–94. Thompson, J.K., M.R. Peterson, and R.D. Freeman. 2003. Single-neuron activity and tissue oxygenation in the cerebral cortex. Science 299:1070–2. van Atteveldt, N.M., E. Formisano, L. Blomert, and R. Goebel. 2007. The effect of temporal asynchrony on the multisensory integration of letters and speech sounds. Cerebral Cortex 17:962–74.
9 Perception of Synchrony between the Senses
Mirjam Keetels and Jean Vroomen
CONTENTS
9.1 Introduction  147
9.2 Measuring Intersensory Synchrony: Temporal Order Judgment Task and Simultaneity Judgment Task  148
9.3 Point of Subjective Simultaneity  150
9.3.1 Attention Affecting PSS: Prior Entry  151
9.4 Sensitivity for Intersensory Asynchrony  152
9.4.1 Spatial Disparity Affects JND  153
9.4.2 Stimulus Complexity Affects JND  154
9.4.3 Stimulus Rate Affects JND  155
9.4.4 Predictability Affects JND  155
9.4.5 Does Intersensory Pairing Affect JND?  156
9.5 How the Brain Deals with Lags between the Senses  156
9.5.1 Window of Temporal Integration  156
9.5.2 Compensation for External Factors  158
9.5.3 Temporal Recalibration  161
9.5.4 Temporal Ventriloquism  164
9.6 Temporal Synchrony: Automatic or Not?  167
9.7 Neural Substrates of Temporal Synchrony  169
9.8 Conclusions  170
References  171
9.1 INTRODUCTION Most of our real-world perceptual experiences are specified by synchronous redundant and/or complementary multisensory perceptual attributes. As an example, a talker can be heard and seen at the same time, and as a result, we typically have access to multiple features across the different senses (i.e., lip movements, facial expression, pitch, speed, and temporal structure of the speech sound). This is highly advantageous because it increases perceptual reliability and saliency and, as a result, it might enhance learning, discrimination, or the speed of a reaction to the stimulus (Sumby and Pollack 1954; Summerfield 1987). However, the multisensory nature of perception also raises the question about how the different sense organs cooperate so as to form a coherent representation of the world. In recent years, this has been the focus of much behavioral and neuroscientific research (Calvert et al. 2004). The most commonly held view among researchers in multisensory perception is what has been referred to as the “assumption of unity.” It states that the more (amodal) properties information from different modalities shares, the more likely the brain will treat this information as originating from a common object or source (see, e.g., Bedford 1989; Bertelson 1999; Radeau 1994; Stein and Meredith 1993; Welch 1999; Welch and Warren 1980). Without a doubt, the most important amodal
property is temporal coincidence (e.g., Radeau 1994). From this perspective, one expects intersensory interactions to occur if, and only if, information from the different sense organs arrives at around the same time in the brain; otherwise, two separate events are perceived rather than a single multimodal one. The perception of time and, in particular, synchrony between the senses is not straightforward because there is no dedicated sense organ that registers time in an absolute scale. Moreover, to perceive synchrony, the brain has to deal with differences in physical (outside the body) and neural (inside the body) transmission times. Sounds, for example, travel through air much slower than visual information does (i.e., 300,000,000 m/s for vision vs. 330 m/s for audition), whereas no physical transmission time through air is involved for tactile stimulation as it is presented directly at the body surface. The neural processing time also differs between the senses, and it is typically slower for visual than it is for auditory stimuli (approximately 50 vs. 10 ms, respectively), whereas for touch, the brain may have to take into account where the stimulation originated from as the traveling time from the toes to the brain is longer than from the nose (the typical conduction velocity is 55 m/s, which results in a ~30 ms difference between toe and nose when this distance is 1.60 m; Macefield et al. 1989). Because of these differences, one might expect that for audiovisual events, only those occurring at the so-called “horizon of simultaneity” (Pöppel 1985; Poppel et al. 1990)—a distance of approximately 10 to 15 m from the observer—will result in the approximate synchronous arrival of auditory and visual information at the primary sensory cortices. Sounds will arrive before visual stimuli if the audiovisual event is within 15 m from the observer, whereas vision will arrive before sounds for events farther away. Although surprisingly, despite these naturally occurring lags, observers perceive intersensory synchrony for most multisensory events in the external world, and not only for those at 15 m. In recent years, a substantial amount of research has been devoted to understanding how the brain handles these timing differences (Calvert et al. 2004; King 2005; Levitin et al. 2000; Spence and Driver 2004; Spence and Squire 2003). Here, we review several key issues about intersensory timing. We start with a short overview of how intersensory timing is generally measured, and then discuss several factors that affect the point of subjective simultaneity and sensitivity. In the sections that follow, we address several ways in which the brain might deal with naturally occurring lags between the senses.
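To illustrate the arithmetic behind the “horizon of simultaneity,” the short sketch below uses only the approximate values mentioned above (sound traveling at roughly 330 m/s, and neural processing times of roughly 50 versus 10 ms for vision and audition); the distances chosen and the exact latencies are illustrative.

```python
# Worked example with the approximate values given in the text: at what distance do
# auditory and visual signals from one event reach their cortices at about the same time?
SPEED_OF_SOUND = 330.0         # m/s, through air
NEURAL_DELAY_VISION = 0.050    # s, approximate visual processing latency
NEURAL_DELAY_AUDITION = 0.010  # s, approximate auditory processing latency
# Light's physical travel time is negligible at these distances.

def arrival_difference(distance_m):
    """Auditory minus visual arrival time (s) at the cortex for an event at distance_m."""
    auditory = distance_m / SPEED_OF_SOUND + NEURAL_DELAY_AUDITION
    visual = NEURAL_DELAY_VISION
    return auditory - visual

for d in (1, 5, 13, 15, 30):
    diff_ms = arrival_difference(d) * 1000
    print(f"{d:>3} m: auditory arrives {diff_ms:+.0f} ms relative to visual")
# Near 13 m the difference is about 0 ms, consistent with the 10 to 15 m
# "horizon of simultaneity" described above.
```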
9.2 MEASURING INTERSENSORY SYNCHRONY: TEMPORAL ORDER JUDGMENT TASK AND SIMULTANEITY JUDGMENT TASK Before examining some of the basic findings, we first devote a few words on how intersensory synchrony is usually measured. There are two classic tasks that have been used most of the time in the literature. In both tasks, observers are asked to judge—in a direct way—the relative timing of two stimuli from different modalities: the temporal order judgment (TOJ) task and the simultaneity judgment (SJ) task. In the TOJ task, stimuli are presented in different modalities at various stimulus onset asynchronies (SOA; Dixon and Spitz 1980; Hirsh and Sherrick 1961; Sternberg and Knoll 1973), and observers may judge which stimulus came first or which came second. In an audiovisual TOJ task, participants may thus respond with “sound-first” or “light-first.” If the percentage of “sound-first” responses is plotted as a function of the SOA, one usually obtains an S-shaped logistic psychometric curve. From this curve, one can derive two measures: the 50% crossover point, and the steepness of the curve at the 50% point. The 50% crossover point is the SOA at which observers were—presumably—maximally unsure about temporal order. In general, this is called the “point of subjective simultaneity” (PSS) and it is assumed that at this SOA, the information from the different modalities is perceived as being maximally simultaneous. The second measure—the steepness at the crossover point—reflects the observers’ sensitivity to temporal asynchronies. The steepness can also be expressed in terms of the just noticeable difference (JND; half the difference in SOA between the 25% and 75% point), and it represents the smallest interval observers can reliably
notice. A steep psychometric curve thus implies a small JND, and sensitivity is thus good as observers are able to detect small asynchronies (see Figure 9.1). The second task that has been used often is the SJ task. Here, stimuli are also presented at various SOAs, but rather than judging which stimulus came first, observers now judge whether the stimuli were presented simultaneously or not. In the SJ task, one usually obtains a bell-shaped Gaussian curve if the percentage of “simultaneous” responses is plotted as a function of the SOA. For the audiovisual case, the raw data are usually not mirror-symmetric, but skewed toward more “simultaneous” responses on the “light-first” side of the axis. Once a curve is fitted on the raw data, one can, as in the TOJ task, derive the PSS and the JND: the peak of the bell shape corresponds to the PSS, and the width of the bell shape corresponds to the JND. The TOJ and SJ tasks have, in general, been used more or less interchangeably, despite the fact that comparative studies have found differences in performance measures derived from both tasks. Possibly, it reflects that judgments about simultaneity and temporal order are based on different sources of information (Hirsh and Fraisse 1964; Mitrani et al. 1986; Schneider and Bavelier 2003; Zampini et al. 2003a). As an example, van Eijk et al. (2008) examined task effects on the PSS. They presented observers a sound and light, or a bouncing ball and an impact sound at various SOAs, and had them perform three tasks: an audiovisual TOJ task (“sound-first” or “light-first” responses required), an SJ task with two response categories (SJ2; “synchronous” or “asynchronous” responses required), and an SJ task with three response categories (SJ3; “sound-first,” “synchronous,” or “light-first” responses required). Results from both stimulus types showed that the individual PSS values for the two SJ tasks correlated well, but there was no correlation between the
FIGURE 9.1 S-shaped curve that is typically obtained for a TOJ task and a bell-shaped curve typically obtained in a simultaneity task (SJ). Stimuli from different modalities are presented at varying SOAs, ranging from clear auditory-first (A-first) to clear vision-first (V-first). In a TOJ task, the participant’s task is to judge which stimulus comes first, sound or light, whereas in a SJ task, subjects judge whether stimuli are synchronous or not. The PSS represents the interval at which information from different modalities is perceived as being maximally simultaneous (~0 ms). In a SJ task, this is the point at which the most synchronous responses are given; in TOJ task, it is the point at which 50% of responses is vision-first and 50% is auditory-first. The JND represents the smallest interval observers can reliably notice (in this example ~27 ms). In a SJ task, this is the average interval (of A-first and V-first) at which a participant responds with 75% synchronous responses. In a TOJ task, it is the difference in SOA at 25% and 75% point divided by two.
TOJ and SJ tasks. This made the authors conclude, arguably, that the SJ task should be preferred over the TOJ task if one wants to measure perception of audiovisual synchrony. In our view, there is no straightforward solution about how to measure the PSS or JND for intersensory timing because the tasks are subject to different kinds of response biases (see Schneider and Bavelier 2003; Van Eijk et al. 2008; Vatakis et al. 2007, 2008b for discussion). In the TOJ task, in which only temporal order responses can be given (“sound-first” or “light-first”), observers may be inclined to adopt the assumption that stimuli are never simultaneous, which thus may result in rather low JNDs. On the other hand, in the SJ task, observers may be inclined to assume that stimuli actually belong together because the “synchronous” response category is available. Depending on criterion settings, this may result in many “synchronous” responses, and thus, a wide bell-shaped curve which will lead to the invalid conclusion that sensitivity is poor. In practice, both the SJ and TOJ task will have their limits. The SJ2 task suffers heavily from the fact that observers have to adopt a criterion about what counts as “simultaneous/nonsimultaneous.” And in the SJ3 task, the participant has to dissociate sound-first stimuli from synchronous ones, and light-first stimuli from synchronous ones. Hence, in the SJ3 task there are two criteria: a “sound-first/ simultaneous” criterion, and a “light-first/simultaneous” criterion. If observers change, for whatever reason, their criterion (or criteria) along the experiment or between experimental manipulations, it changes the width of the curve and the corresponding JND. If sensitivity is the critical measure, one should thus be careful using the SJ task because JNDs depend heavily on these criterion settings. A different critique can be applied to the TOJ task. Here, the assumption is made that observers respond at about 50% for each of the two response alternatives when maximally unsure about temporal order. Although in practice, participants may adopt a different strategy and respond, for example, “sound-first” (and others may, for arbitrary reasons, respond “light-first”) whenever unsure about temporal order. Such a response bias will shift the derived 50% point toward one side of the continuum or the other, and the 50% point will then not be a good measure of the PSS, the point at which simultaneity is supposed to be maximal. If performance of an individual observer on an SJ task is compared with a TOJ task, it should thus not come as too big of a surprise that the PSS and JND derived from both tasks do not converge.
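For readers who want to compute these measures, the sketch below fits a logistic psychometric function to synthetic TOJ data and derives the PSS (the 50% point) and the JND (half the distance between the 25% and 75% points), as described above. The simulated observer, trial counts, and parameter values are all invented for illustration; they are not data from any study discussed here.

```python
# Minimal sketch (synthetic data, assumed logistic model): estimating PSS and JND
# from TOJ responses. Positive SOAs denote vision leading (V-first), as in Figure 9.1.
import numpy as np
from scipy.optimize import curve_fit

def logistic(soa, pss, slope):
    """Proportion of 'vision-first' responses as a function of SOA (ms)."""
    return 1.0 / (1.0 + np.exp(-(soa - pss) / slope))

# Synthetic observer: true PSS = +20 ms (vision must lead), moderate sensitivity
soas = np.array([-240, -120, -60, -30, 0, 30, 60, 120, 240], dtype=float)
rng = np.random.default_rng(1)
n_trials = 40
true_p = logistic(soas, 20.0, 27.0)
p_vfirst = rng.binomial(n_trials, true_p) / n_trials

(pss, slope), _ = curve_fit(logistic, soas, p_vfirst, p0=(0.0, 20.0))

# JND: half the distance between the SOAs yielding 25% and 75% 'vision-first'
# responses; for a logistic this equals slope * ln(3).
jnd = slope * np.log(3.0)
print(f"PSS = {pss:.1f} ms (vision leading), JND = {jnd:.1f} ms")
```

An analogous procedure applies to the SJ task, with a Gaussian-shaped function whose peak gives the PSS and whose width gives the JND, subject to the criterion issues discussed in the next paragraphs.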
9.3 POINT OF SUBJECTIVE SIMULTANEITY The naïve reader might think that stimuli from different modalities are perceived as being maximally simultaneous if they are presented the way nature does, that is, synchronous, so at 0 ms SOA. Although surprisingly, most of the time, this is not the case. For audiovisual stimuli, the PSS is usually shifted toward a visual–lead stimulus, so perceived simultaneity is maximal if vision comes slightly before sounds (e.g., Kayser et al. 2008; Lewald and Guski 2003; Lewkowicz 1996; Slutsky and Recanzone 2001; Zampini et al. 2003a, 2005b, 2005c). This bias was found in a classic study by Dixon and Spitz (1980). Here, participants monitored continuous videos consisting of an audiovisual speech stream or an object event consisting of a hammer hitting a peg. The videos started off in synchrony and were then gradually desynchronized at a constant rate of 51 ms/s up to a maximum asynchrony of 500 ms. Observers were instructed to respond as soon as they noticed the asynchrony. They were better at detecting the audiovisual asynchrony if the sound preceded the video rather than if the video preceded the sound (131 vs. 258 ms thresholds for speech, and 75 vs. 188 ms thresholds for the hammer, respectively). PSS values also pointed in the same direction as simultaneity was maximal when the video preceded the audio by 120 ms for speech, and by 103 ms for the hammer. Many other studies have reported this vision-first PSS (Dinnerstein and Zlotogura 1968; Hirsh and Fraisse 1964; Jaskowski et al. 1990; Keetels and Vroomen 2005; Spence et al. 2003; Vatakis and Spence 2006a; Zampini et al. 2003a), although some also reported opposite results (Bald et al. 1942; Rutschmann and Link 1964; Teatini et al. 1976; Vroomen et al. 2004). There have been many speculations about the underlying reason for this overall visual–lead asymmetry, the main one being that observers are tuned toward the natural situation in which lights arrive before
sounds on the sense organs (King and Palmer 1985). There will then be a preference for vision to have a head start over sound so as to be perceived as simultaneous. Besides this possibility, though, there are many other reasons why the PSS can differ quite substantially from 0 ms SOA. To point out just a few: the PSS depends, among others, on stimulus intensity (more intense stimuli are processed faster or come to consciousness more quickly; Jaskowski 1999; Neumann and Niepel 2004; Roefs 1963; Sanford 1971; Smith 1933), stimulus duration (Boenke et al. 2009), the nature of the response that participants have to make (e.g., “Which stimulus came first?” vs. “Which stimulus came second?”; see Frey 1990; Shore et al. 2001), individual differences (Boenke et al. 2009; Mollon and Perkins 1996; Stone et al. 2001), and the modality to which attention is directed (Mattes and Ulrich 1998; Schneider and Bavelier 2003; Shore et al. 2001, 2005; Stelmach and Herdman 1991; Zampini et al. 2005c). We do not intend to list all the factors known thus far, but we only pick out the one that has been particularly important in theorizing about perception in general, that is, the role of attention.
9.3.1 Attention Affecting PSS: Prior Entry A vexing issue in experimental psychology is the idea that attention speeds up sensory processing. Titchener (1908) termed it the “law of prior entry,” implying that attended objects come to consciousness more quickly than unattended ones. Many of the old studies on prior entry suffered from the fact that they might simply reflect response biases (see Schneider and Bavelier 2003; Shore et al. 2001; Spence et al. 2001; Zampini et al. 2005c for discussions on the role of response bias in prior entry). As an example, observers may, whenever unsure, just respond that the attended stimulus was presented first without really having that impression. This strategy would reflect a change in decision criterion rather than a low-level sensory interaction between attention and the attended target stimulus. To disentangle response biases from truly perceptual effects, Spence et al. (2001) performed a series of important TOJ experiments in which visual–tactile, visual–visual, or tactile– tactile stimulus pairs were presented from the left or right of fixation. The focus of attention was directed toward either the visual or tactile modality by varying the probability of each stimulus modality (e.g., in the attend–touch condition, there were 50% tactile–tactile pairs, 0% visual–visual, and 50% critical tactile–visual pairs). Participants had to indicate whether the left or right stimulus was presented first. The idea tested was that attention to one sensory modality would speed up perception of stimuli in that modality, thus resulting in a change of the PSS (see also Mattes and Ulrich 1998; Schneider and Bavelier 2003; Shore et al. 2001, 2005; Stelmach and Herdman 1991; Zampini et al. 2005c). Their results indeed supported this notion: when attention was directed to touch, visual stimuli had to lead by much greater intervals (155 ms) than when attention was directed to vision (22 ms) for them to be perceived as simultaneous. Additional experiments demonstrated that attending to one side (left or right) also speeded perception of stimuli presented at that side. Therefore, both spatial attention and attention to modality were effective in shifting the PSS, presumably because they speeded up perceptual processes. To minimize the contribution of any simple response bias on the PSS, Spence et al. (2001) performed these experiments in which attention was manipulated in a dimension (modality or side) that was orthogonal to that of responding (side or modality, respectively). Thus, while attending to vision or touch, participants had to judge which side came first; and while attending to the left or right, participants judged which modality came first. The authors reported similar shifts of the PSS in these different tasks, thus favoring a perceptual basis for prior entry. Besides such behavioral data, there is also extensive electrophysiological support for the idea that attention affects perceptual processing. Very briefly, in the electroencephalogram (EEG) one can measure the event-related response (ERP) of stimuli that were either attended or unattended. Naïvely speaking, if attention speeds up stimulus processing, one would expect ERPs of attended stimuli to be faster than unattended ones. In a seminal study by Hillyard and Munte (1984), participants were presented a stream of brief flashes and tones on the left or right of fixation. The
participant’s task was to attend either the auditory or visual modality, and to respond to infrequent targets in that modality at an attended location (e.g., respond to a slightly longer tone on the left). The attended modality was constant during the experiment (but varied between subjects), and the relevant location was specified at the beginning of each block of trials. The authors found enhanced negativity in the ERP for stimuli at attended locations if compared to nonattended locations. The negativity started at about 150 ms poststimulus for visual stimuli and at about 100 ms for auditory stimuli. Evidence for a cross-modal link in spatial attention was also found, as the enhancement (although smaller) was also found for stimuli at the attended location in the unattended modality (see also Spence and Driver 1996; Spence et al. 2000 for behavioral results). Since then, analogous results have been found by many others. For example, Eimer and Schröger (1998) found similar results using a different design in which the side of the attended location varied from trial to trial. Again, their results demonstrated enhanced negativities (between 160 and 280 ms after stimulus onset) for attended locations as compared to unattended locations, and the effect was again bigger for the relevant rather than irrelevant modality. The critical issue for the idea prior entry is whether these ERP effects also reflect that attended stimuli are processed faster. In most EEG studies, attention affects the amplitude of the ERP rather than speed (for a review, see Eimer and Driver 2001). The problem is that there are many other interpretations for an amplitude modulation rather than increased processing speed (e.g., less smearing of the EEG signal over trials if attended). A shift in the latencies of the ERP would have been easier to interpret in terms of increased processing speed, but the problem is that even if a latency shift in the ERP is obtained, it is usually small if compared to the behavioral data. As an example, in an ERP study by Vibell et al. (2007), attention was directed toward the visual or tactile modality in a visual–tactile TOJ task. Results showed that the peak latency of the visual evoked potentials (P1 and N1) was earlier when attention was directed to vision (P1 = 147 ms, and N1 = 198 ms) rather than when directed to touch (P1 = 151 ms, and N1 = 201 ms). This shift in the P1 may be taken as evidence that attention indeed speeds up perception in the attended modality, but it should also be noted that the 4-ms shift in the ERP is in a quite different order of magnitude than the 38 ms shift of the PSS in the behavioral data, or the 133 ms shift reported by Spence et al. (2001) in a similar study. In conclusion, there is both behavioral and electrophysiological support for the idea that attention speeds up perceptual processing, but the underlying neural mechanisms remain, for the time being, elusive.
9.4 SENSITIVITY FOR INTERSENSORY ASYNCHRONY Besides the point at which simultaneity is perceived to be maximal (the PSS), the second measure that one can derive from the TOJ and SJ task (but which is unfortunately not always reported) is the observers’ sensitivity to timing differences, the JND. The sensitivity to intersensory timing differences is not only of interest for theoretical reasons, but it is also of practical importance, for example, in video broadcasting or multimedia Internet where standards are required for allowable audio or video delays (Finger and Davis 2001; Mortlock et al. 1997; Rihs 1995). One of the classic studies on sensitivity for intersensory synchrony was done by Hirsh and Sherrick (1961). They presented audio–visual, visual–tactile, and audio–tactile stimuli in a TOJ task and reported JNDs to be approximately 20 ms regardless of the modalities used. More recent studies, however, have found substantially larger JNDs and larger differences between the sensory modalities. For simple cross-modal stimuli such as auditory beeps and visual flashes, JNDs have been reported in the order of approximately 25 to 50 ms (Keetels and Vroomen 2005; Zampini et al. 2003a, 2005b), but for audio–tactile pairs, Zampini et al. (2005a) obtained JNDs of about 80 ms, and for visual–tactile pairs, JNDs have been found in the order of 35 to 65 ms (Keetels and Vroomen 2008b; Spence et al. 2001). More importantly, JNDs are not constant, but have been shown to depend on various other factors like the spatial separation between the components of the stimuli, stimulus
complexity, whether it is speech or not, and, more controversially, semantic congruency. Some of these factors will be described below.
9.4.1 Spatial Disparity Affects JND A factor that has been shown to affect sensitivity for intersensory timing is the spatial separation between the components of a stimulus pair. Typically, sensitivity for temporal order improves if the components of the cross-modal stimuli are spatially separated (i.e., lower JNDs; Bertelson and Aschersleben 2003; Spence et al. 2003; Zampini et al. 2003a, 2003b, 2005b). Bertelson and Aschersleben, for example, reported audiovisual JNDs to be lower when a beep and a flash were presented from different locations rather than from a common and central location. Zampini et al. (2003b) qualified these findings and observed that sensitivity in an audiovisual TOJ task improved if the sounds and lights were presented from different locations, but only so if presented at the left and right from the median (at 24°). No effect of separation was found for vertically separated stimuli. This made Zampini et al. conclude that the critical factor for the TOJ improvement was that the individual components of an audiovisual stimulus were presented in different hemifields. Keetels and Vroomen (2005), though, examined this notion and varied the (horizontal) size of the spatial disparity. Their results showed that JNDs also improved when spatial disparity was large rather than small, even if stimuli did not cross hemifields. Audiovisual JNDs thus depend on both the relative position from which stimuli are presented and on whether hemifields are crossed or not. Spence et al. (2001) further demonstrated that sensitivity improves for spatially separated visual–tactile stimulus pairs, although no such effect was found for audio–tactile pairs (Zampini et al. 2005a). In blind people, on the other hand, audio–tactile temporal sensitivity was found to be affected by spatial separation (Occelli et al. 2008) and similar spatial modulation effects were demonstrated in rear space (Kitagawa 2005). What is the underlying reason that sensitivity to temporal differences improves if the sources are spatially separated? Or, why does the brain fail to notice temporal intervals when stimuli comes from a single location? Two accounts have been proposed (Spence et al. 2003). First, it has been suggested that intersensory pairing impairs sensitivity for temporal order. The idea underlying “intersensory pairing” is that the brain has a list of criteria on which it decides whether information from different modalities belong together or not. Commonality in time is, without a doubt, a very important criterion, but there may be others like commonality in space, association based on cooccurrence, or semantic congruency. Stimuli from the same location may, for this reason, be more likely paired into a single multimodal event if compared to stimuli presented far apart (see Radeau 1994). Any such tendency to pair stimuli could then make the relative temporal order of the components lost, thereby worsening the temporal sensitivity in TOJ or SJ tasks. In contrast with this notion, many cross-modal effects occur despite spatial discordance, and there are reasons to argue that spatial congruency may not be an important criterion for intersensory pairing (Bertelson 1994; Colin et al. 2001; Jones and Munhall 1997; Keetels et al. 2007; Keetels and Vroomen 2007, 2008a; Stein et al. 1996; Teder-Salejarvi et al. 2005; Vroomen and Keetels 2006). But why, then, does sensitivity for temporal order improve with spatially separated stimuli if not because intersensory pairing is impeded? 
A second reason why JNDs may improve is that of spatial redundancy. Whenever multisensory information is presented from different locations, observers actually have extra spatial information on which to base their response. That is, observers may initially not know which modality had been presented first, but still know on which side the first stimulus appeared, and they may then infer which modality had been presented first. As an example, in an audiovisual TOJ task, an observer may have noticed that the first stimulus came from the left (possibly because attention was captured by the first stimulus toward that side). They may also remember that the light was presented on the right. By inference, then, the sound must have been presented first. Sensitivity for temporal order for spatially separated stimuli then improves because there are extra spatial cues that are not present for colocated stimuli.
9.4.2 Stimulus Complexity Affects JND Many studies exploring temporal sensitivity have used relatively simple stimuli such as flashes and beeps that have a single and rather sharp transient onset. However, in real-world situations, the brain has to deal with much more complex stimuli that often have complicated variations in temporal structure over time (e.g., seeing and hearing someone speaking; or seeing, hearing, and touching the keys on a computer keyboard). How does the brain notice timing differences between these more complicated and dynamic stimuli? Theoretically, one might expect that more complex stimuli also provide a richer base on which to judge temporal order. Audiovisual speech would be the example “par excellence” because it is rich in content and fluctuating over time. Although in fact, several studies have found the opposite, and in particular for audiovisual speech, the “temporal window” for which the auditory and visual streams are perceived as synchronous is rather wide (Conrey and Pisoni 2006; Dixon and Spitz 1980; Jones and Jarick 2006; Stekelenburg and Vroomen 2007; a series of studies by Vatakis and Spence 2006a; Vatakis, Ghanzanfar and Spence 2008a; van Wassenhove et al. 2007). For example, in a study by van Wassenhove et al. (2007), observers judged in an SJ task whether congruent audiovisual speech stimuli and incongruent McGurk-like speech stimuli* (McGurk and MacDonald 1976) were synchronous or not. The authors found a temporal window of 203 ms for the congruent pairs (ranging from −76 ms sound-first to +127 ms vision-first, with PSS at 26 ms vision-first) and a 159 ms window for the incongruent pairs (ranging from –40 to +119 ms, with PSS at 40 ms vision-first). These windows are rather wide if compared to the much smaller windows found for simple flashes and beeps (mostly below 50 ms; Hirsh and Sherrick 1961; Keetels and Vroomen 2005; Zampini et al. 2003a, 2005b). The relatively wide temporal window for complex stimuli has also been demonstrated by indirect tests. For example, the McGurk effect was found to diminish if the auditory and visual information streams are out of sync, but this only occurred at rather long intervals (comparable with the ones found in SJ tasks; Grant et al. 2004; Massaro et al. 1996; McGrath and Summerfield 1985; Munhall et al. 1996; Pandey et al. 1986; Tanaka et al. 2009b; van Wassenhove et al. 2007). There have been several recent attempts to compare sensitivity for intersensory timing in audiovisual speech with other audiovisual events such as music (guitar and piano) and object actions (e.g., smashing a television set with a hammer, or hitting a soda can with a block of wood; Vatakis and Spence 2006a, 2006b). Observers made TOJs about which stream (auditory or visual) appeared first. Overall, results showed better temporal sensitivity for audiovisual stimuli of “lower complex ity” in comparison with stimuli having continuously varying properties (i.e., syllables vs. words and/or sentences). Similar findings were reported by Stekelenburg and Vroomen (2007), who compared JNDs of audiovisual speech (pronunciation of the syllable /bi/) with that of natural nonspeech events (a video of a handclap) in a TOJ task. Again, JNDs were much better for the nonspeech events (64 ms) than for speech (105 ms). On the basis of these findings, some have concluded that “speech is special” (van Wassenhove et al. 2007; Vatakis et al. 
On the basis of these findings, some have concluded that "speech is special" (van Wassenhove et al. 2007; Vatakis et al. 2008a) or that when "stimulus complexity" increases, sensitivity for temporal order deteriorates (Vatakis and Spence 2006a). In our view, however, these proposals do not really clarify the issue because the notions of "speech is special" and "stimulus complexity" are both ill-defined, and most likely these concepts are confounded with other stimulus factors that can be described more clearly. As an example, it is known that the rate at which stimuli are presented affects audiovisual JNDs for intersensory timing (Benjamins et al. 2008; Fujisaki and Nishida 2005). Sensitivity may also be affected by whether there is anticipatory information that predicts the onset of an audiovisual event (Stekelenburg and Vroomen 2007; Van Eijk 2008; Vroomen and
Stekelenburg 2009), and by whether there is a sharp transition that can serve as a temporal anchor (Fujisaki and Nishida 2005). Each of these stimulus characteristics, and likely many others, needs to be controlled if one wants to compare across stimuli in a nonarbitrary way. Below, we address some of these factors.
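Before turning to these factors, it is worth noting how the PSS and JND values quoted above are typically obtained: a psychometric function is fitted to the proportion of, say, "light-first" responses across SOAs, and the PSS and JND are read off the fit. The sketch below illustrates one common version of this with a cumulative Gaussian; the data values and function names are invented for illustration and are not taken from any of the cited studies.

```python
# Minimal sketch: estimating PSS and JND from hypothetical TOJ data.
# All numbers are invented for illustration; this is not data from any cited study.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Audiovisual SOAs in ms (negative = sound first, positive = light first)
soa = np.array([-240, -120, -60, -30, 0, 30, 60, 120, 240], dtype=float)
# Hypothetical proportion of "light first" responses at each SOA
p_light_first = np.array([0.02, 0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.95, 0.99])

def cum_gauss(x, pss, sigma):
    """Cumulative Gaussian psychometric function."""
    return norm.cdf(x, loc=pss, scale=sigma)

(pss, sigma), _ = curve_fit(cum_gauss, soa, p_light_first, p0=(0.0, 50.0))

# JND taken here as half the 25%-75% interval, i.e., 0.674 * sigma
jnd = norm.ppf(0.75) * sigma
print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
```

The cumulative Gaussian and the 25%-75% definition of the JND are common conventions, but logistic fits and other JND definitions also appear in the literature.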
9.4.3 Stimulus Rate Affects JND
It has been demonstrated that perception of intersensory synchrony breaks down if stimuli are presented at a temporal frequency above ~4 Hz. This limit is very low compared with unimodal visual or auditory sensitivity for temporal coherence. Fujisaki and Nishida (2005) examined this using audiovisual stimuli consisting of a luminance-modulated Gaussian blob and amplitude-modulated white noise presented at various rates. They demonstrated that synchrony–asynchrony discrimination for temporally dense random pulse trains became nearly impossible at temporal frequencies above 4 Hz, even when the audiovisual interval was large enough for discrimination of single pulses (the discrimination thresholds were 75, 81, and 119 ms for single pulses, 2 Hz, and 4 Hz repetitive stimuli, respectively). This 4-Hz boundary was also reported by Benjamins et al. (2008). They explored the temporal limit of audiovisual integration using a visual stimulus that alternated in color (red or green) and a sound that alternated in frequency (high or low). Observers had to indicate which sound (high or low) accompanied the red disk. Their results demonstrated that at rates of 4.2 Hz and higher, observers were no longer able to match the visual and auditory stimuli across modalities (the proportion of correct matches dropped from 0.9 at 1.9 Hz to 0.5 at 4.2 Hz). Further experiments also demonstrated that manipulating other temporal stimulus characteristics, such as the stimulus offsets and/or the audiovisual SOAs, did not change the 4-Hz threshold. Here, it should be mentioned that the 4-Hz rate is also the approximate rate at which syllables are spoken in continuous speech, and temporal order in audiovisual speech might thus be difficult simply because stimulus presentation is too fast, and not because speech is special.*
* It has also been reported that the presentation rate may shift the PSS. In a study by Arrighi et al. (2006), participants were presented with a video of hands drumming on a conga at various rates (1, 2, and 4 Hz). Observers were asked to judge whether the auditory and visual streams appeared to be synchronous or not (an SJ task). Results showed that the auditory delay for maximum simultaneity (the PSS) varied inversely with drumming tempo, from about 80 ms at 1 Hz, and 60 ms at 2 Hz, to 40 ms at 4 Hz. Video sequences of random drumming motion and of a disk moving along a motion profile matching the hands of the drummer produced similar results, with higher tempos requiring less auditory delay.
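One intuition for why dense streams are hard to align is that once the interval between successive pulses approaches the audiovisual lag, several candidate lags fit the two streams about equally well. The toy sketch below illustrates this with strictly periodic pulse trains; it is only a schematic intuition under our own assumptions (Fujisaki and Nishida used random pulse trains, and this is not their analysis).

```python
# Toy illustration (not a model from the cited studies): with periodic pulse
# trains, several lags can align an auditory and a visual stream equally well
# once the pulse period comes close to the audiovisual lag.
import numpy as np

def pulse_train(rate_hz, lag_ms=0.0, duration_ms=2000.0, dt_ms=1.0):
    """Binary pulse train sampled at dt_ms resolution."""
    t = np.arange(0.0, duration_ms, dt_ms)
    period = 1000.0 / rate_hz
    return ((t - lag_ms) % period < dt_ms).astype(float)

def best_lags(rate_hz, true_lag_ms=119.0, max_lag_ms=300.0):
    a = pulse_train(rate_hz)                      # auditory stream
    v = pulse_train(rate_hz, lag_ms=true_lag_ms)  # visual stream, lagging
    lags = np.arange(-max_lag_ms, max_lag_ms + 1)
    corr = np.array([np.dot(a, np.roll(v, -int(l))) for l in lags])
    return lags[corr == corr.max()]               # all lags that fit equally well

for rate in (1, 2, 4):
    print(f"{rate} Hz: candidate lags (ms) = {best_lags(rate)}")
```

With 1-Hz and 2-Hz trains, the true 119-ms lag is the only candidate within ±300 ms, whereas at 4 Hz a second, opposite-sign candidate appears, so the direction of the lag can no longer be recovered from the streams alone.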
9.4.4 Predictability Affects JND
Another factor that may play a role in intersensory synchrony judgments, but one that has not yet been studied extensively, is the extent to which (one of the components of) a multisensory event can be predicted. As an example, for many natural events, such as the clapping of hands, vision provides predictive information about when a sound is to occur, as there is visual anticipatory information about sound onset. Stimuli with predictive information allow observers to make a clear prediction about when a sound is to occur, and this might improve sensitivity for temporal order. A study by van Eijk et al. (2008, Chapter 4) is of relevance here. They explored the effect of visual predictive information (or, as the authors called it, "apparent causality") on perceived audiovisual synchrony. Visual predictive information was either present or absent, depending on whether all or part of a Newton's cradle toy was shown (i.e., a ball that appears to fall from a suspended position on the left of the display, strikes the leftmost of four contiguous balls, and then launches the rightmost ball into an arc away from the other balls). The collision of the balls was accompanied by a sound whose timing varied around the moment of impact. The predictability of the sound was varied by showing either the left side of the display (motion followed by a collision and sound, so that visual motion predicted sound occurrence) or the right side of the display (a sound followed by visual motion, so that there was no predictive information about sound onset). In line with the argument made here, the authors reported
better temporal sensitivity when visual predictive information about sound onset was available (the left display) than when it was absent (the right display).
9.4.5 Does Intersensory Pairing Affect JND?
A more controversial issue in the literature on intersensory timing is the extent to which information from different modalities is treated by the brain as belonging to the same event. Some have discussed this under the already mentioned notion of "intersensory pairing," others under the "unity assumption" (Welch and Warren 1980). The idea is that observers find it difficult to judge temporal order if the information streams naturally belong together, for reasons other than temporal coincidence, because there is then more intersensory integration, in which case temporal order is lost. Several studies have examined this issue, but with varying outcomes. In a study by Vatakis and Spence (2007), participants judged the temporal order of audiovisual speech stimuli that varied in gender and phonemic congruency. Face and voice could be congruent or incongruent in gender (a female face articulating /pi/ paired with either a female or a male /pi/) or in phonemic content (a face saying /ba/ paired with a voice saying /ba/ or /da/). In support of the unity assumption, results showed that for both the gender and the phonemic congruency manipulation, sensitivity for temporal order improved if the auditory and visual streams were incongruent rather than congruent. In a recent study, Vatakis et al. (2008a) qualified these findings and reported that this effect may be specific to human speech. In this study, the effect of congruency was examined using matching or mismatching call types of monkeys ("cooing" vs. "grunt" or threat calls). For audiovisual speech, sensitivity for temporal order was again better for incongruent than for congruent trials, but there was no congruency effect for the monkey calls. In another study, Vatakis and Spence (2008) also found no congruency effect for audiovisual music and object events that either matched (e.g., the sight of a note being played on a piano together with the corresponding sound, or the video of a bouncing ball with a corresponding sound) or mismatched. At this stage, it therefore appears that the "unity assumption" may only apply to audiovisual speech. It leaves one to wonder, though, whether this effect is best explained in terms of the "special" nature of audiovisual speech, or whether other factors are at play (e.g., the high level of exposure to speech stimuli in daily life, the possibly more attention-grabbing nature of speech stimuli, or the specific low-level acoustic stimulus features of speech; Vatakis et al. 2008a).
9.5 HOW THE BRAIN DEALS WITH LAGS BETWEEN THE SENSES
In any multisensory environment, the brain has to deal with lags in arrival and processing time between the different senses. Surprisingly, though, despite these lags, temporal coherence is usually maintained, and only in exceptional circumstances, such as thunder that is heard after the lightning is seen, is a single multisensory event perceived as being separated in time. This raises the question of how temporal coherence is maintained. In our view, at least four options are available: (1) the brain might be insensitive to small lags, or it could simply ignore them (a window of temporal integration); (2) the brain might be "intelligent" and bring deeply rooted knowledge about the external world into play that allows it to compensate for various external factors; (3) the brain might be flexible and shift its criterion about synchrony in an adaptive fashion (recalibration); or (4) the brain might actively shift the time at which one information stream is perceived to occur toward the other (temporal ventriloquism). Below, we discuss each of these notions. It should be noted beforehand that none of these options excludes the others.
9.5.1 Window of Temporal Integration
The first notion, that the brain is rather insensitive to lags, comes close to the idea that there is a "window of temporal integration." Any information that falls within this hypothetical window is potentially assigned to the same external event, and streams within the window are then treated as having occurred simultaneously (see Figure 9.2, panel 1).
Many have alluded to this concept, but what is less satisfying about it is that it is basically a description rather than an explanation. To make this point clear, some have reported that the temporal window for audiovisual speech can be quite large, ranging from approximately 40 ms audio-first to 240 ms vision-first. However, sensitivity for intersensory asynchronies (the JND) is usually much smaller than the size of this window. For example, Munhall et al. (1996) demonstrated that exact temporal coincidence between the auditory and visual parts of audiovisual speech stimuli is not a very strict constraint on the McGurk effect (McGurk and MacDonald 1976). Their results demonstrated that the McGurk effect was biggest when the vowels were synchronized (see also McGrath and Summerfield 1985), but the effect survived even if audition lagged vision by 180 ms (see also Soto-Faraco and Alsius 2007, 2009; these studies
FIGURE 9.2 Synchrony can be perceived despite lags. How is this accomplished? Four possible mechanisms are depicted for audiovisual stimuli like a flash and a beep. Similar mechanisms might apply for other stimuli and other modality pairings. Time is represented on the x-axis, and accumulation of sensory evidence on the y-axis. A stimulus is time-stamped once it surpasses a sensory threshold. Stimuli in audition and vision are perceived as being synchronous if they occur within a certain time window. (1) The brain might be insensitive to naturally occurring lags because the window of temporal integration is rather wide. (2) The brain might compensate for predictable variability (here, sound distance) by adjusting the perceived occurrence of a sound in accordance with sound travel time. (3) Temporal recalibration. Three different mechanisms might underlie adaptation to asynchrony: (a) a shift in the criterion about synchrony for the adapted stimuli or modalities, (b) a widening of the temporal window for the adapted stimuli or modalities, and (c) a change in the threshold of sensory detection (when did the stimulus occur?) within one of the adapted modalities. (4) Temporal ventriloquism: a visual event is actively shifted toward an auditory event.
show that participants can still perceive a McGurk effect when they can quite reliably perform TOJs). Outside the speech domain, similar findings have been reported. In a study by Shimojo et al. (2001), the role of temporal synchrony was examined using the streaming–bouncing illusion (i.e., two identical visual targets that move across each other and are normally perceived as streaming are typically perceived to bounce when a brief sound is presented at the moment that the visual targets coincide; Sekuler et al. 1997). The phenomenon is dependent on the timing of the sound relative to the coincidence of the moving objects. Although a brief sound induced the visual bouncing percept most effectively when it was presented about 50 ms before the moving objects coincided, the data furthermore showed a rather large temporal window of integration, because intervals ranging from 250 ms before visual coincidence to 150 ms after coincidence still induced the bouncing percept (see also Bertelson and Aschersleben 1998, for the effect of temporal asynchrony on spatial ventriloquism; or Shams et al. 2002, for the illusory-flash effect). All these intersensory effects thus occur at asynchronies that are much larger than the JNDs normally reported when directly exploring the effect of asynchrony using TOJ or SJ tasks (van Wassenhove et al. 2007). One might argue that even though observers do notice small delays between the senses, the brain can still ignore them if doing so helps other purposes, such as understanding speech (Soto-Faraco and Alsius 2007, 2009). But the question then becomes why there is more than one window; that is, one for understanding and another for noticing timing differences. Besides varying with the purpose of the task, the width of the temporal window has also been found to vary for different kinds of stimuli. As already mentioned, the temporal window is much smaller for clicks and flashes than it is for audiovisual speech. However, why would the size be different for different stimuli? Does the brain have a separate window for each stimulus and each purpose? If so, we are left with explaining how and why it varies. Some have taken the concept of a window quite literally, and have argued that "speech is special" because the window for audiovisual speech is wide (van Wassenhove et al. 2007; Vatakis et al. 2008a). We would rather refrain from such speculations, and consider it more useful to examine what the critical features are that determine when perception of simultaneity becomes easy (a small window) or difficult (a large window). The size of the window is thus, in our view, the factor that needs to be explained rather than the explanation itself.
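Stated as a rule, the window notion amounts to little more than a threshold check on the lag, which is one way to see why it describes rather than explains. A minimal sketch is given below; the asymmetric bounds are the approximate audiovisual-speech values quoted above, and the function and constant names are ours.

```python
# Minimal sketch of the "window of temporal integration" idea: a lag counts as
# synchronous if it falls inside an (asymmetric) window. The bounds below are the
# approximate audiovisual-speech values quoted in the text (positive = vision first).
AUDIO_FIRST_LIMIT_MS = -40   # audio may lead vision by at most ~40 ms
VISION_FIRST_LIMIT_MS = 240  # vision may lead audio by at most ~240 ms

def integrated(av_lag_ms):
    """True if the audiovisual lag falls inside the integration window."""
    return AUDIO_FIRST_LIMIT_MS <= av_lag_ms <= VISION_FIRST_LIMIT_MS

# A lag can exceed the detection JND for simple stimuli (often below 50 ms) and
# still fall inside the window, as with a 180-ms visual lead in the McGurk studies.
for lag in (-60, 0, 180, 260):
    print(f"lag {lag:+4d} ms: integrated = {integrated(lag)}")
```

The rule classifies lags, but it says nothing about why the bounds take these values for speech and much smaller ones for beeps and flashes, which is precisely the point made above.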
9.5.2 Compensation for External Factors
The second possibility, the intelligent brain that compensates for various delays, is a controversial one that has received support mainly from studies examining whether observers take distance into account when judging audiovisual synchrony (see Figure 9.2, panel 2). The relatively slow transmission of sound through air causes natural differences in arrival time between sounds and lights. It implies that the farther away an audiovisual event, the more the sound will lag the visual stimulus, although such a lag might be compensated for by the brain if distance were known. The brain might then treat a lagging sound as being synchronous with a light, provided that the audiovisual event occurred at the right distance. Some have indeed reported that the brain does just that, as judgments about audiovisual synchrony were found to depend on perceived distance (Alais and Carlile 2005; Engel and Dougherty 1971; Heron et al. 2007; Kopinska and Harris 2004), whereas others have failed to demonstrate compensation for distance (Arnold et al. 2005; Lewald and Guski 2004). Sugita and Suzuki (2003) explored compensation for distance with an audiovisual TOJ task. The visual stimuli were delivered by light-emitting diodes (LEDs) at distances ranging from 1 to 50 m in free-field circumstances (with intensity, though not size, compensated for distance). Of importance, the sounds were delivered through headphones, and no attempt was made to equate the distance of the sound with that of the light. Note that this, in essence, undermines the whole idea that the brain compensates for lags of audiovisual events out in space. Nevertheless, PSS values were found to shift with visual stimulus distance. When the visual stimulus was 1 m away, the PSS was at about a
~5 ms sound delay, and the delay increased when the LEDs were farther away. The increment was consistent with the velocity of sound up to a viewing distance of about 10 m, after which it leveled off. This led the authors to conclude that lags between auditory and visual inputs are perceived as synchronous not because the brain has a wide temporal window for audiovisual integration, but because the brain actively changes the temporal location of the window depending on the distance of the source. Alais and Carlile (2005) came to similar conclusions, but with different stimuli. In their study, auditory stimuli were presented over a loudspeaker and auditory distance was simulated by varying the direct-to-reverberant energy ratio as a depth cue for sounds (Bronkhorst 1995; Bronkhorst and Houtgast 1999). The near sounds simulated a depth of 5 m and had substantial amounts of direct energy with a sharp transient onset; the far sounds simulated a depth of 40 m and did not have a transient. The visual stimulus was a Gaussian blob on a computer screen in front of the observer, without variation in distance. Note that, again, no attempt was made to equate auditory and visual distance, thus again undermining the underlying notion. The effect of apparent auditory distance on temporal alignment with the blob on the screen was measured in a TOJ task. The authors found compensation for depth: the PSS in the audiovisual TOJ task shifted with the apparent distance of the sound in accordance with the speed of sound through air up to 40 m. On closer inspection of their data, however, it is clear that the shift in the PSS was mainly caused by the fact that sensitivity for intersensory synchrony became increasingly worse for more distant sounds. Judging from their figures, sensitivity for nearby sounds at 5 m was in the normal range, but for the most distant sound, sensitivity was extremely poor as it never reached plateau, and even at a sound delay of 200 ms, 25% of the responses were still "auditory-first" (see also Arnold et al. 2005; Lewald and Guski 2004). This suggests that observers, while performing the audiovisual TOJ task, could not use the onset of the far sound as a cue for temporal order, possibly because it lacked a sharp transient, and that they had to rely on other cues instead. Besides these studies with controversial stimuli and data, there are others that simply failed to observe compensation for distance (Arnold et al. 2005; Heron et al. 2007; Lewald and Guski 2004; Stone et al. 2001). For example, Stone et al. (2001) used an audiovisual SJ task and varied stimulus–observer distances from 0.5 m in the near condition to 3.5 m in the far condition. This resulted in a 3-m difference that would theoretically correspond to an 11 ms difference in the PSS if sound travel time were not compensated (a sound velocity of 330 m/s corresponds to ~3.5 m/11 ms). For three out of five subjects, the PSS values were indeed shifted in that direction, which led the authors to conclude that distance was not compensated. Against this conclusion, it should be said that SJ tasks depend heavily on criterion settings, that "three out of five" is not persuasively above chance, and that the range of distances was rather restricted. Less open to these kinds of criticisms is a study by Lewald and Guski (2004). They used a rather wide range of distances (1, 5, 10, 20, and 50 m), and their audiovisual stimuli (a sequence of five beeps/flashes) were delivered by colocated speakers/LEDs placed in the open field.
Note that in this case, there were no violations of the "naturalness" of the audiovisual stimuli and that they were physically colocated. Using this setup, the authors did not observe compensation for distance. Rather, their results showed that when the physical observer–stimulus distance increased, the PSS shifted precisely with the variation in sound transmission time through air. For audiovisual stimuli that are far away, sounds thus had to be presented earlier than for nearby stimuli to be perceived as simultaneous, and there was no sign that the brain compensates for sound travel time. The authors also suggested that the discrepancy between their findings and those of studies that did find compensation for distance lies in the fact that the latter simulated distance rather than using the natural situation. Similar conclusions were also reached by Arnold et al. (2005), who examined whether the stream/bounce illusion (Sekuler et al. 1997) varies with distance. The authors examined whether the optimal time to produce a "bounce" percept varied with the distance of the display, which ranged from ~1 to ~15 m. The visual stimuli were presented on a computer monitor (keeping retinal properties constant) and the sounds were presented either over loudspeakers at these distances or over
headphones. The optimal time to induce a bounce percept shifted with the distance of the sound when the sounds were presented over loudspeakers, but there was no shift if the sound was presented over headphones. Similar shifts in timing with viewing distance after loudspeaker, but not headphone, presentation were obtained in an audiovisual TOJ task in which observers judged whether a sound came before or after two disks collided. This led the authors to conclude that there is no compensation for distance if distance is real and presented over speakers rather than simulated and presented over headphones. This conclusion might well be correct, but it raises the question of how to account for the findings of Kopinska and Harris (2004). These authors reported complete compensation for distance despite using colocated sounds and lights produced at natural distances. In their study, the audiovisual stimulus was a bright disk that flashed once on a computer monitor, accompanied by a tone burst presented from the computer's built-in speaker. Participants were seated at various distances from the screen (1, 4, 8, 16, 24, and 32 m) and made TOJs about the flash and the sound. The authors also selectively slowed down visual processing by presenting the visual stimulus at 20° of eccentricity rather than in the fovea, or by having observers wear darkened glasses. As an additional control, they used simple reaction time tasks and found that all these variations (distance, eccentricity, and dark glasses) had predictable effects on auditory or visual speeded reactions. However, audiovisual simultaneity was not affected by distance, eccentricity, or darkened glasses. Thus, there was no shift in the PSS despite the fact that the changes in distance, illumination, and retinal location affected simple reaction times. This made the authors conclude that observers recover the external world by taking into account all kinds of predictable variations, most importantly distance, alluding to similar phenomena such as size or color constancy. There are thus studies that varied audiovisual distance in a natural way but came to diametrically opposing conclusions: Lewald and Guski (2004) and Arnold et al. (2005) found no compensation for distance, whereas Kopinska and Harris (2004) reported complete compensation. What is the critical difference between them? Our conjecture is that they differ in two critical aspects, that is, (1) whether distance was randomized on a trial-by-trial basis or blocked, and (2) whether sensitivity for temporal order was good or poor. In the study by Lewald and Guski, the distance of the stimuli was varied on a trial-by-trial basis as they used a setup of five different speakers/LEDs. In Kopinska and Harris's study, though, the distance between the observer and the screen was blocked over trials, because otherwise subjects would have had to be shifted back and forth after each trial. If distance is blocked, then either adaptation to the additional sound lag may occur (i.e., recalibration), or subjects may equate response probabilities for the particular distance at which they are seated. Either way, the effect of distance on the PSS will diminish if trials are blocked, and no shift in the PSS will then be observed, leading to the "wrong" conclusion that distance is compensated. This line of reasoning corresponds with a recent study by Heron et al. (2007).
In their study, participants performed a TOJ task in which audiovisual stimuli (a white disk and a click) were presented at varying distances (0, 5, 10, 20, 30, and 40 m). Evidence for compensation was only found after a period of adaptation (1 min plus 5 top-up adaptation stimuli between trials) to the naturally occurring audiovisual asynchrony associated with a particular viewing distance. No perceptual compensation for distance-induced auditory delays could be demonstrated whenever there was no adaptation period (although it should be noted that in this study, observer distance was always blocked). The second potentially relevant difference between studies that do or do not demonstrate compensation is the difficulty of the stimuli. Lewald and Guski (2004) used a sequence of five pulses/sounds, whereas Kopinska and Harris (2004) presented a single sound/flash. In our experience, a sequence of pulses/flashes drastically improves accuracy for temporal order compared with a single pulse/flash because there are many more cues in the signal. In the study by Arnold et al. (2005), judgments about temporal order could also be relatively accurate because the two colliding disks provided anticipatory information about when to expect the sound. Most likely, observers in the study of Kopinska and Harris were inaccurate because their single sound/flash stimuli without anticipatory information were difficult (unfortunately, none of the studies reported JNDs). In effect,
this amounts to adding noise to the psychometric function, which then effectively masks the effect of distance on temporal order. It might easily lead one to conclude “falsely” that there is compensation for distance.
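The size of the shifts these studies were looking for follows directly from the speed of sound; the sketch below works through the arithmetic for a few of the viewing distances mentioned above (the 330 m/s figure is the one quoted in the text, and the function name is ours).

```python
# Minimal sketch of the arithmetic behind distance-induced auditory delays.
SPEED_OF_SOUND_M_PER_S = 330.0  # value quoted in the text

def auditory_delay_ms(distance_m):
    """Time for sound to travel from the event to the observer, in milliseconds."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_PER_S

# Without compensation, the PSS should shift by the difference in auditory delay
# between two viewing distances; with full compensation it should not shift at all.
for near_m, far_m in [(1.0, 10.0), (1.0, 32.0), (1.0, 50.0)]:
    shift = auditory_delay_ms(far_m) - auditory_delay_ms(near_m)
    print(f"{near_m:g} m vs {far_m:g} m: predicted PSS shift ~{shift:.0f} ms if uncompensated")
```

Under full compensation the predicted shift is zero at every distance, which is what Kopinska and Harris (2004) reported.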
9.5.3 Temporal Recalibration
The third possibility for how the brain might deal with lags between the senses is that the brain is flexible in what it counts as synchronous (see Figure 9.2, panel 3). This phenomenon is also known as "temporal recalibration." Recalibration is a well-known phenomenon in the spatial domain, but it has only recently been demonstrated in the temporal domain (Fujisaki et al. 2004; Vroomen et al. 2004). As for the spatial case, more than a century ago, von Helmholtz (1867) had already shown that the visual–motor system is remarkably flexible as it adapts to shifts of the visual field induced by wedge prisms. If prism-wearing subjects had to pick up a visually displaced object, they would quickly adapt to the new sensorimotor arrangement, and even after only a few trials, small visual displacements might go unnoticed. Recalibration was the term used to explain this phenomenon. In essence, recalibration is thought to be driven by a tendency of the brain to minimize discrepancies between the senses about objects or events that normally belong together. For the prism case, it is the position at which the hand is seen and felt. Nowadays, it is also known that the least reliable source is adjusted toward the more reliable one (Ernst and Banks 2002; Ernst et al. 2000; Ernst and Bulthoff 2004). The first evidence of recalibration in the temporal domain came from two studies with very similar designs: an exposure–test paradigm. Both Fujisaki et al. (2004) and Vroomen et al. (2004) first exposed observers to a train of sounds and light flashes with a constant but small intersensory interval, and then tested them using an audiovisual TOJ or SJ task. The idea was that observers would adapt to small audiovisual lags in such a way that the adapted lag is eventually perceived as synchronous. Therefore, after light-first exposure, light-first trials would be perceived as synchronous, and after sound-first exposure, a sound-first stimulus would be perceived as synchronous (see Figure 9.3). Both studies indeed observed that the PSS was shifted in the direction of the exposure lag. For example, Vroomen et al. exposed subjects for ~3 min to a sequence of sound bursts/light flashes with audiovisual lags of either ±100 or ±200 ms (sound-first or light-first). During the test, the PSS was shifted, on average, by 27 and 18 ms (the PSS difference between sound-first and light-first exposure) for the SJ and TOJ tasks, respectively. Fujisaki et al. used slightly bigger lags (±235 ms sound-first or light-first) and found somewhat bigger shifts in the PSS (59 ms shifts of the PSS in SJ and 51 ms in TOJ), but the data were, in essence, comparable. Many others have reported similar effects (Asakawa et al. 2009; Di Luca et al. 2007; Hanson et al. 2008; Keetels and Vroomen 2007, 2008b; Navarra et al. 2005, 2007, 2009; Stetson et al. 2006; Sugano et al. 2010; Sugita and Suzuki 2003; Takahashi et al. 2008; Tanaka et al. 2009a; Yamamoto et al. 2008). The mechanism underlying temporal recalibration, though, remains elusive at this point. One option is that there is a shift in the criterion for simultaneity in the adapted modalities (Figure 9.2, panel 3a). After exposure to light-first pairings, participants may thus change their criterion for audiovisual simultaneity in such a way that light-first stimuli are taken to be simultaneous.
On this view, other modality pairings (e.g., vision–touch) would be unaffected, and the change in criterion should then not affect unimodal processing of visual and auditory stimuli presented in isolation. Another strong prediction is that stimuli that were once synchronous, before adaptation, can become asynchronous after adaptation. The most dramatic case of this phenomenon can be found in motor–visual adaptation. In a study by Eagleman and Holcombe (2002), participants were asked to repeatedly tap their finger on a key, and after each key tap, a delayed flash was presented. If the visual flash then occurred at an unexpectedly short delay after the tap (or synchronously with it), it was actually perceived as occurring before the tap, an experience that runs against the law of causality. It may also be the case that one modality (vision, audition, or touch) is "shifted" toward the other, possibly because the sensory threshold for stimulus detection in one of the adapted modalities is
FIGURE 9.3 Schematic illustration of exposure conditions typically used in a temporal recalibration paradigm. During exposure, participants are exposed to a train of auditory–visual (AV) or tactile–visual (TV) stimulus pairs (panels a and b, respectively) with a lag of –100, 0, or +100 ms. To explore possible shifts in perceived simultaneity or sensitivity to asynchrony, typically a TOJ or SJ task is performed in a subsequent test phase. (From Fujisaki, W. et al., Nat. Neurosci., 7, 773–8, 2004; Vroomen, J. et al., Cogn. Brain Res., 22, 32–5, 2004; Keetels, M., Vroomen, J., Percept. Psychophys., 70, 765–71, 2008; Keetels, M., Vroomen, J., Neurosci. Lett., 430, 130–4, 2008. With permission.)
changed (see Figure 9.2, panel 3c). For example, in an attempt to perceive simultaneity during light-first exposure, participants might delay processing time in the visual modality by adopting a more stringent criterion for sensory detection of visual stimuli. After exposure to light-first audiovisual pairings, one might then expect slower processing times for visual stimuli in general, and other modality pairings that involve the visual modality, say vision–touch, would then also be affected. Two strategies have been undertaken to explore the mechanism underlying temporal recalibration. The first is to examine whether temporal recalibration generalizes to other stimuli within the adapted modalities; the second is to examine whether temporal recalibration affects different modality pairings than the ones adapted. Fujisaki et al. (2004) have already demonstrated that the effect of adaptation to temporal misalignment was present even when the visual test stimulus was very different from the exposure situation. The authors exposed observers to asynchronous tone–flash stimulus pairs and later tested them on the "stream/bounce" illusion (Sekuler et al. 1997). Fujisaki et al. reported that the optimal delay for obtaining a bounce percept in the stream/bounce illusion was shifted in the same direction as the adapted lag. Furthermore, after exposure to a "wall display," in which tones were timed with a ball bouncing off the inner walls of a square, similar shifts in the PSS for the bounce percept were found (a ~45 ms difference when comparing the PSS of the –235 ms sound-first exposure with the +235 ms vision-first exposure). Audiovisual temporal recalibration thus generalized well to other visual stimuli. Navarra et al. (2005) and Vatakis et al. (2008b) also tested generalization of audiovisual temporal recalibration using stimuli from different domains (speech/nonspeech). Their observers had to monitor a continuous speech stream for target words that were presented either in synchrony with the video of a speaker, or with the audio stream lagging 300 ms behind. During the monitoring
task, participants performed a TOJ (Navarra et al. 2005; Vatakis et al. 2007) or SJ task (Vatakis et al. 2008b) on simple flashes and white noise bursts that were overlaid on the video. Their results showed that sensitivity became worse, rather than the PSS shifting, if subjects were exposed to desynchronized rather than synchronized audiovisual speech. Similar effects (larger JNDs) were found with music stimuli. This led the authors to conclude that the "window of temporal integration" was widened (see Figure 9.2, panel 3b) because of the asynchronous exposure (see also Navarra et al. 2007 for effects on the JND after adaptation to asynchronous audio–tactile stimuli). The authors argued that this effect on the JND may reflect an initial stage of recalibration in which a more lenient criterion for simultaneity is adopted. With prolonged exposure, subjects may then shift the PSS. An alternative explanation, also considered by the authors but rejected, might be that subjects became confused by the nonmatching exposure stimuli, which would also affect the JND rather than the PSS because it adds noise to the distribution. The second way to study the underlying mechanisms of temporal recalibration is to examine whether temporal recalibration generalizes to different modality pairings. Hanson et al. (2008) explored whether a "supramodal" mechanism might be responsible for the recalibration of multisensory timing. They examined whether adaptation to audiovisual, audio–tactile, and tactile–visual asynchronies (10 ms flashes, noise bursts, and taps on the left index finger) generalized across modalities. The data showed that a brief period of repeated exposure to a ±90 ms asynchrony in any of these pairings resulted in shifts of the PSS of about 70 ms on subsequent TOJ tasks, and that the size and nature of the shifts were very similar across all three pairings. This made them conclude that there is a "general mechanism." Opposite conclusions, though, were reached by Harrar and Harris (2005). They exposed participants for 5 min to audiovisual pairs with a fixed time lag (250 ms light-first), but did not obtain shifts in the PSSs for touch–light pairs. In an extension of this work (Harrar and Harris 2008), observers were exposed for 5 min to ~100 ms lags of light-first stimuli for the audiovisual case, and touch-first stimuli for the auditory–tactile and visual–tactile cases. Participants were tested on each of these pairs before and after exposure. Shifts of the PSS in the predicted direction were found only for the audiovisual exposure–test stimuli, not for the other cases. Di Luca et al. (2007) also exposed participants to asynchronous audiovisual pairs (~200 ms lags of sound-first and light-first) and measured the PSS for audiovisual, audio–tactile, and visual–tactile test stimuli. Besides obtaining a shift in the PSS for audiovisual pairs, the effect was found to generalize to audio–tactile, but not to visual–tactile, test pairs. This pattern made the authors conclude that adaptation resulted in a phenomenal shift of the auditory event (Di Luca et al. 2007). Navarra et al. (2009) also recently reported that the auditory rather than the visual modality is the more flexible one. Participants were exposed to synchronous or asynchronous audiovisual stimuli (224 ms vision-first, or 84 ms auditory-first, for 5 min of exposure), after which they performed a speeded reaction time task on unimodal visual or auditory stimuli.
In contrast with the idea that visual stimuli get adjusted in time toward the relatively more accurate auditory stimuli (Hirsh and Sherrick 1961; Shipley 1964; Welch 1999; Welch and Warren 1980), their results seemed to show the opposite, namely, that auditory rather than visual stimuli were shifted in time. The authors reported that simple reaction times to sounds became approximately 20 ms faster after vision-first exposure and about 20 ms slower after auditory-first exposure, whereas simple reaction times for visual stimuli remained unchanged. They explained this finding by alluding to the idea that visual information can serve as the temporal anchor because it provides a more exact estimate of the time of occurrence of a distal event than auditory information does, given that light travel time is effectively independent of distance. Further research is needed, however, to examine whether a change in simple reaction times truly reflects a change in the perceived timing of that event, as there is quite some evidence showing that the two do not always go hand in hand (e.g., reaction times are more affected by variations in intensity than TOJs are; Jaskowski and Verleger 2000; Neumann and Niepel 2004). To summarize, there is as yet no clear explanation for the mechanism underlying temporal recalibration, as there is some discrepancy in the data regarding generalization across modalities. It seems safe to conclude that the audiovisual exposure–test situation is the most reliable one for obtaining
a shift in the PSS. Arguably, audiovisual pairs are more flexible because the brain has to correct for timing differences between auditory and visual stimuli that arise from naturally occurring delays caused by distance. Tactile stimuli might be more rigid in time because visual–tactile and audio–tactile events always occur at the body surface, so less compensation for latency differences might be required here. As already mentioned above, a widening of the JND, rather than a shift in the PSS, has also been observed, and it might possibly reflect an initial stage of recalibration in which a more lenient criterion about simultaneity is adopted. The reliability of each modality on its own is also likely to play a role. Visual stimuli are known to be less reliable in time than auditory or tactile stimuli (Fain 2003), and as a consequence they may be more malleable (Ernst and Banks 2002; Ernst et al. 2000; Ernst and Bulthoff 2004), but there is also evidence that the auditory modality is, in fact, the one that is shifted.
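The PSS shifts reported in this section come from different adaptation lags, which makes them hard to compare at a glance. A back-of-the-envelope comparison is sketched below, assuming a simple proportional-shift reading of the numbers quoted above; this bookkeeping is ours and is not a model proposed by the cited authors.

```python
# Back-of-the-envelope comparison of reported recalibration effects, assuming a
# simple proportional-shift reading (PSS difference = k * adapted lag difference).
# This is only an illustration; it is not a model proposed by the cited authors.
reported = {
    # study: (adaptation lag in ms (one direction), PSS difference between
    #         sound-first and light-first adaptation, in ms)
    "Vroomen et al. 2004 (SJ)":   (100, 27),
    "Vroomen et al. 2004 (TOJ)":  (100, 18),
    "Fujisaki et al. 2004 (SJ)":  (235, 59),
    "Fujisaki et al. 2004 (TOJ)": (235, 51),
}

for study, (lag, pss_diff) in reported.items():
    # The sound-first and light-first conditions differ by 2 * lag in total.
    k = pss_diff / (2 * lag)
    print(f"{study}: shift is ~{100 * k:.0f}% of the adapted lag difference")
```

On this crude scale, the reported shifts all fall in the range of roughly 9% to 14% of the adapted lag difference, which is one way to make concrete the statement above that the two data sets were, in essence, comparable.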
9.5.4 Temporal Ventriloquism
The fourth possibility for how the brain might deal with lags between the senses, and how they may go unnoticed, is that the perceived timing of a stimulus in one modality is actively shifted toward the other (see Figure 9.2, panel 4). This phenomenon is also known as "temporal ventriloquism," and it is named in analogy with the spatial ventriloquist effect. For spatial ventriloquism, it had long been known that listeners who heard a sound while seeing a spatially displaced flash had the (false) impression that the sound originated from the flash. This phenomenon was named the "ventriloquist illusion" because it was considered a stripped-down version of what the ventriloquist is doing when performing on stage. The temporal ventriloquist effect is analogous to the spatial variant, except that here, sound attracts vision in the time dimension rather than vision attracting sound in the spatial dimension. There are, by now, many demonstrations of this phenomenon, and we describe several in subsequent paragraphs. They all show that small lags between sound and vision go unnoticed because the perceived timing of visual events is flexible and is attracted toward events in other modalities. Scheier et al. (1999) were among the first to demonstrate temporal ventriloquism using a visual TOJ task (see Figure 9.4). Observers were presented with two lights at various SOAs, one above and one below a fixation point, and their task was to judge which light came first (the upper or the lower). To induce temporal ventriloquism, Scheier et al. added two sounds that could either be presented before the first and after the second light (condition AVVA), or in between the two lights (condition VAAV). Note that they used a visual TOJ task and that the sounds were task-irrelevant. The results showed that observers were more sensitive (i.e., smaller intervals were still perceived correctly) in the AVVA condition compared with the VAAV condition (visual JNDs were approximately 24 and 39 ms, respectively). Presumably, the two sounds attracted the temporal occurrence of the two lights, and thus effectively pulled the lights farther apart in the AVVA condition and closer together in the VAAV condition. In the single-sound conditions, AVV and VVA, sensitivity was not different from a visual-only baseline, indicating that the effects were not due to the initial sound acting as a warning signal or to some cognitive factor related to the observer's awareness of the sounds. Morein-Zamir et al. (2003) replicated these effects and further explored the sound–light intervals at which the effect occurred. Sound–light intervals of ~100 to ~600 ms were tested, and it was shown that the second sound was mainly responsible for the temporal ventriloquist effect up to a sound–light interval of 200 ms, whereas the interval of the first sound had little effect. The results were also consistent with earlier findings of Fendrich and Corballis (2001), who used a paradigm in which participants judged when a flash occurred by reporting the clock position of a rotating marker. The repeating flash was seen earlier when it was preceded by a click and later when the click lagged the visual stimulus. Another demonstration of temporal ventriloquism using a different paradigm came from a study by Vroomen and de Gelder (2004b). Here, temporal ventriloquism was demonstrated using the flash-lag effect (FLE). In the typical FLE (Mackay 1958;
FIGURE 9.4 A schematic illustration of conditions typically used to demonstrate auditory–visual temporal ventriloquism (panel a) and tactile–visual temporal ventriloquism (panel b). The first capturing stimulus (i.e., either a sound or a vibro-tactile stimulus) precedes the first light by 100 ms, whereas the second capturing stimulus trails the second light by 100 ms. The baseline condition consists of the presentation of the two capturing stimuli simultaneously with the light onsets. Temporal ventriloquism is typically shown by improved visual TOJ sensitivity when the capturing stimuli are presented with a 100-ms interval. (From Scheier, C.R. et al., Invest. Ophthalmol. Vis. Sci., 40, 4169, 1999; Morein-Zamir, S. et al., Cogn. Brain Res., 17, 154–63, 2003; Vroomen, J., Keetels, M., J. Exp. Psychol. Hum. Percept. Perform., 32, 1063–71, 2006; Keetels, M. et al., Exp. Brain Res., 180, 449–56, 2007; Keetels, M., Vroomen, J., Percept. Psychophys., 70, 765–71, 2008; Keetels, M., Vroomen, J., Neurosci. Lett., 430, 130–4, 2008. With permission.)
Nijhawan 1994, 1997, 2002), a flash appears to lag behind a moving visual stimulus even though the stimuli are presented at the same physical location. To induce temporal ventriloquism, Vroomen and de Gelder added a single click presented slightly before, at, or after the flash (intervals of 0, 33, 66, and 100 ms). The results showed that the sound attracted the temporal onset of the flash and shifted it by about 5% of the sound–flash interval. A sound ~100 ms before the flash thus made the flash appear ~5 ms earlier, and a sound 100 ms after the flash made the flash appear ~5 ms later. A sound, including the synchronous one, also improved sensitivity on the visual task, as JNDs were better when a sound was present than when it was absent. Yet another recent manifestation of temporal ventriloquism used a visual apparent motion paradigm. Visual apparent motion occurs when a stimulus is flashed in one location and is followed by another identical stimulus flashed in another location (Korte 1915). Typically, an illusory movement is observed that starts at the lead stimulus and is directed toward the second, lagging stimulus (the strength of the illusion depends on the exposure time of the stimuli, and on the temporal and spatial separation between them). Getzmann (2007) explored the effects of irrelevant sounds on this motion illusion. In this study, two temporally separated visual stimuli (SOAs ranged from 0 to 350 ms) were presented and participants classified their impression of motion using a categorization system. The results demonstrated that sounds intervening between the visual stimuli facilitated the impression of apparent motion relative to no sounds, whereas sounds presented before the first and after the second visual stimulus reduced motion perception (see Bruns and Getzmann 2008 for similar results). The idea was that because exposure time and spatial separation were both held constant in this study, the impression of apparent motion was systematically affected by the perceived length of the interstimulus interval. The effect was explained in terms of temporal ventriloquism, as sounds attracted the illusory onset of the visual stimuli. Freeman and Driver (2008) investigated whether the timing of a static sound could influence spatiotemporal processing of visual apparent motion. Apparent motion was induced by visual stimuli
alternating between opposite hemifields. The perceived direction typically depends on the relative timing interval between the left–right and right–left flashes (e.g., rightward motion dominates when the left–right interflash intervals are shortest; von Grunau 1986). In their study, the interflash intervals were always 500 ms (ambiguous motion), but sounds could slightly lead the left flash and lag the right flash by 83 ms, or vice versa. Because of temporal ventriloquism, this variation made visual apparent motion depend on the timing of the sound stimuli (e.g., more rightward responses if a sound preceded the left flash and lagged the right flash, and more leftward responses if a sound preceded the right flash and lagged the left flash). The temporal ventriloquist effect has also been used as a diagnostic tool to examine whether commonality in space is a constraint on intersensory pairing. Vroomen and Keetels (2006) adopted the visual TOJ task of Scheier et al. (1999) and replicated the finding that sounds improved sensitivity in the AVVA version of the visual TOJ task. Importantly, the temporal ventriloquist effect was unaffected by whether sounds and lights were colocated or not. For example, the authors varied whether the sounds came from a central location or a lateral one, whether the sounds were static or moving, and whether the sounds and lights came from the same or different sides of fixation at either small or large spatial disparities. All these variations had no effect on the temporal ventriloquist effect, despite the fact that discordant sounds were shown to attract reflexive spatial attention and to interfere with speeded visual discrimination. These results made the authors conclude that intersensory interactions in general do not require spatial correspondence between the components of the cross-modal stimuli (see also Keetels et al. 2007). In another study (Keetels and Vroomen 2008a), it was explored whether touch affects vision on the time dimension as audition does (visual–tactile ventriloquism), and whether spatial disparity between the vibrator and the lights modifies this effect. Given that tactile stimuli are spatially better defined than tones because of their somatotopic rather than tonotopic initial coding, this study provided a strong test case for the notion that spatial co-occurrence between the senses is required for intersensory temporal integration. The results demonstrated that tactile–visual stimuli behaved like audiovisual stimuli, in that temporally misaligned tactile stimuli captured the onsets of the lights, and spatial discordance between the stimuli did not harm this phenomenon. Besides exploring whether spatial disparity affects temporal ventriloquism, the effect of synesthetic congruency between modalities has also recently been explored (Keetels and Vroomen 2010; Parise and Spence 2008). Parise and Spence (2008) suggested that pitch–size synesthetic congruency (i.e., a natural association between the relative pitch of a sound and the relative size of a visual stimulus) might affect temporal ventriloquism. In their study, participants made visual TOJs about small and large visual stimuli while high-pitched or low-pitched tones were presented before the first and after the second light. The results showed that, at large sound–light intervals, sensitivity for visual temporal order was better for synesthetically congruent than for incongruent pairs.
In a more recent study, Keetels and Vroomen (2010) reexamined this effect and showed that the congruency effect could not be attributed to temporal ventriloquism, as it disappeared at short sound–light intervals when compared with a synchronous AV baseline condition that excludes response biases. In addition, synesthetic congruency did not affect temporal ventriloquism even when participants were made explicitly aware of the congruency before testing, challenging the view that synesthetic congruency affects temporal ventriloquism. Stekelenburg and Vroomen (2005) also investigated the time course and the electrophysiological correlates of the audiovisual temporal ventriloquist effect using ERPs in the FLE paradigm (Mackay 1958; Nijhawan 1994, 1997, 2002). Their results demonstrated that the amplitude of the visual N1 was systematically affected by the temporal interval between the visual target flash and the task-irrelevant sound. If the sound was presented in synchrony with the flash, the N1 amplitude was larger than when the sound lagged the visual stimulus, and it was smaller when the sound led the flash. No latency shifts, however, were found. Yet, based on the latency of the cross-modal effect (N1 at 190 ms) and its localization in the occipitoparietal cortex, this study confirmed the sensory nature of temporal ventriloquism. An explanation for the absence of a temporal shift of
the ERP components may lie in the small size of the temporal ventriloquist effect found (3 ms). Such a small temporal difference may not be reliably reflected in the ERPs because it approaches the lower limit of the temporal resolution of the sampled EEG. In most of the studies examining temporal ventriloquism (visual TOJ, FLE, reporting clock position or motion direction), the timing of the visual stimulus is the task-relevant dimension. Recently, however, Vroomen and Keetels (2009) explored whether a temporally offset sound could improve the identification of a visual stimulus in a task in which temporal order is not involved. In this study, it was examined whether four-dot masking is affected by temporal ventriloquism. In the four-dot masking paradigm, visual target identification is impaired when a briefly presented target is followed by a mask that consists of four dots that surround but do not touch the visual target (Enns 2004; Enns and DiLollo 1997, 2000). The idea tested was that a sound presented slightly before the target and slightly after the mask might lengthen the perceived interval between target and mask. By lengthening the perceived target–mask interval, there is more time for the target to consolidate, and in turn target identification should become easier. Results were in line with this hypothesis, as a small release from four-dot masking was reported (a 1% improvement, which corresponds to an increase of the target–mask ISI of 4.4 ms) when two sounds were presented at approximately 100-ms intervals before the target and after the mask, rather than when only a single sound was presented before the target or no sound was presented at all. To summarize, there are by now many demonstrations that vision is flexible on the time dimension. In general, the perceived timing of a visual event is attracted toward other events in audition and touch, provided that the lag between them is less than ~200 ms. The deeper reason why there is this mutual attraction is still untested, although in our view it serves to reduce natural lags between the senses so that they go unnoticed, thus maintaining coherence between the senses. If so, one can ask what the relationship is between temporal ventriloquism and temporal recalibration. Temporal ventriloquism occurs immediately when a temporal asynchrony is presented, whereas temporal recalibration manifests itself as an aftereffect, but both effects can be seen as perceptual solutions for maintaining intersensory synchrony. The question can then be asked whether the same mechanism underlies the two phenomena. At first sight, one might argue that the magnitude of the temporal ventriloquist effect seems smaller than that of the temporal recalibration effects (temporal ventriloquism: Morein-Zamir et al. 2003, ~15 ms JND improvement; Scheier et al. 1999, 15 ms JND improvement; Vroomen and Keetels 2006, ~6 ms JND improvement; temporal recalibration: Fujisaki et al. 2004, ~30 ms PSS shifts for 225 ms adaptation lags; Hanson et al. 2008, ~35 ms PSS shifts for 90 ms adaptation lags; Navarra et al. 2009, ~20 ms shifts in reaction times; although relatively small effects were found by Vroomen et al. 2004, ~8 ms PSS shifts for 100 ms adaptation lags). However, these magnitudes cannot be compared directly because the temporal ventriloquist effect refers to an improvement in JNDs, whereas the temporal recalibration effect is typically a shift of the PSS.
Moreover, in studies measuring temporal recalibration, there is usually much more exposure to temporal asynchronies than in studies measuring temporal ventriloquism. Therefore, it remains up to future studies to examine whether the mechanisms that are involved in temporal ventriloquism and temporal recalibration are the same.
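One way to make the logic of the AVVA/VAAV design concrete is a toy capture model in which each visual onset is pulled a fixed fraction of the way toward its accompanying sound; the sketch below uses a 100-ms sound–light gap as in Figure 9.4, while the capture fraction and the additive rule are illustrative assumptions of ours, not parameters reported in the cited studies.

```python
# Toy capture model for temporal ventriloquism in the AVVA / VAAV design
# (Scheier et al. 1999). Each visual onset is assumed to be pulled a fixed
# fraction of the way toward its accompanying sound; the fraction and the rule
# are illustrative assumptions, not values reported in the cited studies.
CAPTURE = 0.1          # fraction of the sound-light gap by which vision shifts
SOUND_LIGHT_GAP = 100  # ms between each sound and its neighboring light

def effective_visual_soa(visual_soa_ms, condition):
    """Perceived interval between the two lights after capture by the sounds."""
    if condition == "AVVA":   # sounds outside the lights: lights pulled apart
        return visual_soa_ms + 2 * CAPTURE * SOUND_LIGHT_GAP
    if condition == "VAAV":   # sounds between the lights: lights pulled together
        return visual_soa_ms - 2 * CAPTURE * SOUND_LIGHT_GAP
    return visual_soa_ms      # baseline: sounds synchronous with the lights

for soa in (30, 60):
    for cond in ("AVVA", "baseline", "VAAV"):
        print(f"visual SOA {soa} ms, {cond:8s}: "
              f"effective SOA = {effective_visual_soa(soa, cond):.0f} ms")
```

With these assumptions, a physical visual SOA of 30 ms is stretched to 50 ms in the AVVA arrangement and compressed to 10 ms in the VAAV arrangement, which is the direction of the JND differences reported by Scheier et al. (1999).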
9.6 TEMPORAL SYNCHRONY: AUTOMATIC OR NOT?
An important question about the perception of intersensory synchrony is whether it is perceived in an automatic fashion or not. As is often the case, there are two opposing views on this issue. Some have reported that the detection of temporal alignment is a slow, serial, and attention-demanding process, whereas others have argued that it is fast and requires only the minimal amount of attention needed to perceive the visual stimulus; once this criterion is met, audiovisual or visual–tactile integration comes for free. An important signature of automatic processing is that the stimulus in question is salient and "pops out." If so, the stimulus is easy to find among distracters. What about intersensory synchrony:
What about intersensory synchrony: does it “pop out”? In a study by van de Par and Kohlrausch (2004), this question was addressed by presenting observers with a visual display of a number of circles that moved up and down independently, each along a Gaussian profile. Along with the motion display, a concurrent sound was presented whose amplitude was modulated coherently with one of the circles. The participant’s task was to identify the coherently moving visual circle as quickly as possible. The authors found that response times increased approximately linearly with the number of distracters (~500 ms/distracter), indicating a slow serial search process rather than pop-out. Fujisaki et al. (2006) came to similar conclusions. They examined search functions for a visual target that changed in synchrony with an auditory stimulus. The visual display consisted of two, four, or eight luminance-modulated Gaussian blobs presented at 5, 10, 20, and 40 Hz that were accompanied by a white noise sound whose amplitude was modulated in synch with one of the visual stimuli. Other displays contained clockwise/counterclockwise rotations of windmills synchronized with a sound whose frequency was modulated up or down at a rate of 10 Hz. The observers’ task was to indicate which visual stimulus was luminance-modulated in synch with the sound. Search functions for both displays were slow (~1 s/distracter in target-present displays) and increased linearly with the number of visual distracters. In a control experiment, it was also shown that synchrony discrimination was unaffected by the presence of distracters if attention was directed at the visual target. Fujisaki et al. therefore concluded that perception of audiovisual synchrony is a slow and serial process based on a comparison of salient temporal features that need to be individuated from within-modal signal streams.

Others, though, came to quite opposing conclusions and found that intersensory synchrony can be detected in an automatic fashion. Most notably, van der Burg et al. (2008b) reported an interesting study in which they showed that a simple auditory pip can drastically reduce search times for a color-changing object that is synchronized with the pip. The authors presented a horizontal or vertical target line among a large array of oblique lines. Each of the lines (target and distracters) changed color from green-to-red or red-to-green in a random fashion. If a pip sound was synchronized with a color change, visual attention was automatically drawn to the location of the line that changed color. When the sound was synchronized with the color change of the target, search times improved drastically and the number of irrelevant distracters had virtually no effect on search times (a nearly flat slope indicating pop-out). The authors concluded that the temporal information of the auditory signal was integrated with the visual signal, generating a relatively salient emergent feature that automatically draws spatial attention (see also van der Burg et al. 2008a). Similar effects were also demonstrated for tactile stimuli instead of auditory pips (Olivers and van der Burg 2008; van der Burg et al. 2009).

Kanai et al. (2007) also explored temporal correspondences in visually ambiguous displays. They presented multiple disks flashing sequentially at one of eight locations in a circle, thus inducing the percept of a disk revolving around fixation. A sound was presented at one particular point in every cycle, and participants had to indicate the disk that was temporally aligned with the sound.
The disk seen as being synchronized with the sound was perceived as brighter, with a sharper onset and offset (Vroomen and de Gelder 2000). Moreover, the choice of which disk appeared synchronized with the sound fluctuated over time, its position changing every 5 to 10 s. Kanai et al. explored whether this flexibility was dependent on attention by having observers perform a concurrent task in which they had to count the number of X’s in a letter stream. The results demonstrated that the transitions disappeared whenever attention was distracted from the stimulus. On the other hand, if attention was directed to one particular visual event—either by making it “pop out” through a different color, by presenting a cue next to the target dot, or by overtly cueing it—the perceived timing of the sound was attracted toward that event. These results thus suggest that perception of intersensory synchrony is flexible, and is not completely immune to attention.

These opposing views on the role of attention can be reconciled on the assumption that perception of synchrony depends on a matching process of salient temporal features (Fujisaki et al. 2006; Fujisaki and Nishida 2007). Saliency may be lost when stimuli are presented at fast rates (typically above 4 Hz), when perceptually grouped into other streams, or if they lack a sharp transition
(Keetels et al. 2007; Sanabria et al. 2004; Vroomen and de Gelder 2004a; Watanabe and Shimojo 2001). In line with this notion, studies reporting that audiovisual synchrony detection is slow either presented stimuli at fast rates (>4 Hz, up to 80/s) or used stimuli lacking a sharp onset/offset (e.g., van de Par and Kohlrausch 2004, using a Gaussian amplitude modulation). Others reporting automatic detection of auditory–visual synchrony used much slower rates (1.11 Hz; van der Burg et al. 2008b) and sharp transitions (a pip).
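The serial-search versus pop-out contrast discussed above is conventionally quantified as the slope of response time against the number of display items. The sketch below is a minimal illustration with hypothetical response times; only the rough slope magnitudes echo the studies cited, the data themselves are invented.

```python
# Minimal sketch (hypothetical data): quantifying search efficiency as the
# slope of response time against set size. Slopes on the order of
# ~500-1000 ms/item indicate slow serial matching (van de Par and Kohlrausch
# 2004; Fujisaki et al. 2006); near-flat slopes indicate pop-out
# (van der Burg et al. 2008b).
import numpy as np

set_sizes = np.array([2, 4, 8], dtype=float)

rt_serial = np.array([1200.0, 2250.0, 4300.0])   # hypothetical RTs, ms
rt_popout = np.array([650.0, 660.0, 655.0])      # hypothetical RTs, ms

slope_serial, _ = np.polyfit(set_sizes, rt_serial, 1)
slope_popout, _ = np.polyfit(set_sizes, rt_popout, 1)

print(f"serial search slope : {slope_serial:6.1f} ms/item")
print(f"pop-out slope       : {slope_popout:6.1f} ms/item")
```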
9.7 NEURAL SUBSTRATES OF TEMPORAL SYNCHRONY

Although temporal correspondence is frequently considered one of the most important constraints on cross-modal integration (e.g., Bedford 1989; Bertelson 1999; Radeau 1994; Stein and Meredith 1993; Welch 1999; Welch and Warren 1980), the neural correlates of the ability to detect and use temporal synchrony remain largely unknown. Most likely, however, a whole network is involved. Seminal studies examining the neural substrates of intersensory temporal correspondence were done in animals. It is well known that the firing rate of a subsample of cells in the superior colliculus (SC) increases dramatically, and more than can be expected by summing the unimodal responses, when auditory stimuli (tones) and visual stimuli (flashes) occur in close temporal and spatial proximity (Meredith et al. 1987; Stein et al. 1993). More recently, Calvert et al. (2001) used functional magnetic resonance imaging (fMRI) in human subjects to study brain areas that show facilitation and suppression effects in the blood oxygenation level–dependent (BOLD) signal for temporally aligned and temporally misaligned audiovisual stimuli. Their stimulus consisted of a reversing checkerboard pattern of alternating black and white squares, with sounds presented either simultaneously with the onset of a reversal (synchronous condition) or randomly phase-shifted relative to it (asynchronous condition). The results showed an involvement of the SC, as its response was superadditive for temporally matched stimuli and depressed for temporally mismatched ones. Other cross-modal interactions were also identified in a network of cortical brain areas that included several frontal sites: the right inferior frontal gyrus, multiple sites within the right lateral sulcus, and the ventromedial frontal gyrus. Furthermore, response enhancement and depression were observed in the insula bilaterally, the right superior parietal lobule, the right inferior parietal sulcus, the left superior occipital gyrus, and the left superior temporal sulcus (STS).

Bushara et al. (2001) examined the effect of temporal asynchrony in a positron emission tomography study. Here, observers had to decide whether a colored circle was presented simultaneously with a tone or not. The stimulus pairs could be either auditory-first (AV) or vision-first (VA), at three levels of SOA that varied in difficulty. A control condition (C) was included in which the auditory and visual stimuli were presented simultaneously, and in which participants performed a visual color discrimination task whenever a sound was present. The brain areas involved in auditory–visual synchrony detection were identified by subtracting the activity in the control condition from that in the asynchronous conditions (AV-C and VA-C). Results revealed a network of heteromodal brain areas that included the right anterior insula, the right ventrolateral prefrontal cortex, the right inferior parietal lobe, and the left cerebellar hemisphere. Activity that correlated positively with decreasing asynchrony was clustered within the right insula, suggesting that this region is most important for the detection of auditory–visual synchrony. Given that interactions were also found between the insula, the posterior thalamus, and the SC, it was suggested that intersensory temporal processing is mediated via subcortical tecto-thalamo-insula pathways.
In a positron emission tomography study by Macaluso et al. (2004), subjects viewed a video monitor that showed a face mouthing words. In different blocks of trials, the audiovisual signals were presented either synchronously or asynchronously (the auditory stimulus leading by a clearly noticeable 240 ms). In addition, the visual and auditory sources were presented either at the same location or in opposite hemifields. Results showed that activity in ventral occipital areas and the left STS increased during synchronous audiovisual speech, regardless of the relative location of the auditory and visual input.
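The superadditivity criterion invoked in the SC findings above can be stated compactly: a bimodal response that exceeds the sum of the unimodal responses counts as enhancement, and one that falls below it as depression. The sketch below illustrates the arithmetic with hypothetical BOLD values; the function name and numbers are ours for illustration, not taken from Calvert et al. (2001).

```python
# Minimal sketch (hypothetical values): an interaction index for
# superadditive enhancement versus response depression.

def interaction_index(av_response, a_response, v_response):
    """Percentage difference between the bimodal response and the unimodal sum."""
    unimodal_sum = a_response + v_response
    return 100.0 * (av_response - unimodal_sum) / unimodal_sum

# Hypothetical BOLD signal changes (%) for one region
sync_index = interaction_index(av_response=1.3, a_response=0.5, v_response=0.6)
async_index = interaction_index(av_response=0.8, a_response=0.5, v_response=0.6)

print(f"synchronous AV : {sync_index:+.0f}% (superadditive if > 0)")
print(f"asynchronous AV: {async_index:+.0f}% (depressed if < 0)")
```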
More recently, in an fMRI study, Dhamala et al. (2007) examined the networks that are involved in the perception of physically synchronous versus asynchronous audiovisual events. Two timing parameters were varied: the SOA between sound and light (–200 to +200 ms) and the stimulation rate (0.5–3.5 Hz). In the behavioral task, observers had to report whether stimuli were perceived as simultaneous, sound-first, light-first, or “Can’t tell,” resulting in the classification of three distinct perceptual states, that is, the perception of synchrony, asynchrony, and “no clear perception.” The fMRI data showed that each of these states involved activation of different brain networks. Perception of asynchrony activated the primary sensory, prefrontal, and inferior parietal cortices, whereas perception of synchrony disengaged the inferior parietal cortex and further recruited the SC.

An fMRI study by Noesselt et al. (2007) also explored the effect of temporal correspondence between auditory and visual streams. The stimuli were arranged such that the auditory and visual streams were temporally corresponding or not, using irregular and arrhythmic temporal patterns that either matched between audition and vision or mismatched substantially while maintaining the same overall temporal statistics. For the coincident audiovisual streams, there was an increase in the BOLD response in multisensory STS (mSTS) contralateral to the visual stream. The contralateral primary visual and auditory cortices were also affected by the synchrony–asynchrony manipulations, and a connectivity analysis indicated an enhanced influence of mSTS on primary sensory areas during temporal correspondence.

In an EEG paradigm, Senkowski et al. (2007) examined the neural mechanisms underlying intersensory synchrony by measuring oscillatory gamma-band responses (GBRs; 30–80 Hz). Oscillatory GBRs have been linked to feature integration mechanisms and to multisensory processing. The authors reasoned that GBRs might also be sensitive to the temporal alignment of intersensory stimulus components. The temporal synchrony of the auditory and visual components of a multisensory signal was varied (tones and horizontal gratings with SOAs ranging from –125 to +125 ms). The GBRs to the auditory and visual components of multisensory stimuli were extracted for five subranges of asynchrony and compared with GBRs to unisensory control stimuli. The results revealed that multisensory interactions were strongest in the early GBRs when the sound and light stimuli were presented with the closest synchrony. These effects were most evident over medial–frontal brain areas after 30 to 80 ms and over occipital areas after 60 to 120 ms, indicating that temporal synchrony may have an effect on early intersensory interactions in the human cortex.

Overall, it should be noted that there is considerable variation in the outcomes of studies that have examined the neural basis of intersensory temporal synchrony. At present, the issue is far from resolved, and more research is needed to unravel the exact neural substrates involved. What does emerge is that the SC and mSTS are repeatedly implicated in intersensory synchrony detection studies, which at least suggests a prominent role for these structures in the processing of intersensory stimuli based on their temporal correspondence.
For the time being, however, it is unknown how damage to these areas, or their temporary disruption by, for example, transcranial magnetic stimulation, would affect the perception of intersensory synchrony.
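To illustrate the kind of stimulus design used by Noesselt et al. (2007), the sketch below generates irregular onset streams whose auditory and visual events either coincide or are decoupled while keeping the same inter-onset-interval statistics. The generation procedure is our own assumption for illustration and is not the published method.

```python
# Minimal sketch (assumed procedure): matched vs. mismatched irregular
# audiovisual streams with identical inter-onset-interval statistics.
import numpy as np

rng = np.random.default_rng(0)

def irregular_onsets(n_events=30, min_ioi=0.2, max_ioi=1.0):
    """Cumulative onset times (s) built from uniformly drawn inter-onset intervals."""
    iois = rng.uniform(min_ioi, max_ioi, size=n_events)
    return np.cumsum(iois)

# Auditory stream; the matched visual stream simply reuses the same onsets.
auditory = irregular_onsets()
visual_matched = auditory.copy()

# Mismatched visual stream: permute the intervals, so the marginal interval
# statistics are identical but the two streams no longer coincide in time.
iois = np.diff(np.insert(auditory, 0, 0.0))
visual_mismatched = np.cumsum(rng.permutation(iois))

same_distribution = np.allclose(
    np.sort(iois), np.sort(np.diff(np.insert(visual_mismatched, 0, 0.0))))
print("interval distributions identical:", same_distribution)
```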
9.8 CONCLUSIONS

In recent years, a substantial amount of research has been devoted to understanding how the brain handles lags between the senses. The most important conclusion we draw is that intersensory timing is flexible and adaptive. The flexibility is clearly demonstrated by studies showing one or another variant of temporal ventriloquism: small lags go unnoticed because the brain actively shifts one information stream (usually vision) toward the other, possibly to maintain temporal coherence. The adaptive part rests on studies of temporal recalibration demonstrating that observers are flexible in what they adopt as synchronous. The extent to which temporal recalibration generalizes to other stimuli and domains, however, remains to be further explored. The idea that the brain compensates for predictable variability between the senses—most notably distance—is, in
our view, not well-founded. We are more enthusiastic about the notion that intersensory synchrony is perceived mostly in an automatic fashion, provided that the individual components of the stimuli are sufficiently salient. The neural mechanisms that underlie this ability are of clear importance for future research.
REFERENCES Alais, D., and S. Carlile. 2005. Synchronizing to real events: Subjective audiovisual alignment scales with perceived auditory depth and speed of sound. Proceedings of the National Academy of Sciences of the United States of America 102(6);2244–7. Arnold, D.H., A. Johnston, and S. Nishida. 2005. Timing sight and sound. Vision Research 45(10);1275–84. Arrighi, R., D. Alais, and D. Burr. 2006. Perceptual synchrony of audiovisual streams for natural and artificial motion sequences. Journal of Vision 6(3);260–8. Asakawa, K., A. Tanaka, and H. Imai. 2009. Temporal Recalibration in Audio-Visual Speech Integration Using a Simultaneity Judgment Task and the McGurk Identification Task. Paper presented at the 31st Annual Meeting of the Cognitive Science Society (July 29–August 1, 2009). Amsterdam, The Netherlands. Bald, L., F.K. Berrien, J.B. Price, and R.O. Sprague. 1942. Errors in perceiving the temporal order of auditory and visual stimuli. Journal of Applied Psychology 26;283–388. Bedford, F.L. 1989. Constraints on learning new mappings between perceptual dimensions. Journal of Experimental Psychology. Human Perception and Performance 15(2);232–48. Benjamins, J.S., M.J. van der Smagt, and F.A. Verstraten. 2008. Matching auditory and visual signals: Is sensory modality just another feature? Perception 37(6);848–58. Bertelson, P. 1994. The cognitive architecture behind auditory-visual interaction in scene analysis and speech identification. Cahiers de Psychologie Cognitive 13(1);69–75. Bertelson, P. 1999. Ventriloquism: A case of crossmodal perceptual grouping. In G. Aschersleben, T. Bachmann, and J. Musseler (eds.), Cognitive Contributions to the Perception of Spatial and Temporal Events, 347–63. North-Holland: Elsevier. Bertelson, P., and G. Aschersleben. 1998. Automatic visual bias of perceived auditory location. Psychonomic Bulletin & Review 5(3);482–89. Bertelson, P., and G. Aschersleben. 2003. Temporal ventriloquism: Crossmodal interaction on the time dimension: 1. Evidence from auditory–visual temporal order judgment. International Journal of Psychophysiology 50(1–2);147–55. Boenke, L.T., M. Deliano, and F.W. Ohl. 2009. Stimulus duration influences perceived simultaneity in audiovisual temporal-order judgment. Experimental Brain Research 198(2–3);233–44. Bronkhorst, A.W. 1995. Localization of real and virtual sound sources. Journal of the Acoustical Society of America 98(5);2542–53. Bronkhorst, A.W., and T. Houtgast. 1999. Auditory distance perception in rooms. Nature 397;517–20. Bruns, P., and S. Getzmann. 2008. Audiovisual influences on the perception of visual apparent motion: Exploring the effect of a single sound. Acta Psychologica 129(2);273–83. Bushara, K.O., J. Grafman, and M. Hallett. 2001. Neural correlates of auditory-visual stimulus onset asynchrony detection. Journal of Neuroscience 21(1);300–4. Calvert, G., P.C. Hansen, S.D. Iversen, and M.J. Brammer. 2001. Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. NeuroImage 14(2);427–38. Calvert, G., C. Spence, and B. Stein. 2004. The Handbook of Multisensory Processes. Cambridge, MA: The MIT Press. Colin, C., M. Radeau, P. Deltenre, and J. Morais. 2001. Rules of intersensory integration in spatial scene analysis and speechreading. Psychologica Belgica 41(3);131–44. Conrey, B., and D.B. Pisoni. 2006. Auditory–visual speech perception and synchrony detection for speech and nonspeech signals. Journal of the Acoustical Society of America 119(6);4065–73. 
Dhamala, M., C.G. Assisi, V.K. Jirsa, F.L. Steinberg, and J.A. Kelso. 2007. Multisensory integration for timing engages different brain networks. NeuroImage 34(2);764–73. Di Luca, M., T. Machulla, and M.O. Ernst. 2007. Perceived Timing Across Modalities. Paper presented at the International Intersensory Research Symposium 2007: Perception and Action (July 3, 2007). Sydney, Australia. Dinnerstein, A.J., and P. Zlotogura. 1968. Intermodal perception of temporal order and motor skills: Effects of age. Perceptual and Motor Skills 26(3);987–1000. Dixon, N.F., and L. Spitz. 1980. The detection of auditory visual desynchrony. Perception 9(6);719–21.
Eagleman, D.M., and A.O. Holcombe. 2002. Causality and the perception of time. Trends in Cognitive Sciences 6(8);323–5. Eimer, M., and J. Driver. 2001. Crossmodal links in endogenous and exogenous spatial attention: Evidence from event-related brain potential studies. Neuroscience and Biobehavioral Reviews 25(6);497–511. Eimer, M., and E. Schroger. 1998. ERP effects of intermodal attention and cross-modal links in spatial attention. Psychophysiology 35(3);313–27. Engel, G.R., and W.G. Dougherty. 1971. Visual–auditory distance constancy. Nature 234(5327);308. Enns, J.T. 2004. Object substitution and its relation to other forms of visual masking. Vision Research 44(12);1321–31. Enns, J.T., and V. DiLollo. 1997. Object substitution: A new form of masking in unattended visual locations. Psychological Science 8;135–9. Enns, J.T., and V. DiLollo. 2000. What’s new in visual masking? Trends in Cognitive Sciences 4(9);345–52. Ernst, M.O., and M.S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415(6870);429–33. Ernst, M.O., and H.H. Bulthoff. 2004. Merging the senses into a robust percept. Trends in Cognitive Sciences 8(4);162–9. Ernst, M.O., M.S. Banks, and H.H. Bulthoff. 2000. Touch can change visual slant perception. Nature Neuro science 3(1);69–73. Fain, G.L. 2003. Sensory Transduction. Sunderland, MA: Sinauer Associates. Fendrich, R., and P.M. Corballis. 2001. The temporal cross-capture of audition and vision. Perception & Psychophysics 63(4);719–25. Finger, R., and A.W. Davis. 2001. Measuring Video Quality in Videoconferencing Systems. Technical Report SN187-D. Los Gatos, CA: Pixel Instrument Corporation. Freeman, E., and J. Driver. 2008. Direction of visual apparent motion driven solely by timing of a static sound. Current Biology 18(16);1262–6. Frey, R.D. 1990. Selective attention, event perception and the criterion of acceptability principle: Evidence supporting and rejecting the doctrine of prior entry. Human Movement Science 9;481–530. Fujisaki, W., and S. Nishida. 2005. Temporal frequency characteristics of synchrony–asynchrony discrimination of audio-visual signals. Experimental Brain Research 166(3–4);455–64. Fujisaki, W., and S. Nishida. 2007. Feature-based processing of audio-visual synchrony perception revealed by random pulse trains. Vision Research 47(8);1075–93. Fujisaki, W., S. Shimojo, M. Kashino, and S. Nishida. 2004. Recalibration of audiovisual simultaneity. Nature Neuroscience 7(7);773–8. Fujisaki, W., A. Koene, D. Arnold, A. Johnston, and S. Nishida. 2006. Visual search for a target changing in synchrony with an auditory signal. Proceedings of Biological Science 273(1588);865–74. Getzmann, S. 2007. The effect of brief auditory stimuli on visual apparent motion. Perception 36(7);1089–103. Grant, K.W., V. van Wassenhove, and D. Poeppel. 2004. Detection of auditory (cross-spectral) and auditory– visual (cross-modal) synchrony. Speech Communication 44;43–53. Hanson, J.V., J. Heron, and D. Whitaker. 2008. Recalibration of perceived time across sensory modalities. Experimental Brain Research 185(2);347–52. Harrar, V., and L.R. Harris. 2005. Simultaneity constancy: Detecting events with touch and vision. Experimental Brain Research 166(3–4);465–73. Harrar, V., and L.R. Harris. 2008. The effect of exposure to asynchronous audio, visual, and tactile stimulus combinations on the perception of simultaneity. Experimental Brain Research 186(4);517–24. Heron, J., D. Whitaker, P.V. McGraw, and K.V. Horoshenkov. 2007. 
Adaptation minimizes distance-related audiovisual delays. Journal of Vision 7(13);51–8. Hillyard, S.A., and T.F. Munte. 1984. Selective attention to color and location: An analysis with event-related brain potentials. Perception & Psychophysics 36(2);185–98. Hirsh, I.J., and P. Fraisse. 1964. Simultaneous character and succession of heterogenous stimuli. L’Année Psychologique 64;1–19. Hirsh, I.J., and C.E. Sherrick. 1961. Perceived order in different sense modalities. Journal of Experimental Psychology 62(5);423–32. Jaskowski, P. 1999. Reaction time and temporal-order judgment as measures of perceptual latency: The problem of dissociations. In G. Aschersleben, T. Bachmann, and J. Müsseler (eds.), Cognitive Contributions to the Perception of Spatial and Temporal Events (pp. 265–82). North-Holland: Elsevier Science B.V. Jaskowski, P., and R. Verleger. 2000. Attentional bias toward low-intensity stimuli: An explanation for the intensity dissociation between reaction time and temporal order judgment? Consciousness and Cognition 9(3);435–56.
Jaskowski, P., F. Jaroszyk, and D. Hojan-Jezierska. 1990. Temporal-order judgments and reaction time for stimuli of different modalities. Psychological Research, 52(1);35–8. Jones, J.A., and M. Jarick. 2006. Multisensory integration of speech signals: The relationship between space and time. Experimental Brain Research 174(3);588–94. Jones, J.A., and K.G. Munhall. 1997. The effects of separating auditory and visual sources on the audiovisual integration of speech. Canadian Acoustics 25(4);13–9. Kanai, R., B.R. Sheth, F.A. Verstraten, and S. Shimojo. 2007. Dynamic perceptual changes in audiovisual simultaneity. PLoS ONE 2(12);e1253. Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18(7);1560–74. Keetels, M., and J. Vroomen. 2005. The role of spatial disparity and hemifields in audio-visual temporal order judgements. Experimental Brain Research 167;635–40. Keetels, M., and J. Vroomen. 2007. No effect of auditory-visual spatial disparity on temporal recalibration. Experimental Brain Research 182(4);559–65. Keetels, M., and J. Vroomen. 2008a. Tactile–visual temporal ventriloquism: No effect of spatial disparity. Perception & Psychophysics 70(5);765–71. Keetels, M., and J. Vroomen. 2008b. Temporal recalibration to tactile–visual asynchronous stimuli. Neuroscience Letters 430(2);130–4. Keetels, M., and J. Vroomen. 2010. No effect of synesthetic congruency on temporal ventriloquism. Attention, Perception, & Psychophysics 72(4);871–4. Keetels, M., J. Stekelenburg, and J. Vroomen. 2007. Auditory grouping occurs prior to intersensory pairing: Evidence from temporal ventriloquism. Experimental Brain Research 180(3);449–56. King, A.J. 2005. Multisensory integration: Strategies for synchronization. Current Biology 15(9);R339–41. King, A.J., and A.R. Palmer. 1985. Integration of visual and auditory information in bimodal neurones in the guinea-pig superior colliculus. Experimental Brain Research 60(3);492–500. Kitagawa, N., M. Zampini, and C. Spence. 2005. Audiotactile interactions in near and far space. Experimental Brain Research 166(3–4);528–37. Kopinska, A., and L.R. Harris. 2004. Simultaneity constancy. Perception 33(9);1049–60. Korte, A. 1915. Kinematoskopische untersuchungen. Zeitschrift für Psychologie mit Zeitschrift für Angewandte Psychologie 72;194–296. Levitin, D., K. MacLean, M. Mathews, and L. Chu. 2000. The perception of cross-modal simultaneity. International Journal of Computing and Anticipatory Systems, 323–9. Lewald, J., and R. Guski. 2003. Cross-modal perceptual integration of spatially and temporally disparate auditory and visual stimuli. Cognitive Brain Research 16(3);468–78. Lewald, J., and R. Guski. 2004. Auditory–visual temporal integration as a function of distance: No compensation for sound-transmission time in human perception. Neuroscience Letters 357(2);119–22. Lewkowicz, D.J. 1996. Perception of auditory-visual temporal synchrony in human infants. Journal of Experimental Psychology. Human Perception and Performance 22(5);1094–106. Macaluso, E., N. George, R. Dolan, C. Spence, and J. Driver. 2004. Spatial and temporal factors during processing of audiovisual speech: A PET study. NeuroImage 21(2);725–32. Macefield, G., S.C. Gandevia, and D. Burke. 1989. Conduction velocities of muscle and cutaneous afferents in the upper and lower limbs of human subjects. Brain 112(6);1519–32. Mackay, D.M. 1958. Perceptual stability of a stroboscopically lit visual field containing self-luminous objects. 
Nature 181(4607);507–8. Massaro, D.W., M.M. Cohen, and P.M. Smeele. 1996. Perception of asynchronous and conflicting visual and auditory speech. Journal of the Acoustical Society of America 100(3);1777–86. Mattes, S., and R. Ulrich. 1998. Directed attention prolongs the perceived duration of a brief stimulus. Perception & Psychophysics 60(8);1305–17. McGrath, M., and Q. Summerfield. 1985. Intermodal timing relations and audio-visual speech recognition by normal-hearing adults. Journal of the Acoustical Society of America 77(2);678–85. McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264(5588);746–8. Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. Journal of Neuroscience 7(10);3215–29. Mitrani, L., S. Shekerdjiiski, and N. Yakimoff. 1986. Mechanisms and asymmetries in visual perception of simultaneity and temporal order. Biological Cybernetics 54(3);159–65. Mollon, J.D., and A.J. Perkins. 1996. Errors of judgement at Greenwich in 1796. Nature 380(6570);101–2. Morein-Zamir, S., S. Soto-Faraco, and A. Kingstone. 2003. Auditory capture of vision: Examining temporal ventriloquism. Cognitive Brain Research 17(1);154–63.
Mortlock, A.N., D. Machin, S. McConnell, and P. Sheppard. 1997. Virtual conferencing. BT Technology Journal 15;120–9. Munhall, K.G., P. Gribble, L. Sacco, and M. Ward. 1996. Temporal constraints on the McGurk effect. Perception & Psychophysics 58(3);351–62. Navarra, J., A. Vatakis, M. Zampini et al. 2005. Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain Research 25(2);499–507. Navarra, J., S. Soto-Faraco, and C. Spence. 2007. Adaptation to audiotactile asynchrony. Neuroscience Letters 413(1);72–6. Navarra, J., J. Hartcher-O’Brien, E. Piazza, and C. Spence. 2009. Adaptation to audiovisual asynchrony modulates the speeded detection of sound. Proceedings of the National Academy of Sciences of the United States of America 106(23);9169–73. Neumann, O., and M. Niepel. 2004. Timing of “perception” and perception of “time.” In C. Kaernbach, E. Schröger, and H. Müller (eds.), Psychophysics Beyond Sensation: Laws and Invariants of Human Cognition (pp. 245–70): Lawrence Erlbaum Associates, Inc. Nijhawan, R. 1994. Motion extrapolation in catching. Nature 370(6487);256–7. Nijhawan, R. 1997. Visual decomposition of colour through motion extrapolation. Nature 386(6620);66–9. Nijhawan, R. 2002. Neural delays, visual motion and the flash-lag effect. Trends in Cognitive Science 6(9);387. Noesselt, T., J.W. Rieger, M.A. Schoenfeld et al. 2007. Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience 27(42);11431–41. Occelli, V., C. Spence, and M. Zampini. 2008. Audiotactile temporal order judgments in sighted and blind individuals. Neuropsychologia 46(11);2845–50. Olivers, C.N., and E. van der Burg. 2008. Bleeping you out of the blink: Sound saves vision from oblivion. Brain Research 1242;191–9. Pandey, P.C., H. Kunov, and S.M. Abel. 1986. Disruptive effects of auditory signal delay on speech perception with lipreading. Journal of Auditory Research 26(1);27–41. Parise, C., and C. Spence. 2008. Synesthetic congruency modulates the temporal ventriloquism effect. Neuroscience Letters 442(3);257–61. Pöppel, E. 1985. Grenzen des Bewusstseins, Stuttgart: Deutsche Verlags-Anstalt, translated as Mindworks: Time and Conscious Experience. New York: Harcourt Brace Jovanovich. 1988. Poppel, E., K. Schill, and N. von Steinbuchel. 1990. Sensory integration within temporally neutral systems states: A hypothesis. Naturwissenschaften 77(2);89–91. Radeau, M. 1994. Auditory-visual spatial interaction and modularity. Cahiers de Psychologie Cognitive 13(1);3–51. Rihs, S. 1995. The Influence of Audio on Perceived Picture Quality and Subjective Audio-Visual Delay Tolerance. Paper presented at the MOSAIC Workshop: Advanced methods for the evaluation of television picture quality, Eindhoven, 18–19 September. Roefs, J.A.J. 1963. Perception lag as a function of stimulus luminance. Vision Research 3;81–91. Rutschmann, J., and R. Link. 1964. Perception of temporal order of stimuli differing in sense mode and simple reaction time. Perceptual and Motor Skills 18;345–52. Sanabria, D., S. Soto-Faraco, and C. Spence. 2004. Exploring the role of visual perceptual grouping on the audiovisual integration of motion. Neuroreport 15(18);2745–9. Sanford, A.J. 1971. Effects of changes in the intensity of white noise on simultaneity judgements and simple reaction time. Quarterly Journal of Experimental Psychology 23;296–303. Scheier, C.R., R. Nijhawan, and S. Shimojo. 1999.
Sound alters visual temporal resolution. Investigative Ophthalmology & Visual Science 40;4169. Schneider, K.A., and D. Bavelier. 2003. Components of visual prior entry. Cognitive Psychology 47(4); 333–66. Sekuler, R., A.B. Sekuler, and R. Lau. 1997. Sound alters visual motion perception. Nature 385;308–08. Senkowski, D., D. Talsma, M. Grigutsch, C.S. Herrmann, and M.G. Woldorff. 2007. Good times for multisensory integration: Effects of the precision of temporal synchrony as revealed by gamma-band oscillations. Neuropsychologia 45(3);561–71. Shams, L., Y. Kamitani, and S. Shimojo. 2002. Visual illusion induced by sound. Cognitive Brain Research 14(1);147–52. Shimojo, S., C. Scheier, R. Nijhawan et al. 2001. Beyond perceptual modality: Auditory effects on visual perception. Acoustical Science & Technology 22(2);61–67. Shipley, T. 1964. Auditory flutter-driving of visual flicker. Science 145;1328–30.
Shore, D.I., C. Spence, and R.M. Klein. 2001. Visual prior entry. Psychological Science 12(3);205–12. Shore, D.I., C. Spence, and R.M. Klein. 2005. Prior entry. In L. Itti, G. Rees, and J. Tsotsos (eds.), Neurobiology of Attention (pp. 89–95). North Holland: Elsevier. Slutsky, D.A., and G.H. Recanzone. 2001. Temporal and spatial dependency of the ventriloquism effect. Neuroreport 12(1);7–10. Smith, W.F. 1933. The relative quickness of visual and auditory perception. Journal of Experimental Psychology 16;239–257. Soto-Faraco, S., and A. Alsius. 2007. Conscious access to the unisensory components of a cross-modal illusion. Neuroreport 18(4);347–50. Soto-Faraco, S., and A. Alsius. 2009. Deconstructing the McGurk–MacDonald illusion. Journal of Experimen tal Psychology. Human Perception and Performance 35(2);580–7. Spence, C., and J. Driver. 1996. Audiovisual links in endogenous covert spatial attention. Journal of Experimen tal Psychology. Human Perception and Performance 22(4);1005–30. Spence, C., and J. Driver. 2004. Crossmodal Space and Crossmodal Attention. Oxford: Oxford University Press. Spence, C., F. Pavani, and J. Driver. 2000. Crossmodal links between vision and touch in covert endogenous spatial attention. Journal of Experimental Psychology. Human Perception and Performance 26(4);1298–319. Spence, C., and S. Squire. 2003. Multisensory integration: Maintaining the perception of synchrony. Current Biology 13(13);R519–21. Spence, C., D.I. Shore, and R.M. Klein. 2001. Multisensory prior entry. Journal of Experimental Psychology. General 130(4);799–832. Spence, C., R. Baddeley, M. Zampini, R. James, and D.I. Shore. 2003. Multisensory temporal order judgments: When two locations are better than one. Perception & Psychophysics 65(2);318–28. Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: The MIT Press. Stein, B.E., M.A. Meredith, and M.T. Wallace. 1993. The visually responsive neuron and beyond: Multisensory integration in cat and monkey. Progress in Brain Research 95;79–90. Stein, B.E., N. London, L.K. Wilkinson, and D.D. Price. 1996. Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience 8(6);497–506. Stekelenburg, J.J., and J. Vroomen. 2005. An event-related potential investigation of the time-course of temporal ventriloquism. Neuroreport 16;641–44. Stekelenburg, J.J., and J. Vroomen. 2007. Neural correlates of multisensory integration of ecologically valid audiovisual events. Journal of Cognitive Neuroscience 19(12);1964–73. Stelmach, L.B., and C.M. Herdman. 1991. Directed attention and perception of temporal order. Journal of Experimental Psychology. Human Perception and Performance 17(2);539–50. Sternberg, S., and R.L. Knoll. 1973. The perception of temporal order: Fundamental issues and a general model. In S. Kornblum (ed.), Attention and Performance (vol. IV, pp. 629–85). New York: Academic Press. Stetson, C., X. Cui, P.R. Montague, and D.M. Eagleman. 2006. Motor–sensory recalibration leads to an illusory reversal of action and sensation. Neuron 51(5);651–9. Stone, J.V., N.M. Hunkin, J. Porrill et al. 2001. When is now? Perception of simultaneity. Proceedings of the Royal Society of London. Series B. Biological Sciences 268(1462);31–8. Sugano, Y., M. Keetels, and J. Vroomen. 2010. Adaptation to motor–visual and motor–auditory temporal lags transfer across modalities. Experimental Brain Research 201(3);393–9. Sugita, Y., and Y. Suzuki. 2003. 
Audiovisual perception: Implicit estimation of sound-arrival time. Nature 421(6926);911. Sumby, W.H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America 26;212–15. Summerfield, Q. 1987. A comprehensive account of audio-visual speech perception. In B. Dodd and R. Campbell (eds.), Hearing by Eye: The Psychology of Lip-Reading (pp. 3–51). London: Lawrence Erlbaum Associates. Takahashi, K., J. Saiki, and K. Watanabe. 2008. Realignment of temporal simultaneity between vision and touch. Neuroreport 19(3);319–22. Tanaka, A., S. Sakamoto, K. Tsumura, and S. Suzuki. 2009a. Visual speech improves the intelligibility of timeexpanded auditory speech. Neuroreport 20;473–7. Tanaka, A., S. Sakamoto, K. Tsumura, and Y. Suzuki. 2009b. Visual speech improves the intelligibility of timeexpanded auditory speech. Neuroreport 20(5);473–7. Teatini, G., M. Ferne, F. Verzella, and J.P. Berruecos. 1976. Perception of temporal order: Visual and auditory stimuli. Giornale Italiano di Psicologia 3;157–64.
Teder-Salejarvi, W.A., F. Di Russo, J.J. McDonald, and S.A. Hillyard. 2005. Effects of spatial congruity on audio-visual multimodal integration. Journal of Cognitive Neuroscience 17(9);1396–409. Titchener, E.B. 1908. Lectures on the Elementary Psychology of Feeling and Attention. New York: Macmillan. van de Par, S., and A. Kohlrausch. 2004. Visual and auditory object selection based on temporal correlations between auditory and visual cues. Paper presented at the 18th International Congress on Acoustics, Kyoto, Japan. van der Burg, E., C.N. Olivers, A.W. Bronkhorst, and J. Theeuwes. 2008a. Audiovisual events capture attention: Evidence from temporal order judgments. Journal of Vision 8(5);2, 1–10. van der Burg, E., C.N. Olivers, A.W. Bronkhorst, and J. Theeuwes. 2008b. Pip and pop: Nonspatial auditory signals improve spatial visual search. Journal of Experimental Psychology. Human Perception and Performance 34(5);1053–65. van der Burg, E., C.N. Olivers, A.W. Bronkhorst, and J. Theeuwes. 2009. Poke and pop: Tactile–visual synchrony increases visual saliency. Neuroscience Letters 450(1);60–4. Van Eijk, R.L. 2008. Audio-Visual Synchrony Perception. Thesis, Technische Universiteit Eindhoven, The Netherlands. Van Eijk, R.L., A. Kohlrausch, J.F. Juola, and S. van de Par. 2008. Audiovisual synchrony and temporal order judgments: Effects of experimental method and stimulus type. Perception & Psychophysics 70(6);955–68. van Wassenhove, V., K.W. Grant, and D. Poeppel. 2007. Temporal window of integration in auditory–visual speech perception. Neuropsychologia 45;598–601. Vatakis, A., and C. Spence. 2006a. Audiovisual synchrony perception for music, speech, and object actions. Brain Research 1111(1);134–42. Vatakis, A., and C. Spence. 2006b. Audiovisual synchrony perception for speech and music assessed using a temporal order judgment task. Neuroscience Letters 393(1);40–4. Vatakis, A., and C. Spence. 2007. Crossmodal binding: Evaluating the “unity assumption” using audiovisual speech stimuli. Perception & Psychophysics 69(5);744–56. Vatakis, A., and C. Spence. 2008. Evaluating the influence of the ‘unity assumption’ on the temporal perception of realistic audiovisual stimuli. Acta Psychologica 127(1);12–23. Vatakis, A., J. Navarra, S. Soto-Faraco, and C. Spence. 2007. Temporal recalibration during asynchronous audiovisual speech perception. Experimental Brain Research 181(1);173–81. Vatakis, A., A.A. Ghazanfar, and C. Spence. 2008a. Facilitation of multisensory integration by the “unity effect” reveals that speech is special. Journal of Vision 8(9);14 1–11. Vatakis, A., J. Navarra, S. Soto-Faraco, and C. Spence. 2008b. Audiovisual temporal adaptation of speech: Temporal order versus simultaneity judgments. Experimental Brain Research 185(3);521–9. Vibell, J., C. Klinge, M. Zampini, C. Spence, and A.C. Nobre. 2007. Temporal order is coded temporally in the brain: Early event-related potential latency shifts underlying prior entry in a cross-modal temporal order judgment task. Journal of Cognitive Neuroscience 19(1);109–20. von Grunau, M.W. 1986. A motion aftereffect for long-range stroboscopic apparent motion. Perception & Psychophysics 40(1);31–8. Von Helmholtz, H. 1867. Handbuch der Physiologischen Optik. Leipzig: Leopold Voss. Vroomen, J., and B. de Gelder. 2000. Sound enhances visual perception: Cross-modal effects of auditory organization on vision. Journal of Experimental Psychology. Human Perception and Performance 26(5);1583–90. Vroomen, J., and B. de Gelder. 2004a. 
Perceptual effects of cross-modal stimulation: Ventriloquism and the freezing phenomenon. In G.A. Calvert, C. Spence, and B.E. Stein (eds.). The Handbook of Multisensory Processes. Cambridge, MA: MIT Press. Vroomen, J., and B. de Gelder. 2004b. Temporal ventriloquism: Sound modulates the flash-lag effect. Journal of Experimental Psychology. Human Perception and Performance 30(3);513–8. Vroomen, J., and M. Keetels. 2006. The spatial constraint in intersensory pairing: No role in temporal ventriloquism. Journal of Experimental Psychology. Human Perception and Performance 32(4);1063–71. Vroomen, J., and M. Keetels. 2009. Sounds change four-dot masking. Acta Psychologica 130(1);58–63. Vroomen, J., and J.J. Stekelenburg. 2009. Visual anticipatory information modulates multisensory interactions of artificial audiovisual stimuli. Journal of Cognitive Neuroscience 22(7);1583–96. Vroomen, J., M. Keetels, B. de Gelder, and P. Bertelson. 2004. Recalibration of temporal order perception by exposure to audio-visual asynchrony. Cognitive Brain Research 22(1);32–5. Watanabe, K., and S. Shimojo. 2001. When sound affects vision: Effects of auditory grouping on visual motion perception. Psychological Science 12(2);109–16.
Welch, R.B. 1999. Meaning, attention, and the “unity assumption” in the intersensory bias of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, and J. Müsseler (eds.), Cognitive Contributions to the Perception of Spatial and Temporal Events (pp. 371–87). Amsterdam: Elsevier. Welch, R.B., and D.H. Warren. 1980. Immediate perceptual response to intersensory discrepancy. Psychological Bulletin 88(3);638–67. Yamamoto, S., M. Miyazaki, T. Iwano, and S. Kitazawa. 2008. Bayesian calibration of simultaneity in audiovisual temporal order judgment. Paper presented at the 9th International Multisensory Research Forum (July 16–19, 2008). Hamburg, Germany. Zampini, M., D.I. Shore, and C. Spence. 2003a. Audiovisual temporal order judgments. Experimental Brain Research 152(2);198–210. Zampini, M., D.I. Shore, and C. Spence. 2003b. Multisensory temporal order judgments: The role of hemispheric redundancy. International Journal of Psychophysiology 50(1–2);165–80. Zampini, M., T. Brown, D.I. Shore et al. 2005a. Audiotactile temporal order judgments. Acta Psychologica 118(3);277–91. Zampini, M., S. Guest, D.I. Shore, and C. Spence. 2005b. Audio-visual simultaneity judgments. Perception & Psychophysics 67(3);531–44. Zampini, M., D.I. Shore, and C. Spence. 2005c. Audiovisual prior entry. Neuroscience Letters 381(3);217–22.
10 Representation of Object Form in Vision and Touch
Simon Lacey and Krish Sathian
CONTENTS
10.1 Introduction
10.2 Cortical Regions Involved in Visuo-Haptic Shape Processing
10.2.1 Lateral Occipital Complex
10.2.2 Parietal Cortical Regions
10.3 Do Vision and Touch Share a Common Shape Representation?
10.3.1 Potential Role of Visual Imagery
10.3.2 A Modality-Independent Shape Representation?
10.4 Properties of Shared Representation
10.4.1 View-Dependence in Vision and Touch
10.4.2 Cross-Modal View-Independence
10.5 An Integrative Framework for Visuo-Haptic Shape Representation
Acknowledgments
References
10.1 INTRODUCTION

The idea that the brain processes sensory inputs in parallel modality-specific streams has given way to the concept of a “metamodal” brain with a multisensory task-based organization (Pascual-Leone and Hamilton 2001). For example, recent research shows that many cerebral cortical regions previously considered to be specialized for processing various aspects of visual input are also activated during analogous tactile or haptic tasks (reviewed by Sathian and Lacey 2007). In this article, which concentrates on shape processing in humans, we review the current state of knowledge about the mental representation of object form in vision and touch. We begin by describing the cortical regions showing multisensory responses to object form. Next, we consider the extent to which the underlying representation of object form is explained by cross-modal visual imagery or multisensory convergence. We then review recent work on the view-dependence of visuo-haptic shape representations and the resulting model of a multisensory, view-independent representation. Finally, we discuss a recently presented conceptual framework of visuo-haptic shape processing as a basis for future investigations.
10.2 CORTICAL REGIONS INVOLVED IN VISUO-HAPTIC SHAPE PROCESSING

10.2.1 Lateral Occipital Complex

Most notable among the several cortical regions implicated in visuo-haptic shape processing is the lateral occipital complex (LOC), an object-selective region in the ventral visual pathway (Malach et al. 1995). Part of the LOC responds selectively to objects in both vision and touch and has been termed LOtv (Amedi et al. 2001, 2002).
The LOC is shape-selective during both haptic three-dimensional shape perception (Amedi et al. 2001; Stilla and Sathian 2008; Zhang et al. 2004) and tactile two-dimensional shape perception (Stoesz et al. 2003; Prather et al. 2004). Neurological case studies indicate that the LOC is necessary for both haptic and visual shape perception: a patient with a left occipitotemporal cortical lesion, likely including the LOC, was found to exhibit tactile in addition to visual agnosia (inability to recognize objects), although somatosensory cortex and basic somatosensory function were intact (Feinberg et al. 1986). Another patient with bilateral LOC lesions could not learn new objects either visually or haptically (James et al. 2006). LOtv is thought to be a processor of geometric shape because it is not activated during object recognition triggered by object-specific sounds (Amedi et al. 2002). Interestingly, though, LOtv does respond when auditory object recognition is mediated by a visual–auditory sensory substitution device that converts visual shape information into an auditory stream, but only when individuals (whether sighted or blind) are specifically trained in a manner permitting generalization to untrained objects and not when merely arbitrary associations are taught (Amedi et al. 2007). This dissociation further bolsters the idea that LOtv is concerned with geometric shape information, regardless of the input sensory modality.
10.2.2 Parietal Cortical Regions

Multisensory shape selectivity also occurs in parietal cortical regions, including the postcentral sulcus (Stilla and Sathian 2008), which is the location of Brodmann’s area 2 in human primary somatosensory cortex (S1; Grefkes et al. 2001). Although this region is generally assumed to be purely somatosensory, earlier neurophysiological observations in monkeys suggested visual responsiveness in parts of S1 (Iwamura 1998; Zhou and Fuster 1997). Visuo-haptic shape selectivity has also repeatedly been reported in various parts of the human intraparietal sulcus (IPS), which is squarely in classical multisensory cortex. The particular bisensory foci lie either anteriorly in the IPS (Grefkes et al. 2002; Stilla and Sathian 2008), in the region referred to as the anterior intraparietal area (AIP; Grefkes and Fink 2005; Shikata et al. 2008) or the one termed the medial intraparietal area (Grefkes et al. 2004), or posteroventrally (Saito et al. 2003; Stilla and Sathian 2008), in a region comprising the caudal intraparietal area (CIP; Shikata et al. 2008) and the adjacent, retinotopically mapped areas IPS1 and V7 (Swisher et al. 2007). It should be noted that areas AIP, medial intraparietal, CIP, and V7 were first described in macaque monkeys, and their homologies in humans remain somewhat uncertain. A recent study reported that repetitive transcranial magnetic stimulation over the left anterior IPS impaired visual–haptic, but not haptic–visual, shape matching using the right hand (Buelte et al. 2008). However, repetitive transcranial magnetic stimulation over the right AIP during shape matching with the left hand had no effect on either cross-modal condition. The reason for this discrepancy is unclear, and emphasizes that the exact roles of the postcentral sulcus, the IPS regions, and LOtv in multisensory shape processing remain to be fully worked out.
10.3 DO VISION AND TOUCH SHARE A COMMON SHAPE REPRESENTATION?

10.3.1 Potential Role of Visual Imagery

An intuitively appealing explanation for haptically evoked activation of visual cortex is that this is mediated by visual imagery rather than multisensory convergence of inputs (Sathian et al. 1997). The visual imagery hypothesis is supported by evidence that the LOC is active during visual imagery. For example, the left LOC is active during mental imagery of familiar objects previously explored haptically by blind individuals or visually by sighted individuals (De Volder et al. 2001), and also during recall of both geometric and material object properties from memory (Newman et al. 2005). Furthermore, individual differences in ratings of the vividness of visual imagery were found to strongly predict individual differences in haptic shape-selective activation magnitudes in the right LOC (Zhang et al. 2004).
On the other hand, the magnitude of LOC activation during visual imagery can be considerably less than during haptic shape perception, suggesting that visual imagery may be relatively unimportant in haptic shape perception (Amedi et al. 2001; see also Reed et al. 2004). However, performance on the visual imagery task has not generally been monitored, so that lower levels of LOC activity during visual imagery could simply reflect participants not maintaining their visual images throughout the imagery scan. Because both the early and late blind show shape-related activity in the LOC evoked by tactile input (Amedi et al. 2003; Burton et al. 2002; Pietrini et al. 2004; Stilla et al. 2008; reviewed by Pascual-Leone et al. 2005; Sathian 2005; Sathian and Lacey 2007), or by auditory input when sensory substitution devices were used (Amedi et al. 2007; Arno et al. 2001; Renier et al. 2004, 2005), some have concluded that visual imagery does not account for cross-modal activation of visual cortex. Although this is true for the early-blind, it certainly does not exclude the use of visual imagery in the sighted, especially in view of the abundant evidence for cross-modal plasticity resulting from visual deprivation (Pascual-Leone et al. 2005; Sathian 2005; Sathian and Lacey 2007). It is also important to be clear about what is meant by “visual imagery,” which is often treated as a unitary ability. Recent research has shown that there are two different kinds of visual imagery: “object imagery” (images that are pictorial and deal with the actual appearance of objects in terms of shape, color, brightness, and other surface properties) and “spatial imagery” (more schematic images dealing with the spatial relations of objects and their component parts and with spatial transformations; Kozhevnikov et al. 2002, 2005; Blajenkova et al. 2006). This distinction is relevant because both vision and touch encode spatial information about objects—for example, size, shape, and the relative positions of different object features—and such information may well be encoded in a modality-independent spatial representation (Lacey and Campbell 2006). Support for this possibility is provided by recent work showing that spatial, but not object, imagery scores were correlated with accuracy on cross-modal, but not within-modal, object identification for a set of closely similar and previously unfamiliar objects (Lacey et al. 2007a). Thus, it is probably beneficial to explore the roles of object and spatial imagery rather than taking an undifferentiated visual imagery approach. We return to this idea later but, as an aside, we note that the object–spatial dimension of imagery can be viewed as orthogonal to the modality involved, as there is evidence that early-blind individuals perform both object-based and spatially based tasks equally well (Aleman et al. 2001; see also Noordzij et al. 2007). However, the object–spatial dimension of haptically derived representations remains unexplored.
10.3.2 A Modality-Independent Shape Representation?

An alternative to the visual imagery hypothesis is that incoming inputs in both vision and touch converge on a modality-independent representation, which is suggested by the overlap of visual and haptic shape-selective activity in the LOC. Some researchers refer to such modality-independent representations as “amodal,” but we believe that this term is best reserved for linguistic or other abstract representations. Instead, we suggest the use of the term “multisensory” to refer to a representation that can be encoded and retrieved by multiple sensory systems and which retains the modality “tags” of the associated inputs (Sathian 2004). The multisensory hypothesis is suggested by studies of effective connectivity derived from functional magnetic resonance imaging (fMRI) data indicating bottom-up projections from S1 to the LOC (Peltier et al. 2007; Deshpande et al. 2008) and also by electrophysiological data showing early propagation of activity from S1 into the LOC during tactile shape discrimination (Lucan et al. 2011). If vision and touch engage a common spatial representational system, then we would expect to see similarities in processing of visually and haptically derived representations and this, in fact, turns out to be the case. Thus, LOC activity is greater when viewing objects previously primed haptically, compared to viewing nonprimed objects (James et al. 2002b). In addition, behavioral studies have shown that cross-modal priming is as effective as within-modal priming (Easton et al. 1997a, 1997b; Reales and Ballesteros 1999). Candidate regions for housing a common visuo-haptic shape
representation include the right LOC and the left CIP because activation magnitudes during visual and haptic processing of (unfamiliar) shape are significantly correlated across subjects in these regions (Stilla and Sathian 2008). Furthermore, the time taken to scan both visual images (Kosslyn 1973; Kosslyn et al. 1978) and haptically derived images (Röder and Rösler 1998) increases with the spatial distance to be inspected. Also, the time taken to judge whether two objects are the same or mirror images increases nearly linearly with increasing angular disparity between the objects for mental rotation of both visual (Shepard and Metzler 1971) and haptic stimuli (Marmor and Zaback 1976; Carpenter and Eisenberg 1978; Hollins 1986; Dellantonio and Spagnolo 1990). The same relationship was found when the angle between a tactile stimulus and a canonical angle was varied, with associated activity in the left anterior IPS (Prather et al. 2004), an area also active during mental rotation of visual stimuli (Alivisatos and Petrides 1997), and probably corresponding to AIP (Grefkes and Fink 2005; Shikata et al. 2008). Similar processing has been found with sighted, early- and late-blind individuals (Carpenter and Eisenberg 1978; Röder and Rösler 1998). These findings suggest that spatial metric information is preserved in both vision and touch, and that both modalities rely on similar, if not identical, imagery processes (Röder and Rösler 1998).
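The chronometric signatures just described, scanning time growing with spatial distance and rotation time growing with angular disparity, are typically summarized by a linear fit whose slope is compared across modalities. The sketch below uses hypothetical response times purely to illustrate that comparison; the numbers are not from the cited experiments.

```python
# Minimal sketch (hypothetical data): mental rotation response times increase
# approximately linearly with angular disparity in both vision and touch
# (Shepard and Metzler 1971; Marmor and Zaback 1976).
import numpy as np

angles = np.array([0, 30, 60, 90, 120, 150, 180], dtype=float)   # degrees
rt_visual = np.array([1.1, 1.4, 1.8, 2.1, 2.5, 2.8, 3.2])        # hypothetical, s
rt_haptic = np.array([1.6, 2.0, 2.5, 2.9, 3.4, 3.8, 4.3])        # hypothetical, s

rate_visual, base_visual = np.polyfit(angles, rt_visual, 1)
rate_haptic, base_haptic = np.polyfit(angles, rt_haptic, 1)

# Comparable slopes (s/degree) across modalities are taken to suggest that
# vision and touch preserve spatial metric information in a similar format.
print(f"visual rotation rate: {1000 * rate_visual:.1f} ms/deg")
print(f"haptic rotation rate: {1000 * rate_haptic:.1f} ms/deg")
```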
10.4 PROPERTIES OF SHARED REPRESENTATION In this section, we discuss the properties of the multisensory representation of object form with particular reference to recent work on view-independence in visuo-haptic object recognition. The representation of an object is said to be view-dependent if rotating the object away from the learned view impairs object recognition, that is, optimal recognition depends on perceiving the same view of the object. By contrast, a representation is view-independent if objects are correctly identified despite being rotated to provide a different view. The shared multisensory representation that enables crossmodal object recognition is likely distinct from the separate unisensory representations that support visual and haptic within-modal object recognition: we examine the relationship between these.
10.4.1 View-Dependence in Vision and Touch It has long been known that visual object representations are view-dependent (reviewed by Peissig and Tarr 2007) but it might be expected that haptic object representations are view-independent because the hands can simultaneously contact an object from different sides (Newell et al. 2001). This expectation is reinforced because following the contours of a three-dimensional object is necessary for haptic object recognition (Lederman and Klatzky 1987). Nonetheless, several studies have shown that haptic object representations are in fact view-dependent for both unfamiliar (Newell et al. 2001; Lacey et al. 2007a) and familiar objects (Lawson 2009). This may be because the biomechanics of the hands can be restrictive in some circumstances: some hand positions naturally facilitate exploration more than others (Woods et al. 2008). Furthermore, for objects with a vertical main axis, haptic exploration is biased to the far (back) “view” of an object, explored by the fingers whereas the thumbs stabilize the object rather than explore it (Newell et al. 2001). However, haptic recognition remains view-dependent even when similar objects are presented so that their main axis is horizontal, an orientation that allows more freely comprehensive haptic exploration of multiple object surfaces (Lacey et al. 2007a). The extent to which visual object recognition is impaired by changes in orientation depends on the particular axis of rotation: picture-plane rotations are less disruptive than depth-plane rotations in both object recognition and mental rotation tasks, even though these tasks depend on different visual pathways—ventral and dorsal, respectively (Gauthier et al. 2002). By contrast, haptic object recognition is equally disrupted by rotation about each of the three main axes (Lacey et al. 2007a). Thus, although visual and haptic unisensory representations may be functionally equivalent in that they are both view-dependent, the underlying basis for this may be very different in each case.
A further functional equivalence between visual and haptic object representation is that each has preferred or canonical views of objects. In vision, the preferred view for both familiar and unfamiliar objects is one in which the main axis is angled at 45° to the observer (Palmer et al. 1981; Perrett et al. 1992). Recently, Woods et al. (2008) have shown that haptic object recognition also has canonical views—again independently of familiarity—but that these are defined by reference to the midline of the observer’s body, the object’s main axis being aligned either parallel or perpendicular to the midline. This may be due to grasping and object function: Craddock and Lawson (2008) found that haptic recognition was better for objects in typical rather than atypical orientations; for example, a cup oriented with the handle to the right for a right-handed person.
10.4.2 Cross-Modal View-Independence Remarkably, although visual and haptic within-modal object recognition are both view-dependent, visuo-haptic cross-modal recognition is view-independent (Lacey et al. 2007a; Ueda and Saiki 2007). Rotating an object away from the learned view did not degrade recognition, whether visual study was followed by haptic test or vice versa (Lacey et al. 2007a; Ueda and Saiki 2007), although Lawson (2009) found view-independence only in the haptic study–visual test condition. Cross-modal object recognition was also independent of the particular axis of rotation (Lacey et al. 2007a). Thus, visuo-haptic cross-modal object recognition clearly relies on a different representation from that involved in the corresponding within-modal task (see also Newell et al. 2005). In a recent series of experiments, we used a perceptual learning paradigm to investigate the relationship between the unisensory view-dependent and multisensory view-independent representations (Lacey et al. 2009a). We showed that a relatively brief period of within-modal learning to establish within-modal view-independence resulted in complete, symmetric cross-modal transfer of view-independence: visual view-independence acquired following exclusively visual learning also resulted in haptic view-independence, and vice versa. In addition, both visual–haptic and haptic–visual cross-modal learning also transformed visual and haptic within-modal recognition from view-dependent to view-independent. We concluded from this study that visual and haptic within-modal and visuo-haptic cross-modal view-independence all rely on the same shared representation. Thus, this study and its predecessor (Lacey et al. 2007a) suggest a model of view-independence in which separate, view-dependent, unisensory representations feed directly into a view-independent, bisensory representation rather than being routed through intermediate, unisensory, view-independent representations. A possible mechanism for this is the integration of multiple low-level, view-dependent, unisensory representations into a higher-order, view-independent, multisensory representation (see Riesenhuber and Poggio 1999 for a similar proposal regarding visual object recognition). Cortical localization of this modality-independent, view-independent representation is an important goal for future work. Although the IPS is a potential candidate, being a well-known convergence site for visual and haptic shape processing (Amedi et al. 2001; James et al. 2002b; Zhang et al. 2004; Stilla and Sathian 2008), IPS responses appear to be view-dependent (James et al. 2002a). The LOC also shows convergent multisensory shape processing; however, responses in this area have shown view-dependence in some studies (Grill-Spector et al. 1999; Gauthier et al. 2002) but view-independence in other studies (James et al. 2002a).
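The pooling mechanism mentioned above, in which multiple view-dependent representations feed a higher-order view-independent one, can be sketched in a few lines. The code below is purely illustrative (the tuning function, preferred views, and bandwidth are assumptions, not parameters from any of the studies cited) and simply shows that a maximum taken over several view-tuned units varies far less with rotation than any single unit (cf. Riesenhuber and Poggio 1999).

```python
# Toy illustration (not a model from this chapter) of pooling view-dependent units
# into a view-independent read-out via a max operation.
import numpy as np

views = np.arange(0, 360, 5)                   # test orientations (deg)
preferred_views = np.array([0, 90, 180, 270])  # hypothetical studied views
tuning_width = 40.0                            # deg; assumed tuning bandwidth

def view_tuned_response(view, preferred):
    """Response of a unisensory, view-dependent unit (circular Gaussian tuning)."""
    delta = np.minimum(np.abs(view - preferred), 360 - np.abs(view - preferred))
    return np.exp(-delta ** 2 / (2 * tuning_width ** 2))

unit_responses = np.array([view_tuned_response(views, p) for p in preferred_views])
pooled = unit_responses.max(axis=0)            # higher-order, view-independent response

print("single-unit response range:", unit_responses[0].min(), unit_responses[0].max())
print("pooled response range:     ", pooled.min(), pooled.max())
```

In this toy case the pooled response stays close to its maximum across all rotations, whereas each contributing unit falls to near zero away from its preferred view, which is the qualitative signature of view-independence built from view-dependent inputs.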
10.5 AN INTEGRATIVE FRAMEWORK FOR VISUO-HAPTIC SHAPE REPRESENTATION An important goal of multisensory research is to model the processes underlying visuo-haptic object representation. As a preliminary step to this goal, we have recently investigated connectivity and intertask correlations of activation magnitudes during visual object imagery and haptic perception of both familiar and unfamiliar objects (Deshpande et al. 2010; Lacey et al. 2010). In the visual
object imagery task, participants listened to word pairs and decided whether the objects designated by those words had the same or different shapes. Thus, in contrast with earlier studies, participants had to process their images throughout the scan and this could be verified by monitoring their performance. In a separate session, participants performed a haptic shape discrimination task. For one group of subjects, the haptic objects were familiar; for the other group, they were unfamiliar. We found that both intertask correlations and connectivity were modulated by object familiarity (Deshpande et al. 2010; Lacey et al. 2010). Although the LOC was active bilaterally during both visual object imagery and haptic shape perception, there was an intertask correlation only for familiar shape. Analysis of connectivity showed that visual object imagery and haptic familiar shape perception engaged quite similar networks characterized by top-down paths from prefrontal and parietal regions into the LOC, whereas a very different network emerged during haptic perception of unfamiliar shape, featuring bottom-up inputs from S1 to the LOC (Deshpande et al. 2010). Based on these findings and on the literature reviewed earlier in this chapter, we proposed a conceptual framework for visuo-haptic object representation that integrates the visual imagery and multisensory approaches (Lacey et al. 2009b). In this proposed framework, the LOC houses a representation that is independent of the input sensory modality and is flexibly accessible via either bottom-up or top-down pathways, depending on object familiarity (or other task attributes). Haptic perception of familiar shape uses visual object imagery via top-down paths from prefrontal and parietal areas into the LOC whereas haptic perception of unfamiliar shape may use spatial imagery processes and involves bottom-up pathways from the somatosensory cortex to the LOC. Because there is no stored representation of an unfamiliar object, its global shape has to be computed by exploring it in its entirety and the framework would therefore predict the somatosensory drive of LOC. The IPS has been implicated in visuo-haptic perception of both shape and location (Stilla and Sathian 2008; Gibson et al. 2008). We might therefore expect that, to compute global shape in unfamiliar objects, the IPS would be involved in processing the relative spatial locations of object parts. For familiar objects, global shape can be inferred easily, perhaps from distinctive features that are sufficient to retrieve a visual image, and so the framework predicts increased contribution from parietal and prefrontal regions. Clearly, objects are not exclusively familiar or unfamiliar and individuals are not purely object or spatial imagers: these are continua along which objects and individuals may vary. In this respect, an individual differences approach is likely to be productive (see Lacey et al. 2007b; Motes et al. 2008) because these factors may interact, with different weights in different circumstances, for example task demands or individual history (visual experience, training, etc.). More work is required to define and test this framework.
ACKNOWLEDGMENTS This work was supported by the National Eye Institute, the National Science Foundation, and the Veterans Administration.
REFERENCES Aleman, A., L. van Lee, M.H.M. Mantione, I.G. Verkoijen, and E.H.F. de Haan. 2001. Visual imagery without visual experience: Evidence from congenitally totally blind people. Neuroreport 12:2601–2604. Alivisatos, B., and M. Petrides. 1997. Functional activation of the human brain during mental rotation. Neuropsychologia 36:11–118. Amedi, A., R. Malach, T. Hendler, S. Peled, and E. Zohary. 2001. Visuo-haptic object-related activation in the ventral visual pathway. Nature Neuroscience 4:324–330. Amedi, A., G. Jacobson, T. Hendler, R. Malach, and E. Zohary. 2002. Convergence of visual and tactile shape processing in the human lateral occipital complex. Cerebral Cortex 12:1202–1212. Amedi, A., N. Raz, P. Pianka, R. Malach, and E. Zohary. 2003 Early ‘visual’ cortex activation correlates with superior verbal memory performance in the blind. Nature Neuroscience 6:758–766.
Amedi, A., W.M. Stern, J.A. Camprodon et al. 2007. Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex. Nature Neuroscience 10:687–689. Arno, P., A.G. De Volder, A. Vanlierde et al. 2001. Occipital activation by pattern recognition in the early blind using auditory substitution for vision. NeuroImage 13:632–645. Blajenkova, O., M. Kozhevnikov, and M.A. Motes. 2006. Object-spatial imagery: A new self-report imagery questionnaire. Applied Cognitive Psychology 20:239–263. Buelte, D., I.G. Meister, M. Staedtgen et al. 2008. The role of the anterior intraparietal sulcus in crossmodal processing of object features in humans: An rTMS study. Brain Research 1217:110–118. Burton, H., A.Z. Snyder, T.E. Conturo, E. Akbudak, J.M. Ollinger, and M.E. Raichle. 2002. Adaptive changes in early and late blind: A fMRI study of Braille reading. Journal of Neurophysiology 87:589–607. Carpenter, P.A., and P. Eisenberg. 1978. Mental rotation and the frame of reference in blind and sighted individuals. Perception & Psychophysics 23:117–124. Craddock, M., and R. Lawson. 2008. Repetition priming and the haptic recognition of familiar and unfamiliar objects. Perception & Psychophysics 70:1350–1365. Dellantonio, A., and F. Spagnolo. 1990. Mental rotation of tactual stimuli. Acta Psychologica 73:245–257. Deshpande, G., X. Hu, R. Stilla, and K. Sathian. 2008. Effective connectivity during haptic perception: A study using Granger causality analysis of functional magnetic resonance imaging data. NeuroImage 40:1807–1814. Deshpande, G., X. Hu, S. Lacey, R. Stilla, and K. Sathian. 2010. Object familiarity modulates effective connectivity during haptic shape perception. NeuroImage 49:1991–2000. De Volder, A.G., H. Toyama, Y. Kimura et al. 2001. Auditory triggered mental imagery of shape involves visual association areas in early blind humans. NeuroImage 14:129–139. Easton, R.D., A.J. Greene, and K. Srinivas. 1997a. Transfer between vision and haptics: Memory for 2-D patterns and 3-D objects. Psychonomic Bulletin & Review 4:403–410. Easton, R.D., K. Srinivas, and A.J. Greene. 1997b. Do vision and haptics share common representations? Implicit and explicit memory within and between modalities. Journal of Experimental Psychology. Learning, Memory, and Cognition 23:153–163. Feinberg, T.E., L.J. Rothi, and K.M. Heilman. 1986. Multimodal agnosia after unilateral left hemisphere lesion. Neurology 36:864–867. Gauthier, I., W.G. Hayward, M.J. Tarr et al. 2002. BOLD activity during mental rotation and view-dependent object recognition. Neuron 34:161–171. Gibson, G., R. Stilla, and K. Sathian. 2008. Segregated visuo-haptic processing of texture and location. Abstract, Human Brain Mapping. Grefkes, C., S. Geyer, T. Schormann, P. Roland, and K. Zilles. 2001. Human somatosensory area 2: Observerindependent cytoarchitectonic mapping, interindividual variability, and population map. NeuroImage 14:617–631. Grefkes, C., P.H. Weiss, K. Zilles, and G.R. Fink. 2002. Crossmodal processing of object features in human anterior intraparietal cortex: An fMRI study implies equivalencies between humans and monkeys. Neuron 35:173–184. Grefkes, C., A. Ritzl, K. Zilles, and G.R. Fink. 2004. Human medial intraparietal cortex subserves visuomotor coordinate transformation. NeuroImage 23:1494–1506. Grefkes, C., and G. Fink. 2005. The functional organization of the intraparietal sulcus in humans and monkeys. Journal of Anatomy 207:3–17. Grill-Spector, K., T. Kushnir, S. Edelman, G. Avidan, Y. Itzchak, and R. 
Malach. 1999. Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron 24:187–203. Hollins, M. 1986. Haptic mental rotation: More consistent in blind subjects? Journal of Visual Impairment & Blindness 80:950–952. Iwamura, Y. 1998. Hierarchical somatosensory processing. Current Opinion in Neurobiology 8:522–528. James, T.W., G.K. Humphrey, J.S. Gati, R.S. Menon, and M.A. Goodale. 2002a. Differential effects of view on object-driven activation in dorsal and ventral streams. Neuron 35:793–801. James, T.W., G.K. Humphrey, J.S. Gati, P. Servos, R.S. Menon, and M.A. Goodale. 2002b. Haptic study of three-dimensional objects activates extrastriate visual areas. Neuropsychologia 40:1706–1714. James, T.W., K.H. James, G.K. Humphrey, and M.A. Goodale. 2006. Do visual and tactile object representations share the same neural substrate? In Touch and Blindness: Psychology and Neuroscience, ed. M.A. Heller and S. Ballesteros, 139–155. Mahwah, NJ: Lawrence Erlbaum Associates. Kosslyn, S.M. 1973. Scanning visual images: Some structural implications. Perception & Psychophysics 14:90–94.
Kosslyn, S.M., T.M. Ball, and B.J. Reiser. 1978. Visual images preserve metric spatial information: Evidence from studies of image scanning. Journal of Experimental Psychology. Human Perception and Performance 4:47–60. Kozhevnikov, M., M. Hegarty, and R.E. Mayer. 2002. Revising the visualiser–verbaliser dimension: Evidence for two types of visualisers. Cognition and Instruction 20:47–77. Kozhevnikov, M., S.M. Kosslyn, and J. Shephard. 2005. Spatial versus object visualisers: A new characterisation of cognitive style. Memory & Cognition 33:710–726. Lacey, S., and C. Campbell. 2006. Mental representation in visual/haptic crossmodal memory: Evidence from interference effects. Quarterly Journal of Experimental Psychology 59:361–376. Lacey, S., A. Peters, and K. Sathian. 2007a. Cross-modal object representation is viewpoint-independent. PLoS ONE 2:e890. doi: 10.1371/journal.pone0000890. Lacey, S., C. Campbell, and K. Sathian. 2007b. Vision and touch: Multiple or multisensory representations of objects? Perception 36:1513–1521. Lacey, S., M. Pappas, A. Kreps, K. Lee, and K. Sathian. 2009a. Perceptual learning of view-independence in visuo-haptic object representations. Experimental Brain Research 198:329–337. Lacey, S., N. Tal, A. Amedi, and K. Sathian. 2009b. A putative model of multisensory object representation. Brain Topography 21:269–274. Lacey, S., P. Flueckiger, R. Stilla, M. Lava, and K. Sathian. 2010. Object familiarity modulates the relationship between visual object imagery and haptic shape perception. NeuroImage 49:1977–1990. Lawson, R. 2009. A comparison of the effects of depth rotation on visual and haptic three-dimensional object recognition. Journal of Experimental Psychology. Human Perception and Performance 35:911–930. Lederman, S.J., and R.L. Klatzky. 1987. Hand movements: A window into haptic object recognition. Cognitive Psychology 19:342–368. Lucan, J.N., J.J. Foxe, M. Gomez-Ramirez, K. Sathian, and S. Molholm. 2011. Tactile shape discrimination recruits human lateral occipital complex during early perceptual processing. Human Brain Mapping 31:1813–1821. Malach, R., J.B. Reppas, R.R. Benson et al. 1995. Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences of the United States of America 92:8135–8139. Marmor, G.S., and L.A. Zaback. 1976. Mental rotation by the blind: Does mental rotation depend on visual imagery? Journal of Experimental Psychology. Human Perception and Performance 2:515–521. Motes, M.A., R. Malach, and M. Kozhevnikov. 2008. Object-processing neural efficiency differentiates object from spatial visualizers. Neuroreport 19:1727–1731. Newell, F.N., M.O. Ernst, B.S. Tjan, and H.H. Bülthoff. 2001. View dependence in visual and haptic object recognition. Psychological Science 12:37–42. Newell, F.N., A.T. Woods, M. Mernagh, and H.H. Bülthoff. 2005. Visual, haptic and crossmodal recognition of scenes. Experimental Brain Research 161:233–242. Newman, S.D., R.L. Klatzky, S.J. Lederman, and M.A. Just. 2005. Imagining material versus geometric properties of objects: An fMRI study. Cognitive Brain Research 23:235–246. Noordzij, M.L., S. Zuidhoek, and A. Postma. 2007. The influence of visual experience on visual and spatial imagery. Perception 36:101–112. Palmer, S., E. Rosch, and P. Chase. 1981. Canonical perspective and the perception of objects. In Attention and Performance IX, ed. J.B. Long and A.D. Baddeley, 135–151. Hillsdale, NJ: Lawrence Earlbaum Associates. 
Pascual-Leone, A., and R.H. Hamilton. 2001. The metamodal organization of the brain. Progress in Brain Research 134:427–445. Pascual-Leone, A., A. Amedi, F. Fregni, and L.B. Merabet. 2005. The plastic human brain. Annual Review of Neuroscience 28:377–401. Peissig, J.J., and M.J. Tarr. 2007. Visual object recognition: Do we know more now than we did 20 years ago? Annual Review of Psychology 58:75–96. Peltier, S., R. Stilla, E. Mariola, S. LaConte, X. Hu, and K. Sathian. 2007. Activity and effective connectivity of parietal and occipital cortical regions during haptic shape perception. Neuropsychologia 45:476–483. Perrett, D.I., M.H. Harries, and S. Looker. 1992. Use of preferential inspection to define the viewing sphere and characteristic views of an arbitrary machined tool part. Perception 21:497–515. Pietrini, P., M.L. Furey, E. Ricciardi et al. 2004. Beyond sensory images: Object-based representation in the human ventral pathway. Proceedings of the National Academy of Sciences of the United States of America 101:5658–5663.
Prather, S.C., J.R. Votaw, and K. Sathian. 2004. Task-specific recruitment of dorsal and ventral visual areas during tactile perception. Neuropsychologia 42:1079–1087. Reales, J.M., and S. Ballesteros. 1999. Implicit and explicit memory for visual and haptic objects: Cross-modal priming depends on structural descriptions. Journal of Experimental Psychology. Learning, Memory, and Cognition 25:644–663. Reed, C.L., S. Shoham, and E. Halgren. 2004. Neural substrates of tactile object recognition: An fMRI study. Human Brain Mapping 21:236–246. Renier, L., O. Collignon, D. Tranduy et al. 2004. Visual cortex activation in early blind and sighted subjects using an auditory visual substitution device to perceive depth. NeuroImage 22:S1. Renier, L., O. Collignon, C. Poirier et al. 2005. Cross modal activation of visual cortex during depth perception using auditory substitution of vision. NeuroImage 26:573–580. Riesenhuber, M., and T. Poggio. 1999. Hierarchical models of object recognition in cortex. Nature Neuroscience 2:1019–1025. Röder, B., and F. Rösler. 1998. Visual input does not facilitate the scanning of spatial images. Journal of Mental Imagery 22:165–181. Saito, D.N., T. Okada, Y. Morita, Y. Yonekura, and N. Sadato. 2003. Tactile–visual cross-modal shape matching: A functional MRI study. Cognitive Brain Research 17:14–25. Sathian, K. 2004. Modality, quo vadis? Comment. Behavioral and Brain Sciences 27:413–414. Sathian, K. 2005. Visual cortical activity during tactile perception in the sighted and the visually deprived. Developmental Psychobiology 46:279–286. Sathian, K., and S. Lacey. 2007. Journeying beyond classical somatosensory cortex. Canadian Journal of Experimental Psychology 61:254–264. Sathian, K., A. Zangaladze, J.M. Hoffman, and S.T. Grafton. 1997. Feeling with the mind’s eye. Neuroreport 8:3877–3881. Shepard, R.N., and J. Metzler. 1971. Mental rotation of three-dimensional objects. Science 171:701–703. Shikata, E., A. McNamara, A. Sprenger et al. 2008. Localization of human intraparietal areas AIP, CIP, and LIP using surface orientation and saccadic eye movement tasks. Human Brain Mapping 29:411–421. Stilla, R., R. Hanna, X. Hu, E. Mariola, G. Deshpande, and K. Sathian. 2008. Neural processing underlying tactile microspatial discrimination in the blind: A functional magnetic resonance imaging study. Journal of Vision 8:1–19 doi:10.1167/8.10.13. Stilla, R., and K. Sathian. 2008. Selective visuo-haptic processing of shape and texture. Human Brain Mapping 29:1123–1138. Stoesz, M., M. Zhang, V.D. Weisser, S.C. Prather, H. Mao, and K. Sathian. 2003. Neural networks active during tactile form perception: Common and differential activity during macrospatial and microspatial tasks. International Journal of Psychophysiology 50:41–49. Swisher, J.D., M.A. Halko, L.B. Merabet, S.A. McMains, and D.C. Somers. 2007. Visual topography of human intraparietal sulcus. Journal of Neuroscience 27:5326–5337. Ueda, Y., and J. Saiki. 2007. View independence in visual and haptic object recognition. Japanese Journal of Psychonomic Science 26:11–19. Woods, A.T., A. Moore, and F.N. Newell. 2008. Canonical views in haptic object representation. Perception 37:1867–1878. Zhang, M., V.D. Weisser, R. Stilla, S.C. Prather, and K. Sathian. 2004. Multisensory cortical processing of object shape and its relation to mental imagery. Cognitive, Affective & Behavioral Neuroscience 4:251–259. Zhou, Y.-D., and J.M. Fuster. 1997. Neuronal activity of somatosensory cortex in a cross-modal (visuo-haptic) memory task. 
Experimental Brain Research 116:551–555.
Section III Combinatorial Principles and Modeling
11
Spatial and Temporal Features of Multisensory Processes
Bridging Animal and Human Studies
Diana K. Sarko, Aaron R. Nidiffer, Albert R. Powers III, Dipanwita Ghose, Andrea Hillock-Dunn, Matthew C. Fister, Juliane Krueger, and Mark T. Wallace
CONTENTS
11.1 Introduction
11.2 Neurophysiological Studies in Animal Models: Integrative Principles as a Foundation for Understanding Multisensory Interactions
11.3 Neurophysiological Studies in Animal Models: New Insights into Interdependence of Integrative Principles
11.3.1 Spatial Receptive Field Heterogeneity and Its Implications for Multisensory Interactions
11.3.2 Spatiotemporal Dynamics of Multisensory Processing
11.4 Studying Multisensory Integration in an Awake and Behaving Setting: New Insights into Utility of Multisensory Processes
11.5 Human Behavioral and Perceptual Studies of Multisensory Processing: Building Bridges between Neurophysiological and Behavioral and Perceptual Levels of Analysis
11.5.1 Defining the "Temporal Window" of Multisensory Integration
11.5.2 Stimulus-Dependent Effects on the Size of the Multisensory Temporal Window
11.5.3 Can "Higher-Order" Processes Affect Multisensory Temporal Window?
11.6 Adult Plasticity in Multisensory Temporal Processes: Psychophysical and Neuroimaging Evidence
11.7 Developmental Plasticity in Multisensory Representations: Insights from Animal and Human Studies
11.7.1 Neurophysiological Studies into Development of Multisensory Circuits
11.7.2 Development of Integrative Principles
11.7.3 Experientially Based Plasticity in Multisensory Circuits
11.7.4 Development of Human Multisensory Temporal Perception
11.8 Conclusions and Future Directions
References
11.1 INTRODUCTION Multisensory processing is a pervasive and critical aspect of our behavioral and perceptual repertoires, facilitating and enriching a wealth of processes including target identification, signal detection, speech comprehension, spatial navigation, and flavor perception to name but a few. The adaptive advantages that multisensory integration confers are critical to survival, with effective acquisition and use of multisensory information enabling the generation of appropriate behavioral responses under circumstances in which one sense is inadequate. In the behavioral domain, a number of studies have illustrated the strong benefits conferred under multisensory circumstances, with the most salient examples including enhanced orientation and discrimination (Stein et al. 1988, 1989), improved target detection (Frassinetti et al. 2002; Lovelace et al. 2003), and speeded responses (Hershenson 1962; Hughes et al. 1994; Frens et al. 1995; Harrington and Peck 1998; Corneil et al. 2002; Forster et al. 2002; Molholm et al. 2002; Amlot et al. 2003; Diederich et al. 2003; Calvert and Thesen 2004). Along with these behavioral examples, there are myriad perceptual illustrations of the power of multisensory interactions. For example, the intensity of a light is perceived as greater when presented with a sound (Stein et al. 1996) and judgments of stimulus features such as speed and orientation are often more accurate when combined with information available from another sense (Soto-Faraco et al. 2003; Manabe and Riquimaroux 2000; Clark and Graybiel 1966; Wade and Day 1968). One of the most compelling examples of multisensory-mediated perceptual gains can be seen in the speech realm, where the intelligibility of a spoken signal can be greatly enhanced when the listener can see the speaker’s face (Sumby and Pollack 1954). In fact, this bimodal gain may be a principal factor in the improvements in speech comprehension seen in those with significant hearing loss after visual training (Schorr et al. 2005; Rouger et al. 2007). Regardless of whether the benefits are seen in the behavioral or perceptual domains, they typically exceed those that are predicted on the basis of responses to each of the component unisensory stimuli (Hughes et al. 1994, 1998; Corneil and Munoz 1996; Harrington and Peck 1998). Such deviations from simple additive models provide important insights into the neural bases for these multisensory interactions in that they strongly argue for a convergence and active integration of the different sensory inputs within the brain.
11.2 NEUROPHYSIOLOGICAL STUDIES IN ANIMAL MODELS: INTEGRATIVE PRINCIPLES AS A FOUNDATION FOR UNDERSTANDING MULTISENSORY INTERACTIONS Information from multiple sensory modalities converges at many sites within the central nervous system, providing the necessary anatomical framework for multisensory interactions (Calvert and Thesen 2004; Stein and Meredith 1993). Multisensory convergence at the level of the single neuron commonly results in an integrated output such that the multisensory response is typically distinct from the component responses, and often from their predicted addition as well. Seminal studies of multisensory processing initially focused on a midbrain structure, the superior colliculus (SC), because of its high incidence of multisensory neurons, its known spatiotopic organization, and its well-defined role in controlling orientation movements of the eyes, pinnae, and head (Sparks 1986; Stein and Meredith 1993; Sparks and Groh 1995; Hall and Moschovakis 2004; King 2004). These foundational studies of the SC of cats (later reaffirmed by work in nonhuman primate models, see Wallace and Stein 1996, 2001; Wallace et al. 1996) provided an essential understanding of the organization of multisensory neurons and the manner in which they integrate their different sensory inputs. In addition to characterizing the striking nonlinearities that frequently define the responses of these neurons under conditions of multisensory stimulation, these studies established a series of fundamental principles that identified key stimulus features that govern multisensory interactions (Meredith and Stein 1983, 1985, 1986; Meredith et al. 1987). The spatial principle deals
with the physical location of the paired stimuli, and illustrates the importance of spatial proximity in driving the largest proportionate gains in response. Similarly, the temporal principle captures the fact that the largest gains are typically seen when stimuli are presented close together in time, and that the magnitude of the interaction declines as the stimuli become increasingly separated in time. Finally, the principle of inverse effectiveness reflects the fact that the largest gains are generally seen to the pairing of two weakly effective stimuli. As individual stimuli become increasingly effective in driving neuronal responses, the size of the interactions seen to the pairing declines. Together, these principles have provided an essential predictive outline for understanding multisensory integration at the neuronal level, as well as for understanding the behavioral and perceptual consequences of multisensory pairings. However, it is important to point out that these principles, although widely instructive, fail to capture the complete integrative profile of any individual neuron. The reason for this is that space, time, and effectiveness are intimately intertwined in naturalistic stimuli, and manipulating one has a consequent effect on the others. Recent studies, described in the next section, have sought to better understand the strong interdependence between these factors, with the hope of better elucidating the complex spatiotemporal architecture of multisensory interactions.
11.3 NEUROPHYSIOLOGICAL STUDIES IN ANIMAL MODELS: NEW INSIGHTS INTO INTERDEPENDENCE OF INTEGRATIVE PRINCIPLES 11.3.1 Spatial Receptive Field Heterogeneity and Its Implications for Multisensory Interactions Early observations during the establishment of the neural principles of multisensory integration hinted at a complexity not captured by integrative “rules” or constructs. For example, in structuring experiments to test the spatial principle, it was clear that stimulus location not only played a key role in the magnitude of the multisensory interaction, but also that the individual sensory responses were strongly modulated by stimulus location. Such an observation suggested an interaction between the spatial and inverse effectiveness principles, and one that might possibly be mediated by differences in unisensory responses as a function of location within the neuron’s receptive field. Recently, this concept has been tested by experiments specifically designed to characterize the microarchitecture of multisensory receptive fields. In these experiments, stimuli from each of the effective modalities were presented at a series of locations within and outside the classically defined excitatory receptive field of individual multisensory neurons (Figure 11.1). Studies were conducted in both subcortical (i.e., SC) and cortical [i.e., the anterior ectosylvian sulcus (AES)] multisensory domains in the cat, in which prior work had illustrated that the receptive fields of multisensory neurons are quite large (Stein and Meredith 1993; Benedek et al. 2004; Furukawa and Middlebrooks 2002; Middlebrooks and Knudsen 1984; Middlebrooks et al. 1998; Xu et al. 1999; Wallace and Stein 1996, 1997; Nagy et al. 2003). In this manner, spatial receptive fields (SRFs) can be created for each of the effective modalities, as well as for the multisensory combination. It is important to point out that in these studies, the stimuli are identical (e.g., same luminance, loudness, and spectral composition) except for their location. The results of these analyses have revealed a marked degree of heterogeneity to the SRFs of both SC and AES multisensory neurons (Carriere et al. 2008; Royal et al. 2009). This response heterogeneity is typically characterized by regions of high response (i.e., hot spots) surrounded by regions of substantially weaker response. Studies are ongoing to determine whether features such as the number or size of these hot spots differ between subcortical and cortical areas. Although these SRF analyses have revealed a previously uncharacterized feature of multisensory neurons, perhaps the more important consequence of this SRF heterogeneity is the implication that this has for multisensory interactions. At least three competing hypotheses can be envisioned for the role of receptive field heterogeneity in multisensory integration—each with strikingly different
predictions. The first is that spatial location takes precedence and that the resultant interactions would be completely a function of the spatial disparity between the paired stimuli. In this scenario, the largest interactions would be seen when the stimuli were presented at the same location, and the magnitude of the interaction would decline as spatial disparity increased. Although this would seem to be a strict interpretation of the spatial principle, in fact, even the early characterization of this principle focused not on location or disparity, but rather on the presence or absence of stimuli within the receptive field (Meredith and Stein 1986), hinting at the relative lack of importance of absolute location. The second hypothesis is that stimulus effectiveness would be the dominant factor, and that the interaction would be dictated not by spatial location but rather by the magnitude of the individual sensory responses (which would be modulated by changes in spatial location). The final hypothesis is that there is an interaction between stimulus location and effectiveness, such that both would play a role in shaping the resultant interaction. If this were the case, studies would seek to identify the relative weighting of these two stimulus dimensions to gain a better mechanistic view into these interactions.
FIGURE 11.1 Construction of an SRF for an individual multisensory neuron. Each stimulus location tested within receptive field generates a response that is then compiled into a single unit activity (SUA) plot. SUA plot at one location is shown in detail to illustrate how spike density function (SDF) is derived. Finally, SDF/SUA data are transformed into a pseudocolor SRF plot in which normalized evoked response is shown relative to azimuth and elevation. Evoked responses are scaled to maximal response, with warmer colors representing higher firing rates. (Adapted from Carriere, B.N. et al., J. Neurophysiol., 99, 2357–2368, 2008.)
The first foray into this question focused on cortical area AES (Carriere et al. 2008). Here, it was found that SRF architecture played an essential deterministic role in the observed multisensory interactions, and most intriguingly, in a manner consistent with the second hypothesis outlined above. Thus, and as illustrated in Figure 11.2, SRF architecture resulted in changes in stimulus effectiveness that formed the basis for the multisensory interaction. In the neuron shown, if the stimuli were presented in a region of strong response within the SRF, a response depression would result (Figure 11.2b, left column). In contrast, if the stimuli were moved to a location of weak response, their pairing resulted in a large enhancement (Figure 11.2b, center column). Intermediate regions
of response resulted in either weak or no interactions (Figure 11.2b, right column). In addition to this traditional measure of multisensory gain (relative to the best unisensory response), these same interactions can also be examined and quantified relative to the predicted summation of the unisensory responses (Wallace et al. 1992; Wallace and Stein 1996; Stein and Wallace 1996; Stanford et al. 2005; Royal et al. 2009; Carriere et al. 2008). In these comparisons, strongly effective pairings typically result in subadditive interactions, weakly effective pairings result in superadditive interactions, and intermediate pairings result in additive interactions. Visualization of these different categories of interactions relative to additive models can be captured in pseudocolor representations such as that shown in Figure 11.3, in which the actual multisensory SRF is contrasted against that predicted on the basis of additive modeling. Together, these results clearly illustrate the primacy of stimulus efficacy in dictating multisensory interactions, and that the role of space per se appears to be a relatively minor factor in governing these integrative processes.
FIGURE 11.2 (See color insert.) Multisensory interactions in AES neurons differ based on location of paired stimuli. (a) Visual, auditory, and multisensory SRFs are shown with highlighted locations (b, d) illustrating response suppression (left column), response enhancement (middle column), and no significant interaction (right column). (c) Shaded areas depict classically defined receptive fields for visual (blue) and auditory (green) stimuli.
FIGURE 11.3 Multisensory interactions relative to additive prediction models. Visual, auditory, and multisensory (VA) SRFs are shown for an individual multisensory neuron of AES. True multisensory responses can be contrasted with those predicted by an additive model (V + A) and reveal a richer integrative microarchitecture than predicted by simple linear summation of unisensory response profiles. (Adapted from Carriere, B.N. et al., J. Neurophysiol., 99, 2357–2368, 2008.)
Parallel studies are now beginning to focus on the SC, and provide an excellent comparative framework from which to view multisensory interactive mechanisms across brain structures. In this work, Krueger et al. (2009) reported that the SRF architecture of multisensory neurons in the SC is not only similar to that of cortical neurons, but also that stimulus effectiveness appears to once again be the key factor in dictating the multisensory response. Thus, stimulus pairings within regions of weak unisensory response often resulted in superadditive interactions (Figure 11.4b–c, ◼), whereas pairings at locations of strong unisensory responses typically exhibited subadditive interactions (Figure 11.4b–c, ○). Overall, such an organization presumably boosts signals within weakly effective regions of the unisensory SRFs during multisensory stimulus presentations and yields more reliable activation for each stimulus presentation.
FIGURE 11.4 Multisensory interactions in SC neurons differ based on location of paired stimuli. (a) Visual, auditory, and multisensory SRFs are shown as a function of azimuth (x axis) and elevation (y axis). Specific locations within receptive field (b) are illustrated in detail (c) to show evoked responses for visual, auditory, and multisensory conditions. Weakly effective locations (square) result in response enhancement, whereas conditions evoking a strong unisensory response (circle) result in response suppression.
Although SRF architecture appears similar in both cortical and subcortical multisensory brain regions, there are also subtle differences that may provide important insights into both the underlying mechanistic operations and the different behavioral and perceptual roles of AES and SC. For example, when the SRFs of a multisensory neuron in the SC are compared under different sensory
conditions, there appears to be a global similarity in the structure of each SRF with respect to both the number and location of hot spots. This might indicate that the overall structure of the SRF is dependent on fixed anatomical and/or biophysical constraints such as the extent of dendritic arbors. However, these characteristics are far less pronounced in cortical SRFs (Carriere et al. 2008), possibly due to the respective differences in the inputs to these two structures (the cortex receiving more heterogeneous inputs) and/or due to less spatiotopic order in the cortex. Future work will seek to better clarify these intriguing differences across structures.
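For concreteness, the two quantitative comparisons referred to above are commonly expressed as follows (this is the standard formulation in the multisensory literature rather than a formula reproduced verbatim from the studies cited). The gain relative to the best unisensory response is

\[
\text{interaction (\%)} = \frac{CM - \max(V, A)}{\max(V, A)} \times 100,
\]

where \(CM\) is the response to the combined visual–auditory stimulus and \(V\) and \(A\) are the responses to the individual visual and auditory stimuli (e.g., mean stimulus-evoked spikes per trial). The comparison with the additive prediction then classifies a pairing as superadditive when \(CM > V + A\), additive when \(CM \approx V + A\), and subadditive when \(CM < V + A\).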
11.3.2 Spatiotemporal Dynamics of Multisensory Processing In addition to the clear interactions between space and effectiveness captured by the aforementioned SRF analyses, an additional stimulus dimension that needs to be included is time. For example, and returning to the initial outlining of the interactive principles, changing stimulus location impacts not only stimulus effectiveness, but also the temporal dynamics of each of the unisensory (and multisensory) responses. Thus, dependent on the location of the individual stimuli, responses will have very different temporal patterns of activation. More recently, the importance of changes in temporal response profiles has been highlighted by findings that the multisensory responses of SC neurons show shortened latencies when compared with the component unisensory responses (Rowland et al. 2007), a result likely underlying the behavioral finding of the speeding of saccadic eye movements under multisensory conditions (Frens and Van Opstal 1998; Frens et al. 1995; Hughes et al. 1998; Amlot et al. 2003; Bell et al. 2005). Additional work focused on the temporal dimension of multisensory responses has extended the original characterization of the temporal principle to nonhuman primate cortex, where Kayser and colleagues (2008) have found that audiovisual interactions in the superior temporal plane of rhesus monkey neocortex are maximal when a visual stimulus precedes an auditory stimulus by 20 to 80
ms. Along with these unitary changes, recent work has also shown that the timing of sensory inputs with respect to ongoing neural oscillations in the neocortex has a significant impact on whether neuronal responses are enhanced or suppressed. For instance, in macaque primary auditory cortex, properly timed somatosensory input has been found to reset ongoing oscillations to an optimal excitability phase that enhances the response to temporally correlated auditory input. In contrast, somatosensory input delivered during suboptimal, low-excitability oscillatory periods depresses the auditory response (Lakatos et al. 2007). Although clearly illustrating the importance of stimulus timing in shaping multisensory interactions, these prior studies have yet to characterize the interactions between time, space, and effectiveness in the generation of a multisensory response. To do this, recent studies from our laboratory have extended the SRF analyses described above to include time, resulting in the creation of spatiotemporal receptive field (STRF) plots. It is important to point out that such analyses are not a unique construct to multisensory systems, but rather stem from both spatiotemporal and spectrotemporal receptive field studies within individual sensory systems (David et al. 2004; Machens et al. 2004; Haider et al. 2010; Ye et al. 2010). Rather, the power of the STRF here is its application to multisensory systems as a modeling framework from which important mechanistic insights can be gained about the integrative process. The creation of STRFs for cortical multisensory neurons has revealed interesting features about the temporal dynamics of multisensory interactions and the evolution of the multisensory response (Royal et al. 2009). Most importantly, these analyses, when contrasted with simple additive models based on the temporal architecture of the unisensory responses, identified two critical epochs in the multisensory response not readily captured by additive processes (Figure 11.5). The first of these, presaged by the Rowland et al. study described above, revealed an early phase of superadditive multisensory responses that manifest as a speeding of response (i.e., reduced latency) under multisensory conditions. The second of these happens late in the response epoch, where the multisensory response continues beyond the truncation of the unisensory responses, effectively increasing response duration under multisensory circumstances. It has been postulated that these two distinct epochs of multisensory integration may ultimately be linked to very different behavioral and/or perceptual roles (Royal et al. 2009). Whereas reduced latencies may speed target detection and identification, extended response duration may facilitate perceptual analysis of the object or area of interest. One interesting hypothesis is that the early speeding of responses will be more prominent in SC multisensory neurons given their important role in saccadic (and head) movements, and that the extended duration will be seen more in cortical networks engaged in perceptual analyses. Future work, now in progress in our laboratory (see below), will seek to clarify the behavioral/perceptual roles of these integrative processes by directly examining the links at the neurophysiological and behavioral levels.
FIGURE 11.5 Spatiotemporal response dynamics in multisensory AES neurons. A reduced response latency and increased response duration characterized spatiotemporal dynamics of paired multisensory stimuli.
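As a schematic of the bookkeeping behind the STRF plots described above, the sketch below bins spike times by stimulus azimuth and peristimulus time to produce a firing-rate surface. The data format, bin sizes, and the fabricated spike times are assumptions made for illustration; this is not the authors' analysis code.

```python
# Schematic construction of a spatiotemporal receptive field (STRF) matrix from
# per-trial spike times recorded at each tested stimulus location.
import numpy as np

azimuths = np.arange(-60, 61, 10)      # tested stimulus locations (deg); illustrative
time_bins = np.arange(-100, 401, 10)   # peristimulus time edges (ms); illustrative
n_trials = 20                          # repetitions per location; illustrative

# spike_times[i] holds one array of spike times (ms re: stimulus onset) per trial for
# the i-th azimuth. Here the spikes are fabricated only to show the bookkeeping.
rng = np.random.default_rng(0)
spike_times = [[rng.uniform(-100, 400, rng.poisson(8)) for _ in range(n_trials)]
               for _ in azimuths]

strf = np.zeros((len(azimuths), len(time_bins) - 1))
for i, trials in enumerate(spike_times):
    counts = sum(np.histogram(t, bins=time_bins)[0] for t in trials)
    # convert summed counts to firing rate in spikes/s (10-ms bins, n_trials trials)
    strf[i] = counts / (n_trials * 0.010)

# strf[i, j] is the mean firing rate at azimuth i and time bin j; rendering it as a
# pseudocolor image for unisensory and multisensory trials yields STRF plots of the
# kind described in the text, which can then be compared against an additive model.
```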
11.4 STUDYING MULTISENSORY INTEGRATION IN AN AWAKE AND BEHAVING SETTING: NEW INSIGHTS INTO UTILITY OF MULTISENSORY PROCESSES As research on the neural substrates of multisensory integration progresses, and as the behavioral and perceptual consequences of multisensory combinations become increasingly apparent, contemporary neuroscience is faced with the challenge of bridging between the level of the single neuron and whole animal behavior and perception. To date, much of the characterization of multisensory integration at the cellular level has been conducted in anesthetized animals, which offer a variety of practical advantages. However, given that anesthesia could have substantial effects on neural encoding, limiting the interpretation of results within the broader construct of perceptual abilities (Populin 2005; Wang et al. 2005; Ter-Mikaelian et al. 2007), the field must now turn toward awake preparations in which direct correlations can be drawn between neurons and behavior/perception. Currently, in our laboratory, we are using operant conditioning methods to train animals to fixate on a single location while audiovisual stimuli are presented in order to study SRF architecture in this setting (and compare these SRFs with those generated in anesthetized animals). In addition to providing a more naturalistic view into receptive field organization, these studies can then be extended in order to begin to address the relationships between the neural and behavioral levels. One example of this is the use of a delayed saccade task, which has been used in prior work to parse sensory from motor responses in the SC (where many neurons have both sensory and motor activity; Munoz et al. 1991a, 1991b; Munoz and Guitton 1991; Guitton and Munoz 1991). In this task, an animal is operantly conditioned to fixate on a simple visual stimulus (a light-emitting diode or LED), and to hold fixation for the duration of the LED. While maintaining fixation, a peripheral LED illuminates, resulting in a sensory (i.e., visual) response in the SC. A short time later (usually on the order of 100–200 ms), the fixation LED is shut off, cueing the animal to generate a motor response to the location at which the target was previously presented. The “delay” allows the sensory response to be dissociated from the motor response, thus providing insight into the nature of the sensory–motor transform. Although such delayed saccade tasks have been heavily employed in both the cat and monkey, they are typically used to eliminate “confounding” sensory influences on the motor responses. Another advantage afforded by the awake preparation is the ability to study how space, time, and effectiveness interact in a state more reflective of normal brain function, and which is likely to reveal important links between multisensory neuronal interactions and behavioral/perceptual enhancements such as speeded responses, increased detection, and accuracy gains. Ideally, these analyses could be structured to allow direct neurometric–psychometric comparisons, providing fundamental insights into how individual neurons and neuronal assemblies impact whole organismic processes.
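The structure of such a delayed saccade trial can be summarized in a short sketch. Apart from the 100 to 200 ms delay mentioned above, the event names and timing values are illustrative assumptions rather than parameters from the studies cited.

```python
# Illustrative event timeline for one trial of a delayed saccade task.
import random

def delayed_saccade_trial():
    events = [("fixation_led_on", 0)]                # animal acquires and holds fixation
    target_onset = 500                               # ms; assumed fixation hold period
    events.append(("peripheral_led_on", target_onset))   # evokes the sensory (visual) response
    delay = random.randint(100, 200)                 # ms; dissociates sensory from motor activity
    events.append(("fixation_led_off", target_onset + delay))  # go cue: saccade to remembered target
    return events

print(delayed_saccade_trial())
```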
FIGURE 11.6 (See color insert.) Representative STRF from awake (a) versus anesthetized (b) recordings from cat SC using simple audiovisual stimulus presentations (an LED paired with broadband noise). In awake animals, superadditive interactions occurred over multiple time points in multisensory condition (VA) when compared to what would be predicted based on a linear summation of unisensory responses (V + A; see contrast, VA – [V + A]). This differs from anesthetized recordings from SC in which multisensory interactions are limited to earliest temporal phase of multisensory response.
Preliminary studies have already identified that multisensory neurons in the SC of the awake cat demonstrate extended response durations, as well as superadditive interactions over multiple time scales, when compared to anesthetized animals in which multisensory interactions are typically limited to the early phases of the response (Figure 11.6; Krueger et al. 2008). These findings remain to be tested in multisensory regions of the cortex, or extended beyond simple stimuli (LEDs paired with white noise) to more complex, ethologically relevant cues that might better address multisensory perceptual capabilities. Responses to naturalistic stimuli in cats have primarily been examined in unisensory cortices, demonstrating that simplification of natural sounds (bird chirps) results in significant alteration of neuronal responses (Bar-Yosef et al. 2002) and that firing rates differ for natural versus time-reversed conspecific vocalizations (Qin et al. 2008) in the primary auditory cortex. Furthermore, multisensory studies in primates have shown that multisensory enhancement in the primary auditory cortex of awake monkeys was reduced when a mismatched pair of naturalistic audiovisual stimuli was presented (Kayser et al. 2010).
11.5 HUMAN BEHAVIORAL AND PERCEPTUAL STUDIES OF MULTISENSORY PROCESSING: BUILDING BRIDGES BETWEEN NEUROPHYSIOLOGICAL AND BEHAVIORAL AND PERCEPTUAL LEVELS OF ANALYSIS

As should be clear from the above description, the ultimate goal of neurophysiological studies is to provide a more informed view into the encoding processes that give rise to our behaviors and perceptions. Indeed, these seminal findings in the animal model can be used as important instruction sets for the design of experiments in human subjects to bridge between these domains. Recently, our laboratory has embarked on such experiments with a focus on better characterizing how stimulus timing influences multisensory perceptual processes, with a design shaped by our knowledge of the temporal principle.
11.5.1 Defining the "Temporal Window" of Multisensory Integration

In addition to emphasizing the importance of stimulus onset asynchrony (SOA) in determining the outcome of a given multisensory pairing, experiments in both SC and AES cortex of the cat showed that the span of time over which response enhancements are generally seen in these neurons is on the order of several hundred milliseconds (Meredith et al. 1987; Wallace and Stein 1996; Wallace et al. 1992, 1996). Behavioral studies have followed up on these analyses to illustrate the temporal constraints of multisensory combinations on human performance, and have found that the presentation of cross-modal stimulus pairs in close temporal proximity results in shortened saccadic reaction times (Colonius and Diederich 2004; Colonius and Arndt 2001; Frens et al. 1995) and heightened accuracy in understanding speech in noise (McGrath and Summerfield 1985; Pandey et al. 1986; van Wassenhove et al. 2007), and that temporal proximity also plays an important role in multisensory illusions such as the McGurk effect (Munhall et al. 1996), the sound-induced flash illusion (Shams et al. 2000, 2002), the parchment skin illusion (Guest et al. 2002), and the stream-bounce illusion (Sekuler et al. 1997). Moreover, multisensory interactions as demonstrated using population-based functional imaging methods (Dhamala et al. 2007; Kavounoudias et al. 2008; Macaluso et al. 2004; Noesselt et al. 2007) have been shown to be greatest during synchronous presentation of stimulus pairs. Perhaps even more important than synchrony in these studies was the general finding that multisensory interactions were typically preserved over an extended window of time (i.e., several hundred milliseconds) surrounding simultaneity, giving rise to the term "temporal window" for describing the critical period for these interactions (Colonius and Diederich 2004; van Wassenhove et al. 2007; Dixon and Spitz 1980).

The concept of such a window makes good ethological sense, in that it provides a buffer for the latency differences that characterize the propagation times of energies in the different senses. Most illustrative here are the differences between the propagation times of light and sound in our environment, which differ by many orders of magnitude. As a simple example of this difference, take an audiovisual event happening at a distance of 1 m, where the incident energies will arrive at the retina almost instantaneously and at the cochlea about 3 ms later (the speed of sound is approximately 330 m/s). Now, if we move that same audiovisual source to a distance of 20 m, the difference in arrival times expands to 60 ms. Hence, having a window of tolerance for these audiovisual delays represents an effective means to continue to bind stimuli across modalities even without absolute correspondence in their incident arrival times.
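The arrival-time arithmetic in this example generalizes readily. The sketch below is a minimal illustration that computes the sound-minus-light lag for arbitrary viewing distances, using the approximately 330 m/s speed of sound cited above and treating light as effectively instantaneous; the function name is ours.

```python
SPEED_OF_SOUND_M_PER_S = 330.0  # approximate value cited in the text

def audiovisual_lag_ms(distance_m):
    """Sound-minus-light arrival-time difference (ms) for a source at distance_m.

    Light propagation is treated as effectively instantaneous on this
    timescale, so the lag is simply the acoustic travel time.
    """
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_PER_S

for d in (1, 20, 100):
    print(f"{d:4d} m -> sound lags light by ~{audiovisual_lag_ms(d):.0f} ms")
```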
Because of the importance of temporal factors for multisensory integration, a number of experimental paradigms have been developed for use in human subjects as a way to systematically study the temporal binding window and its associated dynamics. One of the most commonly used of these is a simultaneity judgment task, in which paired visual and auditory stimuli are presented at various SOAs and participants are asked to judge whether the stimuli occurred simultaneously or successively (Zampini et al. 2005a; Engel and Dougherty 1971; Stone et al. 2001; Stevenson et al. 2010). A distribution of responses can then be created that plots the probability of simultaneity reports as a function of SOA. This distribution not only yields the point of subjective simultaneity, defined as the peak of the function (Stone et al. 2001; Zampini et al. 2005a), but, more importantly, can be used to define a "window" of time within which simultaneity judgments are highly likely. A similar approach is taken in paradigms designed to assess multisensory temporal order judgments, wherein participants judge which modality's stimulus was presented first. Similar to the simultaneity judgment task, the point of subjective simultaneity is the time point at which participants judge either stimulus to have occurred first at a rate of 50% (Zampini et al. 2003; Spence et al. 2001). Once again, this method can also be adapted to create response distributions that serve as proxies for the temporal binding window.

Although the point measures (i.e., the point of subjective simultaneity) derived from these studies tend to differ based on the paradigm chosen (Fujisaki et al. 2004; Vroomen et al. 2004; Zampini et al. 2003, 2005a), the span of time over which there is a high likelihood of reporting simultaneity is remarkably constant, ranging from about –100 ms to 250 ms, where negative values denote auditory-leading-visual conditions (Dixon and Spitz 1980; Fujisaki et al. 2004; Vroomen et al. 2004; Zampini et al. 2003, 2005a). The larger window size on the right side of these distributions—in which vision leads audition—appears in nearly all studies of audiovisual simultaneity perception, and has been proposed to arise from the inherent flexibility needed to process real-world audiovisual events, given that the propagation speeds of light and sound will result in SOAs only on the right side of these distributions (Dixon and Spitz 1980). Indeed, very recent efforts to model the temporal binding window within a probabilistic framework (Colonius and Diederich 2010a, 2010b) have described this asymmetry as arising from an asymmetry in Bayesian priors across SOAs, corresponding to the higher probability that visual-first pairs were generated by the same external event.
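To make the link between such response distributions and a window estimate explicit, the sketch below fits a simple bell-shaped function to simulated simultaneity-report probabilities and reads off a point of subjective simultaneity and a window width at 75% of maximum (the criterion described in the figure captions later in this chapter). The symmetric Gaussian, the toy data, and the function names are illustrative assumptions rather than the analysis used in the cited studies; empirical distributions are typically asymmetric, with a broader visual-leading side, as noted above.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(soa, amp, pss, sigma):
    """Bell-shaped psychometric function for simultaneity reports."""
    return amp * np.exp(-0.5 * ((soa - pss) / sigma) ** 2)

def fit_window(soas_ms, p_simultaneous, criterion=0.75):
    """Fit a Gaussian to simultaneity-report probabilities.

    Returns the point of subjective simultaneity (peak location, in ms) and
    the width (ms) of the window at `criterion` times the fitted maximum.
    """
    p0 = [float(np.max(p_simultaneous)), 0.0, 150.0]   # rough starting values
    popt, _ = curve_fit(gaussian, soas_ms, p_simultaneous, p0=p0)
    pss, sigma = popt[1], abs(popt[2])
    half_width = sigma * np.sqrt(-2.0 * np.log(criterion))
    return pss, 2.0 * half_width

# Toy data; negative SOAs denote auditory-leading, positive visual-leading trials.
soas = np.arange(-300, 301, 50, dtype=float)
p_sim = gaussian(soas, 0.9, 40.0, 130.0) + 0.02        # fabricated responses
pss, window = fit_window(soas, p_sim)
print(f"PSS ~ {pss:.0f} ms, window width ~ {window:.0f} ms")
```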
11.5.2 Stimulus-Dependent Effects on the Size of the Multisensory Temporal Window

Although some have argued for an invariant size to the temporal window (see Munhall et al. 1996), there is a growing body of evidence to suggest that the size of the temporal window is very much dependent on the type of stimulus that is used (Dixon and Spitz 1980; van Wassenhove et al. 2008; Soto-Faraco and Alsius 2009). The largest distinctions in this domain have been seen when contrasting speech with nonspeech stimuli: the window for speech appears to be far larger (approximately 450 ms) than that for simpler pairings such as flash–tone pairs or videos of inanimate objects (e.g., a hammer pounding a nail), for which the window is about 250 ms (Dixon and Spitz 1980; van Atteveldt et al. 2007; van Wassenhove et al. 2007; Massaro et al. 1996; Conrey and Pisoni 2006; McGrath and Summerfield 1985). Interpretation of this seeming expansion in the case of speech has ranged from the idea that learned tolerance of asynchrony is greatest with stimuli to which we are most exposed (Dixon and Spitz 1980), to the theory that the richness of auditory spectral and visual dynamic content in speech allows for binding over a larger range of asynchrony (Massaro et al. 1996), to the view that speech window size is dictated
by the duration of the elemental building blocks of the spoken language—phonemes (Crystal and House 1981).

Other studies have focused on altering the statistics of multisensory temporal relations in an effort to better characterize the malleability of these processes. For example, repeated exposure to a 250-ms auditory-leading-visual asynchronous pair is capable of biasing participants' simultaneity judgments in the direction of that lag by about 25 ms, with effects lasting on the order of minutes (Fujisaki et al. 2004; Vroomen et al. 2004). Similar recalibration effects have been noted after exposure to asynchronous audiovisual speech, as well as to visual–tactile, audio–tactile, and sensory–motor pairs (Hanson et al. 2008; Fajen 2007; Stetson et al. 2006; Navarra et al. 2005). Although the exact mechanisms underlying these changes are unknown, they have been proposed to represent a recalibration of sensory input consistent with Bayesian models of perception (Hanson et al. 2008; Miyazaki et al. 2005, 2006).
11.5.3 Can "Higher-Order" Processes Affect Multisensory Temporal Window?

In addition to these studies examining stimulus-dependent effects, other works have sought to determine the malleability of multisensory temporal processing resulting from the manipulation of cognitive processes derived from top-down networks. Much of this work has focused on attentional control, and has been strongly influenced by historical studies showing that attention within a modality could greatly facilitate information processing of a cued stimulus within that modality. This work has now been extended to the cross-modal realm, and has shown that attention to one modality can bias temporally based judgments concerning a stimulus in another modality (Zampini et al. 2005b; Spence et al. 2001; Shore et al. 2001), illustrating the presence of strong attentional links between different sensory systems.
11.6 ADULT PLASTICITY IN MULTISENSORY TEMPORAL PROCESSES: PSYCHOPHYSICAL AND NEUROIMAGING EVIDENCE

Further work in support of top-down influences on multisensory perception has focused on characterizing the plasticity that can be engendered with the use of classic perceptual learning paradigms. The first of these studies was directed outside the temporal domain, and focused on the simple question of whether perceptual learning within a single sensory modality can be improved with the use of cross-modal stimuli. In these studies, participants were trained on a motion discrimination task using either a visual cue alone or combined visual–auditory cues. Results revealed enhanced visual motion discrimination abilities and an abbreviated time course of learning in the group trained on the audiovisual version of the task when compared with those trained only on the visual version (Kim et al. 2008; Seitz et al. 2006). Similar results have been seen in the visual facilitation of voice discrimination learning (von Kriegstein and Giraud 2006), in the cross-modal enhancement of both auditory and visual natural object recognition (Schneider et al. 2008), and in the facilitation of unisensory processing based on prior multisensory memories (Murray et al. 2004, 2005).

More recently, our laboratory has extended these perceptual plasticity studies into the temporal realm, by attempting to assess the plasticity of the multisensory temporal binding window itself. Initial efforts used a two-alternative forced choice audiovisual simultaneity judgment task in which subjects were asked to choose on a trial-by-trial basis whether a stimulus pair was synchronously or asynchronously presented (Powers et al. 2009). In the initial characterization (i.e., before training), a distribution of responses was obtained that allowed us to define a proxy measure for the multisensory temporal binding window for each individual subject (Figure 11.7). After this baseline measurement, subjects were then engaged in the same task, except that now they were given feedback as to the correctness of their judgments. Training was carried out for an hour a day over 5 days. This training regimen resulted in a marked narrowing in the width of the multisensory temporal binding window, with a group average reduction of 40%.
FIGURE 11.7 Training on a two-alternative forced choice simultaneity judgment task. (a) An estimate of temporal binding window is derived using a criterion set at 75% of maximum. In this representative individual case, window narrows from 321 to 115 ms after 5 days (1 h/day) of feedback training. (b) After training, a significant decrease in probability of judging nonsimultaneous audiovisual pairs to be simultaneous was found (*P < .05). (c) Average window size dropped significantly after first day (1 h) of training, then remained stable (*P < .05).
Further characterization revealed that the changes in window size were very rapid (being seen after the first day of training), were durable (lasting at least a week after the cessation of training), and were a direct result of the feedback provided (control subjects passively exposed to the same stimulus set did not exhibit window narrowing). Additionally, to rule out the possibility that this narrowing was the result of changes in cognitive biases, a second experiment was undertaken using a two-interval forced choice paradigm in which participants were instructed to identify which of two intervals contained the simultaneously presented audiovisual pair. The two-interval forced choice paradigm resulted in a narrowing that was similar in both degree and dynamics to that seen with the two-alternative forced choice approach. Overall, this result is the first to illustrate a marked experience-dependent malleability of the multisensory temporal binding window, a result that has potentially important implications for clinical conditions such as autism and dyslexia in which there is emerging evidence for changes in multisensory temporal function (Ciesielski et al. 1995; Laasonen et al. 2001, 2002; Kern 2002; Hairston et al. 2005; Facoetti et al. 2010; Foss-Feig et al. 2010).

In an effort to better define the brain networks responsible for multisensory temporal perception (and the demonstrable plasticity), our laboratory has conducted a follow-up neuroimaging study using functional magnetic resonance imaging (fMRI) (Powers et al. 2010). The findings revealed marked
changes in one of the best-established multisensory cortical domains in humans, the posterior superior temporal sulcus (pSTS). The pSTS exhibited striking decreases in blood oxygen level dependent (BOLD) activation after training, suggestive of an increased efficiency of processing. In addition to these changes in pSTS, there were changes in regions of the auditory and visual cortex, along with marked changes in functional coupling between these unisensory domains and the pSTS. Together, these studies are beginning to reveal the cortical networks involved in multisensory temporal processing and perception, as well as the dynamics of these networks, which must be continually adjusted to capture the ever-changing sensory statistics of our natural world and their cognitive valence.
11.7 DEVELOPMENTAL PLASTICITY IN MULTISENSORY REPRESENTATIONS: INSIGHTS FROM ANIMAL AND HUMAN STUDIES

In addition to this compelling emerging evidence that the multisensory processing architecture of the adult brain can be shaped in an experience-dependent manner, there is a rich literature on the development of multisensory representations and the role that postnatal experience plays in shaping these events. Although the questions were first posed in the literature associated with the development of human perceptual abilities, more recent work in animal models has laid the foundation for better understanding the seminal events in the maturation of multisensory behaviors and perceptions.
11.7.1 Neurophysiological Studies into Development of Multisensory Circuits

The studies described above in adult animal models provide an ideal foundation on which to evaluate the developmental events in the nervous system that lead up to the construction of mature multisensory representations. Hence, subsequent studies focused on establishing the developmental chronology for multisensory neurons and their integrative features in these same model structures—the subcortical SC and the cortical AES. In the SC, recordings immediately after birth reveal an absence of multisensory neurons (Wallace and Stein 1997). Indeed, the first neurons present in the SC at birth and soon after are those that are exclusively responsive to somatosensory cues. By 10 to 12 days postnatal, auditory-responsive neurons appear, setting the stage for the first multisensory neurons that are responsive to both somatosensory and auditory cues. More than a week later, the first visually responsive neurons appear, providing the basis for the first visually responsive multisensory neurons. These early multisensory neurons were found to be far different from their adult counterparts: they responded weakly to sensory stimuli and had poorly developed response selectivity, long latencies, and large receptive fields (Wallace and Stein 1997; Stein et al. 1973a, 1973b). Perhaps most importantly, these early multisensory neurons failed to integrate their different sensory inputs, responding to stimulus combinations in a manner that was indistinguishable from their component unisensory responses (Wallace and Stein 1997). Toward the end of the first postnatal month, this situation begins to change, with individual neurons starting to show the capacity to integrate their different sensory inputs. Over the ensuing several months, both the number of multisensory neurons and those with integrative capacity grow steadily, such that by 4 to 5 months after birth, the adultlike incidences are achieved (Figure 11.8).

The developmental progression in the cortex is very similar to that in the SC, except that it appears to be delayed by several weeks (Wallace et al. 2006). Thus, the first multisensory neurons do not appear in AES until about 6 weeks after birth (Figure 11.8). As with the SC, these early multisensory neurons are reflective of the adjoining unisensory representations, being auditory–somatosensory. Four weeks or so later, we see the appearance of visual neurons and the coincident appearance of visually responsive multisensory neurons. Once again, early cortical multisensory neurons are strikingly immature in many respects, including a lack of integrative capacity. As development progresses, we see a substantial growth in the multisensory population and we see most multisensory AES neurons develop their integrative abilities.
FIGURE 11.8 Development of multisensory neurons in SC (open circles) versus AES (closed circles) of cat. Development of multisensory neurons is similar between SC and AES with exceptions of onset and overall percentage of multisensory neurons. At 4 months postnatal life, percentages of multisensory neurons in both AES and SC are at their mature levels, with SC having a higher percentage than AES.
The parallels between SC and AES in their multisensory developmental chronology likely reflect the order of overall sensory development (Gottlieb 1971), rather than a dependence on connectivity between the two regions, because the establishment of sensory profiles in the SC precedes the functional maturation of connections between AES and the SC (Wallace and Stein 2000). Thus, a gradual recruitment of sensory functions during development appears to produce neurons capable of multisensory integration (Lewkowicz and Kraebel 2004; Lickliter and Bahrick 2004), and points strongly to a powerful role for early experience in sculpting the final multisensory state of these systems (see Section 11.7.3).
11.7.2 Development of Integrative Principles

In addition to characterizing the appearance of multisensory neurons and the maturation of their integrative abilities, these studies also examined how the integrative principles changed during the course of development. Intriguingly, the principle of inverse effectiveness appeared to hold in the earliest integrating neurons, in that as soon as a neuron demonstrated integrative abilities, the largest enhancements were seen in pairings of weakly effective stimuli. Indeed, one of the most surprising findings in these developmental studies is the all-or-none nature of multisensory integration. Thus, neurons appear to transition very rapidly from a state in which they lack integrative capacity to one in which that capacity is adult-like in both magnitude and adherence to the principle of inverse effectiveness. In the spatial domain, the situation appears to be much the same. Whereas early multisensory neurons have large receptive fields and lack integration, as soon as receptive fields become adult-like in size, neurons show integrative ability. Indeed, these processes appear to be so tightly linked that it has been suggested that they reflect the same underlying mechanistic process (Wallace and Stein 1997; Wallace et al. 2006).

The one principle that appears to differ in a developmental context is the temporal principle. Observations from the earliest integrating neurons reveal that they typically show response enhancements only to pairings at a single SOA (see Wallace and Stein 1997). This is in stark contrast to adults, in which enhancements are typically seen over a span of SOAs lasting several hundred milliseconds, a finding that has led to the concept of a temporal "window" for multisensory integration. In these animal studies, as development progresses, the range of SOAs over which enhancements can be generated grows, ultimately resulting in adult-sized distributions reflective of the large temporal window. Why such a progression is seen in the temporal domain and not in the other domains is not
yet clear, but it may have something to do with the fact that young animals are generally concerned only with events in the immediate proximity of the body (which would make an SOA close to 0 of greatest utility). As the animal becomes increasingly interested in exploring space at greater distances, an expansion in the temporal window would allow for the better encoding of these more distant events. We will return to the issue of plasticity in the multisensory temporal window when we turn to the human studies (see Section 11.7.4).
11.7.3 Experientially Based Plasticity in Multisensory Circuits

Although the protracted timeline for the development of mature multisensory circuits is strongly suggestive of a major deterministic role for early experience in shaping these circuits, only with controlled manipulation of this experience can we begin to establish causative links. To address this issue, our laboratory has performed a variety of experiments in which sensory experience is eliminated or altered in early life, after which the consequent impact on multisensory representations is examined. In the first of these studies, the necessity of cross-modal experiences during early life was examined by eliminating all visual experiences from birth until adulthood, and then assessing animals as adults (Wallace et al. 2004; Carriere et al. 2007). Although there were subtle differences between SC and AES in these studies, the impact on multisensory integration in both structures was profound. Whereas dark-rearing allowed for the appearance of a robust (albeit smaller than normal) visual population, it abolished virtually all response enhancements to visual–nonvisual stimulus pairings.

A second series of experiments then sought to address the importance of the statistical relationship of the different sensory cues to one another in the construction of these multisensory representations. Here, animals were reared in environments in which the spatial relationship between visual and auditory stimuli was systematically altered, such that visual and auditory events that were temporally coincident were always separated by 30°. When examined as adults, these animals were found to have multisensory neurons with visual and auditory receptive fields that were displaced by approximately 30° and, more importantly, to now show maximal multisensory enhancements when stimuli were separated by this disparity (Figure 11.9a). More recent work has extended these studies into the temporal domain, and has shown that raising animals in environments in which the temporal relationship of visual and auditory stimuli is altered by 100 ms results in a shift in the peak tuning profiles of multisensory neurons by approximately 100 ms (Figure 11.9b). Of particular interest was the finding that when the temporal offset was extended to 250 ms, the neurons lost the capacity to integrate their different sensory inputs, suggesting that there is a critical temporal window for this developmental process. Collectively, these results provide strong support for the power of the statistical relations of multisensory stimuli in driving the formation of multisensory circuits; circuits that appear to be optimally designed to code the relations most frequently encountered in the world during the developmental period.
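Figure 11.9 (caption below) expresses these rearing effects as a percent multisensory interaction. The sketch below computes the enhancement index commonly used in this literature, in which the multisensory response is compared with the most effective unisensory response; the function name and the example firing rates are our own illustrative choices.

```python
def multisensory_interaction_pct(multi, best_vis, best_aud):
    """Percent interaction: multisensory response vs. best unisensory response.

    Positive values indicate response enhancement; negative values indicate
    response depression. Inputs are response magnitudes (e.g., mean spikes
    per trial) for the multisensory, visual-alone, and auditory-alone
    conditions.
    """
    best_unisensory = max(best_vis, best_aud)
    if best_unisensory <= 0:
        raise ValueError("best unisensory response must be positive")
    return 100.0 * (multi - best_unisensory) / best_unisensory

# Made-up example: a multisensory response of 18 spikes/trial against
# unisensory responses of 8 (visual) and 6 (auditory) gives +125% enhancement.
print(multisensory_interaction_pct(18.0, 8.0, 6.0))
```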
FIGURE 11.9 Developmental manipulations of spatial and temporal relationships of audiovisual stimuli. (a) Multisensory interaction is shown as a function of spatially disparate stimuli between normally reared animals and animals reared with a 30° disparity between auditory and visual stimuli. Peak multisensory interaction for disparately reared group falls by 30° from that of normally reared animals. (b) Multisensory interaction as a function of SOA in animals reared normally versus animals reared in environments with 100 and 250 ms temporal disparities. As might be expected, peak multisensory interactions are offset by 100 ms for normally reared versus the 100 ms disparate group. Interestingly, the 250 ms group loses the ability to integrate audiovisual stimuli.

11.7.4 Development of Human Multisensory Temporal Perception

The ultimate goal of these animal model–based studies is to provide a better framework from which to view human development, with a specific eye toward the maturation of the brain mechanisms that underlie multisensory-mediated behaviors and perceptions. Human developmental studies of multisensory processing have provided us with important insights into the state of the newborn and infant brain, and have illustrated that multisensory abilities are changing rapidly in the first year of postnatal life (see Lewkowicz and Ghazanfar 2009). Intriguingly, there is then a dearth of knowledge about multisensory maturation until adulthood. In an effort to begin to fill this void, our laboratory has embarked on a series of developmental studies focused on childhood and adolescence, with a specific emphasis on multisensory temporal processes, one of the principal themes of this chapter.
These studies strongly suggest that the maturation of multisensory temporal functioning extends beyond the first decade of life. In the initial study, it was established that multisensory temporal functioning was still not mature by 10 to 11 years of age (Hillock et al. 2010).
FIGURE 11.10 Temporal window size decreases from childhood to adulthood. Each data point represents a participant’s window size as determined by width at 75% of maximum probability of perceived simultaneity using nonspeech stimuli. See Section 11.5.1. (Adapted from Hillock, A.R. et al., Binding of sights and sounds: Age-related changes in audiovisual temporal processing, 2010, submitted for publication.)
In that study, children were assessed on a simultaneity judgment task in which flashes and tone pips were presented at SOAs ranging from –450 to +450 ms, with positive values representing visual-leading and negative values representing auditory-leading stimulus trials. This allowed for the creation of a response distribution identical to that obtained in adults, which serves as a proxy for the multisensory temporal binding window (see Section 11.6). When compared with adults, the group mean window size for these children was found to be approximately 38% larger (i.e., 413 vs. 299 ms). A larger follow-up study then sought to detail the chronology of this maturational process from 6 years of age until adulthood, and identified the closure of the binding window in mid to late adolescence for these simple visual–auditory pairings (Figure 11.10; Hillock and Wallace 2011b).

A final study then sought to extend these analyses into the stimulus domain with which children likely have the greatest experience—speech. Using the McGurk effect, which relies on the pairing of discordant visual and auditory speech stimuli (e.g., a visual /ga/ with an auditory /ba/), it is possible to index the integrative process by looking at how often participants report fusions that represent a synthesis of the visual and auditory cues (e.g., /da/ or /tha/). Furthermore, because this effect has been shown to be temporally dependent, it can be used as a tool to study the multisensory temporal binding window for speech-related stimuli. Surprisingly, when this approach was used with children (6–11 years), adolescents (12–17 years), and adults (18–23 years), the windows were found to be indistinguishable (Hillock and Wallace 2011a). Together, these studies show a surprising dichotomy between the development of multisensory temporal perception for nonspeech versus speech stimuli, a result that may reflect the powerful imperative placed on speech in young children and that reinforces the importance of sensory experience in the development of multisensory abilities.
11.8 CONCLUSIONS AND FUTURE DIRECTIONS

As should be clear from the above, substantial efforts are ongoing to bridge between the rapidly growing knowledge sets concerning multisensory processing derived from both animal and human studies. This work should not only complement the findings in each domain, but should also inform the design of better experiments in each. As an example, the final series of human experiments described above begs for a nonhuman correlate to better explore the mechanistic underpinnings that result in very different timelines for the maturation of nonspeech versus speech integrative networks. Experiments in nonhuman primates, in which the critical nodes for communicative signal processing are beginning to emerge (Ghazanfar et al. 2008, 2010), can begin to tease out the relative maturation of the relevant neurophysiological processes likely to result in these distinctions.
Although we have made great strides in recent years in building a better understanding of multisensory behavioral and perceptual processes and their neural correlates, we still have much to discover. Fundamental questions remain unanswered, providing both a sense of frustration and a sense of great opportunity. One domain of great interest to our laboratory is creating a bridge between the neural and the behavioral/perceptual levels in an effort to extend beyond the correlative analyses done thus far. Paradigms developed in awake and behaving animals allow for a direct assessment of neural and behavioral responses during performance of the same task, and should more directly link multisensory encoding processes to their striking behavioral benefits (e.g., see Chandrasekaran and Ghazanfar 2009). However, even these experiments provide only correlative evidence, and future work will seek to use powerful new methods such as optogenetic manipulation in animal models (e.g., see Cardin et al. 2009) and transcranial magnetic stimulation in humans (e.g., see Romei et al. 2007; Beauchamp et al. 2010; Pasalar et al. 2010) to selectively deactivate specific circuit components and then assess the causative impact on multisensory function.
REFERENCES Amlot, R., R. Walker, J. Driver, and C. Spence. 2003. Multimodal visual–somatosensory integration in saccade generation. Neuropsychologia, 41, 1–15. Bar-Yosef, O., Y. Rotman, and I. Nelken. 2002. Responses of neurons in cat primary auditory cortex to bird chirps: Effects of temporal and spectral context. Journal of Neuroscience, 22, 8619–8632. Beauchamp, M.S., A.R. Nath, and S. Pasalar. 2010. fMRI-guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect. Journal of Neuroscience, 30, 2414–2417. Bell, A.H., M.A. Meredith, A.J. Van Opstal, and D.P. Munoz. 2005. Crossmodal integration in the primate superior colliculus underlying the preparation and initiation of saccadic eye movements. Journal of Neurophysiology, 93, 3659–3673. Benedek, G., G. Eordegh, Z. Chadaide, and A. Nagy. 2004. Distributed population coding of multisensory spatial information in the associative cortex. European Journal of Neuroscience, 20, 525–529. Calvert, G.A., and T. Thesen. 2004. Multisensory integration: methodological approaches and emerging principles in the human brain. Journal of Physiology, Paris, 98, 191–205. Cardin, J.A., M. Carlen, K. Meletis, U. Knoblich, F. Zhang, K. Deisseroth, L.H. Tsai, and C.I. Moore. 2009. Driving fast-spiking cells induces gamma rhythm and controls sensory responses. Nature, 459, 663–667. Carriere, B.N., D.W. Royal, T.J. Perrault, S.P. Morrison, J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2007. Visual deprivation alters the development of cortical multisensory integration. Journal of Neurophysiology, 98, 2858–2867. Carriere, B.N., D.W. Royal, and M.T. Wallace. 2008. Spatial heterogeneity of cortical receptive fields and its impact on multisensory interactions. Journal of Neurophysiology, 99, 2357–2368. Chandrasekaran, C., and A.A. Ghazanfar. 2009. Different neural frequency bands integrate faces and voices differently in the superior temporal sulcus. Journal of Neurophysiology, 101, 773–788. Ciesielski, K.T., J.E. Knight, R.J. Prince, R.J. Harris, and S.D. Handmaker. 1995. Event-related potentials in cross-modal divided attention in autism. Neuropsychologia, 33, 225–246. Clark, B., and A. Graybiel. 1966. Factors contributing to the delay in the perception of the oculogravic illusion. American Journal of Psychology, 79, 377–388. Colonius, H., and P. Arndt. 2001. A two-stage model for visual–auditory interaction in saccadic latencies. Perception & Psychophysics, 63, 126–147. Colonius, H., and A. Diederich. 2004. Multisensory interaction in saccadic reaction time: A time-window-ofintegration model. Journal of Cognitive Neuroscience, 16, 1000–1009. Colonius, H., and A. Diederich. 2010a. The optimal time window of visual–auditory integration: A reaction time analysis. Frontiers in Integrative Neuroscience, 4, 11. Colonius, H., and A. Diederich. 2010b. Optimal time windows of integration. Abstract Presented at 2010 International Multisensory Research Forum. Conrey, B., and D.B. Pisoni. 2006. Auditory–visual speech perception and synchrony detection for speech and nonspeech signals. Journal of the Acoustical Society of America, 119, 4065–4073. Corneil, B.D., and D.P. Munoz 1996. The influence of auditory and visual distractors on human orienting gaze shifts. Journal of Neuroscience, 16, 8193–8207.
Corneil, B.D., M. Van Wanrooij., D.P. Munoz, and A.J. Van Opstal. 2002. Auditory–visual interactions subserving goal-directed saccades in a complex scene. Journal of Neurophysiology, 88, 438–454. Crystal, T.H., and A.S. House. 1981. Segmental durations in connected speech signals. Journal of the Acoustical Society of America, 69, S82–S83. David, S.V., W.E. Vinje, and J.L. Gallant. 2004. Natural stimulus statistics alter the receptive field structure of v1 neurons. Journal of Neuroscience, 24, 6991–7006. Dhamala, M., C.G. Assisi, V.K. Jirsa, F.L. Steinberg, and J.A. Kelso. 2007. Multisensory integration for timing engages different brain networks. NeuroImage, 34, 764–773. Diederich, A., H. Colonius, D. Bockhorst, and S. Tabeling. 2003. Visual–tactile spatial interaction in saccade generation. Experimental Brain Research, 148, 328–337. Dixon, N.F., and L. Spitz. 1980. The detection of auditory visual desynchrony. Perception, 9, 719–721. Engel, G.R., and W.G. Dougherty. 1971. Visual–auditory distance constancy. Nature, 234, 308. Facoetti, A., A.N. Trussardi, M. Ruffino, M.L. Lorusso, C. Cattaneo, R. Galli, M. Molteni, and M. Zorzi. 2010. Multisensory spatial attention deficits are predictive of phonological decoding skills in developmental dyslexia. Journal of Cognitive Neuroscience, 22, 1011–1025. Fajen, B.R. 2007. Rapid recalibration based on optic flow in visually guided action. Experimental Brain Research, 183, 61–74. Forster, B., C. Cavina-Pratesi, S.M. Aglioti, and G. Berlucchi. 2002. Redundant target effect and intersensory facilitation from visual–tactile interactions in simple reaction time. Experimental Brain Research, 143, 480–487. Foss-Feig, J.H., L.D. Kwakye, C.J. Cascio, C.P. Burnette, H. Kadivar, W.L. Stone, and M.T. Wallace. 2010. An extended multisensory temporal binding window in autism spectrum disorders. Experimental Brain Research, 203, 381–389. Frassinetti, F., N. Bolognini, and E. Ladavas. 2002. Enhancement of visual perception by crossmodal visuoauditory interaction. Experimental Brain Research, 147, 332–343. Frens, M.A., and A.J. Van Opstal. 1998. Visual–auditory interactions modulate saccade-related activity in monkey superior colliculus. Brain Research Bulletin, 46, 211–224. Frens, M.A., A.J. Van Opstal, and R.F. van der Willigen. 1995. Spatial and temporal factors determine auditory–visual interactions in human saccadic eye movements. Perception & Psychophysics, 57, 802–816. Fujisaki, W., S. Shimojo, M. Kashino, and S. Nishida. 2004. Recalibration of audiovisual simultaneity. Nature Neuroscience, 7, 773–778. Furukawa, S., and J.C. Middlebrooks. 2002. Cortical representation of auditory space: Information-bearing features of spike patterns. Journal of Neurophysiology, 87, 1749–1762. Ghazanfar, A.A., C. Chandrasekaran, and N.K. Logothetis. 2008. Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of Neuroscience, 28, 4457–4469. Ghazanfar, A.A., C. Chandrasekaran, and R.J. Morrill. 2010. Dynamic, rhythmic facial expressions and the superior temporal sulcus of macaque monkeys: implications for the evolution of audiovisual speech. European Journal of Neuroscience, 31, 1807–1817. Gottlieb, G. 1971. Ontogenesis of sensory function in birds and mammals. In The biopsychology of development, ed. E. Tobach, L.R. Aronson, and E. Shaw. New York: Academic Press. Guest, S., C. Catmur, D. Lloyd, and C. Spence. 2002. Audiotactile interactions in roughness perception. 
Experimental Brain Research, 146, 161–171. Guitton, D., and D.P. Munoz 1991. Control of orienting gaze shifts by the tectoreticulospinal system in the head-free cat. I. Identification, localization, and effects of behavior on sensory responses. Journal of Neurophysiology, 66, 1605–1623. Haider, B., M.R. Krause, A. Duque, Y. Yu, J. Touryan, J.A. Mazer, and D.A. McCormick. 2010. Synaptic and network mechanisms of sparse and reliable visual cortical activity during nonclassical receptive field stimulation. Neuron, 65, 107–121. Hairston, W.D., J.H. Burdette, D.L. Flowers, F.B. Wood, and M.T. Wallace. 2005. Altered temporal profile of visual–auditory multisensory interactions in dyslexia. Experimental Brain Research, 166, 474–480. Hall, W.C., and A.K. Moschovakis. 2004. The superior colliculus: New approaches for studying sensorimotor integration. Boca Raton, FL: CRC Press. Hanson, J.V., J. Heron, and D. Whitaker. 2008. Recalibration of perceived time across sensory modalities. Experimental Brain Research, 185, 347–352. Harrington, L.K., and C.K. Peck. 1998. Spatial disparity affects visual–auditory interactions in human sensorimotor processing. Experimental Brain Research, 122, 247–252.
Hershenson, M. 1962. Reaction time as a measure of intersensory facilitation. Journal of Experimental Psychology, 63, 289–293. Hillock, A.R., and M.T. Wallace. 2011a. Changes in the multisensory temporal binding window persist into adolescence. In preparation. Hillock, A.R., and M.T. Wallace. 2011b. A developmental study of the temporal constraints for audiovisual speech binding. In preparation. Hillock, A.R., A.R. Powers 3rd, and M.T. Wallace. 2010. Binding of sights and sounds: Age-related changes in audiovisual temporal processing. (Submitted). Hughes, H.C., P.A. Reuter-Lorenz, G. Nozawa, and R. Fendrich. 1994. Visual–auditory interactions in sensorimotor processing: saccades versus manual responses. Journal of Experimental Psychology. Human Perception and Performance, 20, 131–53. Hughes, H.C., M.D. Nelson, and D.M. Aronchick. 1998. Spatial characteristics of visual–auditory summation in human saccades. Vision Research, 38, 3955–63. Kavounoudias, A., J.P. Roll, J.L. Anton, B. Nazarian, M. Roth, and R. Roll. 2008. Proprio-tactile integration for kinesthetic perception: An fMRI study. Neuropsychologia, 46, 567–575. Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex, 18, 1560–74. Kayser, C., N.K. Logothetis, and S. Panzeri. 2010. Visual enhancement of the information representation in auditory cortex. Current Biology, 20, 19–24. Kern, J.K. 2002. The possible role of the cerebellum in autism/PDD: Disruption of a multisensory feedback loop. Medical Hypotheses, 59, 255–260. Kim, R.S., A.R. Seitz, and L. Shams. 2008. Benefits of stimulus congruency for multisensory facilitation of visual learning. PLoS One, 3, e1532. King, A. J. 2004. The superior colliculus. Current Biology, 14, R335–R338. Krueger, J., M.C. Fister, D.W. Royal, B.N. Carriere, and M.T. Wallace. 2008. A comparison of spatiotemporal receptive fields of multisensory superior colliculus neurons in awake and anesthetized cat. Society for Neuroscience Abstract, 457.17. Krueger, J., D.W. Royal, M.C. Fister, and M.T. Wallace. 2009. Spatial receptive field organization of multisensory neurons and its impact on multisensory interactions. Hearing Research, 258, 47–54. Laasonen, M., E. Service, and V. Virsu. 2001. Temporal order and processing acuity of visual, auditory, and tactile perception in developmentally dyslexic young adults. Cognitive, Affective & Behavioral Neuroscience, 1, 394–410. Laasonen, M., E. Service, and V. Virsu. 2002. Crossmodal temporal order and processing acuity in developmentally dyslexic young adults. Brain and Language, 80, 340–354. Lakatos, P., C.M. Chen, M.N. O’Connell, A. Mills, and C.E. Schroeder. 2007. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron, 53, 279–292. Lewkowicz, D.J., and K.S. Kraebel. 2004. The value of multisensory redundancy in the development of intersensory perception. In The Handbook of Multisensory Processes, ed. G.A. Calvert, C. Spence, and B.E. Stein. Cambridge, MA: MIT Press. Lewkowicz, D.J., and A.A. Ghazanfar. 2009. The emergence of multisensory systems through perceptual narrowing. Trends in Cognitive Sciences, 13, 470–478. Lickliter, R., and L.E. Bahrick. 2004. Perceptual development and the origins of multisensory responsiveness. In The Handbook of Multisensory Processes, ed. G.A. Calvert, C. Spence, and B.E. Stein. Cambridge, MA: MIT Press. Lovelace, C.T., B.E. Stein, and M.T. Wallace. 2003. 
An irrelevant light enhances auditory detection in humans: a psychophysical analysis of multisensory integration in stimulus detection. Brain Research Cognitive Brain Research, 17, 447–453. Macaluso, E., N. George, R. Dolan, C. Spence, and J. Driver. 2004. Spatial and temporal factors during processing of audiovisual speech: A PET study. NeuroImage, 21, 725–732. Machens, C.K., M.S. Wehr, and A.M. Zador. 2004. Linearity of cortical receptive fields measured with natural sounds. Journal of Neuroscience, 24, 1089–1100. Manabe, K., and H. Riquimaroux. 2000. Sound controls velocity perception of visual apparent motion. Journal of the Acoustical Society of Japan, 21, 171–174. Massaro, D.W., M.M. Cohen, and P.M. Smeele. 1996. Perception of asynchronous and conflicting visual and auditory speech. Journal of the Acoustical Society of America, 100, 1777–1786. McGrath, M., and Q. Summerfield. 1985. Intermodal timing relations and audio-visual speech recognition by normal-hearing adults. Journal of the Acoustical Society of America, 77, 678–685.
Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. Journal of Neuroscience, 7, 3215–3229. Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus. Science, 221, 389–391. Meredith, M.A., and B.E. Stein. 1985. Descending efferents from the superior colliculus relay integrated multisensory information. Science, 227, 657–659. Meredith, M.A., and B.E. Stein. 1986. Spatial factors determine the activity of multisensory neurons in cat superior colliculus. Brain Research, 365, 350–354. Middlebrooks, J.C., and E.I. Knudsen. 1984. A neural code for auditory space in the cat’s superior colliculus. Journal of Neuroscience, 4, 2621–2634. Middlebrooks, J.C., L. Xu, A.C. Eddins, and D.M. Green. 1998. Codes for sound-source location in nontonotopic auditory cortex. Journal of Neurophysiology, 80, 863–881. Miyazaki, M., D. Nozaki, and Y. Nakajima. 2005. Testing Bayesian models of human coincidence timing. Journal of Neurophysiology, 94, 395–399. Miyazaki, M., S. Yamamoto, S., Uchida, and S. Kitazawa. 2006. Bayesian calibration of simultaneity in tactile temporal order judgment. Nature Neuroscience, 9, 875–877. Molholm, S., W. Ritter, M.M. Murray, D.C. Javitt, C.E. Schroeder, and J.J. Foxe. 2002. Multisensory auditory– visual interactions during early sensory processing in humans: A high-density electrical mapping study. Brain Research. Cognitive Brain Research, 14, 115–128. Munhall, K.G., P. Gribble, L. Sacco, and M. Ward. 1996. Temporal constraints on the McGurk effect. Perception & Psychophysics, 58, 351–362. Munoz, D.P., and D. Guitton. 1991. Control of orienting gaze shifts by the tectoreticulospinal system in the head-free cat: II. Sustained discharges during motor preparation and fixation. Journal of Neurophysiology, 66, 1624–1641. Munoz, D.P., D. Guitton, and D. Pelisson. 1991a. Control of orienting gaze shifts by the tectoreticulospinal system in the head-free cat: III. Spatiotemporal characteristics of phasic motor discharges. Journal of Neurophysiology, 66, 1642–1666. Munoz, D.P., D. Pelisson, and D. Guitton. 1991b. Movement of neural activity on the superior colliculus motor map during gaze shifts. Science, 251, 1358–1360. Murray, M.M., C.M. Michel, R. Grave De Peralta, S. Ortigue, D. Brunet, S. Gonzalez Andino, and A. Schnider. 2004. Rapid discrimination of visual and multisensory memories revealed by electrical neuroimaging. NeuroImage, 21, 125–135. Murray, M.M., J.J. Foxe, and G.R. Wylie. 2005. The brain uses single-trial multisensory memories to discriminate without awareness. NeuroImage, 27, 473–478. Nagy, A., G. Eordegh, and G. Benedek. 2003. Spatial and temporal visual properties of single neurons in the feline anterior ectosylvian visual area. Experimental Brain Research, 151, 108–114. Navarra, J., A. Vatakis, M. Zampini, S. Soto-Faraco, W. Humphreys, and C. Spence. 2005. Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Brain Research. Cognitive Brain Research, 25, 499–507. Noesselt, T., J.W. Rieger, M.A. Schoenfeld, M. Kanowski, H. Hinrichs, H.J. Heinze, and J. Driver. 2007. Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience, 27, 11431–11441. Pandey, P.C., H. Kunov, and S.M. Abel. 1986. Disruptive effects of auditory signal delay on speech perception with lipreading. 
Journal of Auditory Research, 26, 27–41. Pasalar, S., T. Ro, and M.S. Beauchamp. 2010. TMS of posterior parietal cortex disrupts visual tactile multisensory integration. European Journal of Neuroscience, 31, 1783–1790. Populin, L.C. 2005. Anesthetics change the excitation/inhibition balance that governs sensory processing in the cat superior colliculus. Journal of Neuroscience, 25, 5903–5914. Powers 3rd, A.R., A.R. Hillock, and M.T. Wallace. 2009. Perceptual training narrows the temporal window of multisensory binding. Journal of Neuroscience, 29, 12265–12274. Powers 3rd, A.R., M.A. Hevey, and M.T. Wallace. 2010. Neural correlates of multisensory perceptual learning. In preparation. Qin, L., J.Y. Wang, and Y. Sato. 2008. Representations of cat meows and human vowels in the primary auditory cortex of awake cats. Journal of Neurophysiology, 99, 2305–2319. Romei, V., M.M. Murray, L.B. Merabet, and G. Thut. 2007. Occipital transcranial magnetic stimulation has opposing effects on visual and auditory stimulus detection: implications for multisensory interactions. Journal of Neuroscience, 27, 11465–11472.
Rouger, J., S. Lagleyre, B. Fraysse, S. Deneve, O. Deguine, and P. Barone. 2007. Evidence that cochlearimplanted deaf patients are better multisensory integrators. Proceedings of the National Academy of Sciences of the United States of America, 104, 7295–7300. Rowland, B.A., S. Quessy, T.R. Stanford, and B.E. Stein. 2007. Multisensory integration shortens physiological response latencies. Journal of Neuroscience, 27, 5879–5884. Royal, D.W., B.N. Carriere, and M.T. Wallace. 2009. Spatiotemporal architecture of cortical receptive fields and its impact on multisensory interactions. Experimental Brain Research, 198, 127–136. Schneider, T.R., A.K. Engel, and S. Debener. 2008. Multisensory identification of natural objects in a two-way crossmodal priming paradigm. Experimental Psychology, 55, 121–132. Schorr, E.A., N.A. Fox, V. van Wassenhove, and E.I. Knudsen. 2005. Auditory–visual fusion in speech perception in children with cochlear implants. Proceedings of the National Academy of Sciences of the United States of America, 102, 18748–18750. Seitz, A.R., R. Kim, and L. Shams. 2006. Sound facilitates visual learning. Current Biology, 16, 1422–1427. Sekuler, R., A.B. Sekuler, and R. Lau. 1997. Sound alters visual motion perception. Nature, 385, 308. Shams, L., Y. Kamitani, and S. Shimojo. 2000. Illusions. What you see is what you hear. Nature, 408, 788. Shams, L., Y. Kamitani, and S. Shimojo. 2002. Visual illusion induced by sound. Brain Research Cognitive Brain Research, 14, 147–152. Shore, D.I., C. Spence, and R.M. Klein. 2001. Visual prior entry. Psychological Science, 12, 205–212. Soto-Faraco, S., and A. Alsius. 2009. Deconstructing the McGurk–MacDonald illusion. Journal of Experimental Psychology. Human Perception and Performance, 35, 580–587. Soto-Faraco, S., A. Kingstone, and C. Spence. 2003. Multisensory contributions to the perception of motion. Neuropsychologia, 41, 1847–1862. Sparks, D.L. 1986. Translation of sensory signals into commands for control of saccadic eye movements: Role of primate superior colliculus. Physiological Reviews, 66, 118–171. Sparks, D.L., and Groh, J.M. 1995. The superior colliculus: A window for viewing issues in integrative neuroscience. In The Cognitive Sciences, ed. Gazzaniga, M.S. Cambridge, MA: MIT Press. Spence, C., D.I. Shore, and R.M. Klein. 2001. Multisensory prior entry. Journal of Experimental Psychology. General, 130, 799–832. Stanford, T.R., S. Quessy, and B.E. Stein. 2005. Evaluating the operations underlying multisensory integration in the cat superior colliculus. Journal of Neuroscience, 25, 6499–6508. Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press. Stein, B.E., and M.T. Wallace. 1996. Comparisons of cross-modality integration in midbrain and cortex. Progress in Brain Research, 112, 289–299. Stein, B.E., E. Labos, and L. Kruger. 1973a. Determinants of response latency in neurons of superior colliculus in kittens. Journal of Neurophysiology, 36, 680–689. Stein, B.E., E. Labos, and L. Kruger. 1973b. Sequence of changes in properties of neurons of superior colliculus of the kitten during maturation. Journal of Neurophysiology, 36, 667–679. Stein, B.E., W.S. Huneycutt, and M.A. Meredith. 1988. Neurons and behavior: The same rules of multisensory integration apply. Brain Research, 448, 355–358. Stein, B.E., M.A. Meredith, W.S. Huneycutt, and L. McDade. 1989. Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience, 1, 12–24. 
Stein, B.E., N. London, L.K. Wilkinson, and D.D. Price. 1996. Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497–506. Stetson, C., X. Cui, P.R. Montague, and D.M. Eagleman. 2006. Motor-sensory recalibration leads to an illusory reversal of action and sensation. Neuron, 51, 651–659. Stevenson, R.A., N.A. Altieri, S. Kim, D.B. Pisoni, and T.W. James. 2010. Neural processing of asynchronous audiovisual speech perception. NeuroImage, 49, 3308–3318. Stone, J.V., N.M. Hunkin, J. Porrill, R. Wood, V. Keeler, M. Beanland, M. Port, and N.R. Porter. 2001. When is now? Perception of simultaneity. Proceedings of the Royal Society of London. Series B. Biological Sciences, 268, 31–38. Sumby, W.H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215. Ter-Mikaelian, M., D.H. Sanes, and M.N. Semple. 2007. Transformation of temporal properties between auditory midbrain and cortex in the awake Mongolian gerbil. Journal of Neuroscience, 27, 6091–6102. van Atteveldt, N.M., E. Formisano, L. Blomert, and R. Goebel. 2007. The effect of temporal asynchrony on the multisensory integration of letters and speech sounds. Cerebral Cortex, 17, 962–794.
van Wassenhove, V., K.W. Grant, and D. Poeppel. 2007. Temporal window of integration in auditory–visual speech perception. Neuropsychologia, 45, 598–607. van Wassenhove, V., D.V. Buonomano, S. Shimojo, and L. Shams. 2008. Distortions of subjective time perception within and across senses. PLoS One, 3, e1437. Von Kriegstein, K., and A.L. Giraud. 2006. Implicit multisensory associations influence voice recognition. PLoS Biology, 4, e326. Vroomen, J., M. Keetels, B. De Gelder, and P. Bertelson. 2004. Recalibration of temporal order perception by exposure to audio-visual asynchrony. Brain Research. Cognitive Brain Research, 22, 32–35. Wade, N.J., and R.H. Day. 1968. Development and dissipation of a visual spatial aftereffect from prolonged head tilt. Journal of Experimental Psychology, 76, 439–443. Wallace, M.T., and B.E. Stein. 1996. Sensory organization of the superior colliculus in cat and monkey. Progress in Brain Research, 112, 301–311. Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat superior colliculus. Journal of Neuroscience, 17, 2429–2444. Wallace, M.T., and B.E. Stein. 2000. Onset of cross-modal synthesis in the neonatal superior colliculus is gated by the development of cortical influences. Journal of Neurophysiology, 83, 3578–3582. Wallace, M.T., and B.E. Stein. 2001. Sensory and multisensory responses in the newborn monkey superior colliculus. Journal of Neuroscience, 21, 8886–8894. Wallace, M.T., M.A. Meredith, and B.E. Stein. 1992. Integration of multiple sensory modalities in cat cortex. Experimental Brain Research, 91, 484–488. Wallace, M.T., L.K. Wilkinson, and B.E. Stein. 1996. Representation and integration of multiple sensory inputs in primate superior colliculus. Journal of Neurophysiology, 76, 1246–1266. Wallace, M.T., T.J. Perrault Jr., W.D. Hairston, and B.E. Stein. 2004. Visual experience is necessary for the development of multisensory integration. Journal of Neuroscience, 24, 9580–9584. Wallace, M.T., B.N. Carriere, T.J. Perrault Jr., J.W. Vaughan, and B.E. Stein. 2006. The development of cortical multisensory integration. Journal of Neuroscience, 26, 11844–11849. Wang, X., T. Lu, R.K. Snider, and L. Liang. 2005. Sustained firing in auditory cortex evoked by preferred stimuli. Nature, 435, 341–346. Xu, L., S. Furukawa, and J.C. Middlebrooks. 1999. Auditory cortical responses in the cat to sounds that produce spatial illusions. Nature, 399, 688–691. Ye, C.Q., M.M. Poo, Y. Dan, and X.H. Zhang. 2010. Synaptic mechanisms of direction selectivity in primary auditory cortex. Journal of Neuroscience, 30, 1861–1868. Zampini, M., D.I. Shore, and C. Spence. 2003. Audiovisual temporal order judgments. Experimental Brain Research, 152, 198–210. Zampini, M., S. Guest, D.I. Shore, and C. Spence. 2005a. Audio-visual simultaneity judgments. Perception & Psychophysics, 67, 531–544. Zampini, M., D.I. Shore, and C. Spence. 2005b. Audiovisual prior entry. Neurosci Letters, 381, 217–22.
12 Early Integration and Bayesian Causal Inference in Multisensory Perception
Ladan Shams
CONTENTS
12.1 Introduction
12.2 Early Auditory–Visual Interactions in Human Brain
12.3 Why Have Cross-Modal Interactions?
12.4 The Problem of Causal Inference
12.5 Spectrum of Multisensory Combinations
12.6 Principles Governing Cross-Modal Interactions
12.7 Causal Inference in Multisensory Perception
12.8 Hierarchical Bayesian Causal Inference Model
12.9 Relationship with Nonhierarchical Causal Inference Model
12.10 Hierarchical Causal Inference Model versus Human Data
12.11 Independence of Priors and Likelihoods
12.12 Conclusions
References
12.1 INTRODUCTION
Brain function in general, and perception in particular, has been viewed as highly modular for more than a century. Although phrenology is considered obsolete, its general notion of the brain as a collection of compartments, each devoted to a single function and independent of the others, has remained the dominant paradigm, especially in the context of perception (Pascual-Leone and Hamilton 2001). In the cerebral cortex, the different sensory modalities are believed to be organized into separate pathways that are independent of each other and process information almost completely in a self-contained manner, until the "well digested" processed signals converge at some higher-order level of processing in the polysensory association cortical areas, wherein the unified perception of the environment is achieved. The notion of modularity of sensory modalities has been particularly strong in relation to visual perception. Vision has been considered to be highly self-contained and independent of extramodal influences. This view owes its strength to many sources. Humans are considered to be "visual animals," and this notion has been underscored in contemporary society by the increasingly important role of text and images in our lives, along with the advent of electricity (and light at night). The notion of visual dominance has also been supported by classic and well-known studies of cross-modal interactions in which a conflict was artificially imposed between vision and another modality and vision was found to override the conflicting sensory modality. For example, in the ventriloquist illusion, vision captures the location of a discrepant auditory stimulus (Howard and Templeton 1966). Similarly, in the "visual capture" effect, vision captures the spatial location of a tactile or proprioceptive stimulus (Rock and Victor 1964). In the McGurk effect, vision strongly and
qualitatively alters the perceived syllable (McGurk and McDonald 1976). As a result, the influence of vision on other modalities has been acknowledged for some time. However, the influence of other modalities on vision was not appreciated until very recently. There have been several reports of vision being influenced by another modality; however, most of these have involved quantitative effects (Gebhard and Mowbray 1959; Scheier et al. 1999; Walker and Scott 1981; McDonald et al. 2000; Spence and Driver 1997; Spence et al. 1998; Stein et al. 1996). Over the past few years, two studies have reported radical alterations of visual perception by the auditory modality. In one case, the motion trajectory of two visual targets is sometimes changed from a streaming motion to a bouncing motion by a brief sound occurring at the time of visual coincidence (Sekuler et al. 1997). In this case, the motion of the visual stimuli is, in principle, ambiguous in the absence of sound, and one could argue that sound merely resolves this ambiguity. In another study, we found that the perceived number of pulsations of a visual flash (for which there is no obvious ambiguity) is often increased when the flash is paired with multiple beeps (Shams et al. 2000, 2002). This phenomenon demonstrates, in an unequivocal fashion, that visual perception can be altered by a nonvisual signal. The effect is also very robust and resistant to changes in the shape, pattern, intensity, and timing of the visual and auditory stimuli (Shams et al. 2001, 2002; Watkins et al. 2006). For this reason, this illusion, known as the "sound-induced flash illusion," appears to reflect a mainstream mechanism of auditory–visual interaction in the brain rather than some aberration in neural processing. We therefore used the sound-induced flash illusion as an experimental paradigm for investigating auditory–visual interactions in the human brain.
12.2 EARLY AUDITORY–VISUAL INTERACTIONS IN HUMAN BRAIN The first question we asked was, at what level of processing do auditory–visual perceptual interactions occur? Do they occur at some higher-order polysensory area in the association cortex or do they involve the modulation of activation along the visual cortex? We examined whether visually evoked potentials, as recorded from three electrodes in the occipital regions of the scalp, are affected by sound. We recorded evoked potentials under visual-alone (1flash, or 2flashes), auditory-alone (2beeps), and auditory–visual (1flash2beeps) stimulus conditions. When comparing the pattern of activity associated with a second physical flash (2flash – 1flash) with that of an illusory second flash (i.e., 1flash2beeps – 1flash – 2beeps), we obtained a very similar temporal pattern of activity (Shams et al. 2001). Furthermore, for the 1flash2beep condition, comparing illusion and no-illusion trials revealed that the perception of illusion was associated with increased gamma-band activity in the occipital region (Bhattacharya et al. 2002). A magnetoencephalography (MEG) study of the flash illusion revealed the modulation of activity in occipital channels by sound as early as 35 to 65 ms poststimulus onset (Shams et al. 2005a). These results altogether indicated a mechanism of auditory–visual interaction with very short latency, and in the occipital cortex. However, to map the exact location of the interactions, we needed higher spatial resolution. Therefore, we performed functional MRI (fMRI) studies of the sound-induced flash illusion. In these studies (Watkins et al. 2006, 2007), the visual cortical areas were functionally mapped for each individual subject using retinotopic mapping. We contrasted auditory–visual conditions (1flash1beep, 2flash2beep) versus visual-alone conditions (1flash, 2flash). This contrast indicated auditory cortical areas, which is not surprising because in one condition, there is sound, and in another condition, there is no sound. But interestingly, the contrast also indicated areas V1, V2, and V3, which is surprising because the visual stimulus is identical in the contrasted conditions. Therefore, these results (Watkins et al. 2006) clearly demonstrated for the first time (but see Calvert et al. 2001) that activity in the human visual cortex as early as V1 can be modulated by nonvisual stimulation. The observed increase in activation was very robust and significant. We suspected that this increase in activity may reflect a possible general arousal effect caused by sound as opposed to auditory–visual integration per se. Indeed, attention has been previously shown to increase activity in early visual cortical areas. To address this question, we focused on the 1flash2beep condition which, in some trials, gave rise to
an illusory percept of two flashes (also referred to as a fission effect). We compared the illusion and no-illusion trials, reasoning that given that the physical stimuli are identical in both of these post hoc–defined conditions, the arousal level should also be equal. Contrasting illusion and nonillusion trials revealed increased activity in V1 in the illusion condition (Watkins et al. 2006), indicating that the perception of illusion is correlated with increased activity in V1. Although this contradicts the attention hypothesis laid out earlier, one could still argue that sound may only increase arousal in some trials and those trials happen to be the illusion trials. Although this argument confounds attention with integration, we could nevertheless address it using another experiment in which we included a 2flash1beep condition. On some trials of this condition, the two flashes are fused, leading to an illusory percept of a single flash (also referred to as a fusion effect), whereas in other trials, the observers correctly perceived two flashes. Contrasting the illusion and nonillusion trials, we again found a significant difference in the activation level of V1; however, this time, the perception of sound-induced visual illusion was correlated with decreased activity in V1 (Watkins et al. 2007), therefore ruling out the role of attention or arousal. As mentioned above, the eventrelated potential (ERP) study showed a similar temporal pattern of activity for the illusory and physical second flash. Here, we found a similar degree of V1 activation for physical and illusory double flash, and a similar degree of activation for the physical and illusory single flash (Watkins et al. 2007). These results altogether establish clearly that activity in early visual cortical areas, as early as in the primary visual cortex, is modulated by sound through cross-modal integration processes. What neural pathway could underlie these early auditory–visual interactions? Again, the last decade has witnessed the overturning of another dogma; the dogma of no connectivity among the sensory cortical areas. There has been mounting evidence for direct and indirect anatomical connectivity among the sensory cortical areas (e.g., Clavagnier et al. 2004; Falchier et al. 2002; Ghazanfar and Schroeder 2006; Rockland and Ojima 2003; Hackett et al. 2007). Of particular interest here are the findings of extensive projections from the auditory core and parabelt and multisensory area superior temporal polysensory cortical areas to V1 and V2 in monkey (Falchier et al. 2002; Rockland and Ojima 2003; Clavagnier et al. 2004). Intriguingly, these projections appear to be only extensive for the peripheral representations in V1, and not for the foveal representations (Falchier et al. 2002). This pattern is highly consistent with the much stronger behavioral and physiological auditory modulation of vision in the periphery compared with the fovea that we have observed (Shams et al. 2001). Interestingly, tactile modulation of visual processing also seems to be stronger in the periphery (Diederich and Colonius 2007). Therefore, it seems likely that a direct projection from A1 or a feedback projection from superior temporal sulcus (STS) could mediate the modulations we have observed. 
We believe that the former may be more likely because, although the activation in V1 was found to correlate with the perceived number of flashes, the activation of area STS was always increased with the perception of illusion, regardless of the type of illusion (single or double flash; Watkins et al. 2006, 2007). Therefore, these results are more readily consistent with a direct modulation of V1 by projections from auditory areas.
12.3 WHY HAVE CROSS-MODAL INTERACTIONS?
The findings discussed above, as well as those discussed in other chapters, make it clear that cross-modal interactions are prevalent and can be very strong and robust. But why? At first glance, it may not be obvious why having cross-modal interactions would be advantageous or necessary for humans' survival in the environment. Especially in the context of visual perception, one could argue that vision is so precise and accurate in so many tasks that it may even be disadvantageous to "contaminate" it with other sensory signals that are not as reliable (which could then cause illusions or errors). Theory tells us, and experimental studies have confirmed, that even when a second source of information is not very reliable, combining the two sources of information can result in superior estimation compared with using only the most reliable source. Maximum
likelihood estimation of an object property using two independent cues, for example, an auditory estimate and a visual estimate, results in an estimate that is more reliable (more precise) than either one of the individual estimates. Many studies of multisensory perception have confirmed that the human nervous system integrates two cross-modal estimates in a similar fashion (e.g., Alais and Burr 2004; Ernst and Banks 2002; van Beers et al. 1999; Ronsse et al. 2009). Therefore, integrating information across modalities is always beneficial. Interestingly, recent studies using single-cell recordings and behavioral measurements from macaque monkeys have provided a bridge between the behavioral manifestations of multisensory integration and neural activity, showing that the activity of multisensory (visual–vestibular) neurons is consistent with Bayesian cue integration (for a review, see Angelaki et al. 2009).
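To make the benefit of integration concrete, the following sketch implements the standard precision-weighted combination rule for two conditionally independent Gaussian estimates. The function name and the numerical values are illustrative choices, not taken from any of the studies cited above; the point is simply that the fused estimate always has a smaller variance than either unisensory estimate, which is why combining even an unreliable cue can help.

```python
import numpy as np

def fuse_mle(x_a, sigma_a, x_v, sigma_v):
    """Precision-weighted (maximum-likelihood) fusion of two unisensory estimates
    that carry independent Gaussian noise. Returns the fused estimate and its
    standard deviation."""
    w_a = 1.0 / sigma_a**2          # precision (reliability) of the auditory estimate
    w_v = 1.0 / sigma_v**2          # precision of the visual estimate
    s_hat = (w_a * x_a + w_v * x_v) / (w_a + w_v)
    sigma_fused = np.sqrt(1.0 / (w_a + w_v))   # always <= min(sigma_a, sigma_v)
    return s_hat, sigma_fused

# Illustrative numbers (degrees of azimuth): a reliable visual estimate and a
# noisier auditory estimate of the same source location.
s_hat, sigma_fused = fuse_mle(x_a=8.0, sigma_a=6.0, x_v=2.0, sigma_v=2.0)
print(s_hat, sigma_fused)
```

With these illustrative numbers the reliable visual cue dominates the fused estimate, but the auditory cue still reduces the overall uncertainty below that of vision alone.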
12.4 THE PROBLEM OF CAUSAL INFERENCE Although it is beneficial to integrate information from different modalities if the signals correspond to the same object, one could see that integrating information from two different objects would not be advantageous. For example, while trying to cross the street on a foggy day, it would be beneficial to combine auditory and visual information to estimate the speed and direction of an approaching car. It could be a fatal mistake, on the other hand, to combine the information from the sound of a car moving behind us in the opposite direction with the image of another moving car in front of us. It should be noted that humans (as with most other organisms) are constantly surrounded by multiple objects and thus multiple sources of sensory stimulation. Therefore, at any given moment, the nervous system is engaged in processing multiple sensory signals across the senses, and not all of these signals are caused by the same object, and therefore not all of them should be bound and integrated. The problem of whether to combine two signals involves an (implicit or explicit) inference about whether the two signals are caused by the same object or by different objects, i.e., causal inference. This is not a trivial problem, and cannot be simply solved, for example, based on whether the two signals originate from the same coordinates in space. The different senses have different precisions in all dimensions, including the temporal and spatial dimensions, and even if the two signals are derived from the same object/event, the noise in the environment and in the nervous system makes the sensory signals somewhat inconsistent with each other most of the time. Therefore, the nervous system needs to use as much information as possible to solve this difficult problem. It appears that whether two sensory signals are perceptually bound together typically depends on a combination of spatial, temporal, and structural consistency between the signals as well as the prior knowledge derived from experience about the coupling of the signals in nature. For example, moving cars often make a frequency sweep sound, therefore, the prior probability for combining these two stimuli should be very high. On the other hand, moving cars do not typically create a bird song, therefore the prior bias for combining the image of a car and the sound of a bird is low. Unlike the problem of causal inference in cognition, which only arises intermittently, the problem of causal inference in perception has to be solved by the nervous system at any given moment, and is therefore at the heart of perceptual processing. In addition to solving the problem of causal inference, the perceptual system also needs to determine how to integrate signals that appear to have originated from the same source, i.e., to what extent, and in which direction (which modality should dominate which modality).
12.5 SPECTRUM OF MULTISENSORY COMBINATIONS To investigate these theoretical issues, we used two complementary experimental paradigms: a temporal numerosity judgment task (Shams et al. 2005b), and a spatial localization task (Körding et al. 2007). These two tasks are complementary in that the former is primarily a temporal task, whereas the latter is clearly a spatial task. Moreover, in the former, the auditory modality dominates, whereas in the latter, vision dominates. In both of these paradigms, there are strong illusions that occur under some stimulus conditions: the sound-induced flash illusion and the ventriloquist illusion.
In the temporal numerosity experiment, a variable number of flashes were presented in the periphery simultaneously with a variable number of beeps. The task of the observers was to judge the number of flashes and beeps in each trial. In the spatial localization experiment, a Gabor patch and/or a noise burst were briefly presented at one of several locations along a horizontal line and the task of the subject was to judge the location of both the visual and auditory stimuli in each trial. In both experiments, we observed a spectrum of interactions (Figure 12.1). When there was no discrepancy between the auditory and visual stimuli, the two stimuli were fused (Figure 12.1a, left). When the discrepancy was small between the two stimuli, they were again fused in a large fraction of trials (Figure 12.1a, middle and right). These trials are those in which an illusion occurred. For example, when one flash paired with two beeps was presented, in a large fraction of trials, the observers reported seeing two flashes (sound-induced flash illusion) and hearing two beeps. The reverse illusion occurred when two flashes paired with one beep were seen as a single flash in a large fraction of trials. Similarly, in the localization experiment, when the spatial gap between the flash and noise burst was small (5°), the flash captured the location of the sound in a large fraction of trials (ventriloquist illusion). In the other extreme, when the discrepancy between the auditory and visual stimuli was large, there was little interaction, if any, between the two. For example, in the 1flash4beep or 4flash1beep conditions in the numerosity judgment experiments, or in the conditions in which the flash was all the way to the left and noise all the way to the right or vice versa in the localization experiment, there was hardly any shift in the visual or auditory percepts relative to the unisensory conditions. We refer to this lack of interaction as segregation (Figure 12.1c) because it appears that the signals are kept separate from each other. Perhaps most interestingly, in conditions in which there was a moderate discrepancy between the two stimuli, sometimes there was a partial shift of the two modalities toward each other. We refer to this phenomenon as "partial integration" (Figure 12.1b). For example, in the 1flash3beep condition, the observers sometimes reported seeing two flashes and hearing three beeps. Or in the condition in which the flash is at –5° (left of fixation) and noise is at +5° (right of fixation), the observers sometimes reported hearing the noise at 0 degrees and seeing the flash at –5°. Therefore, in summary, in both experiments, we observed a spectrum of interactions between the two modalities.
FIGURE 12.1 Range of cross-modal interactions. Horizontal axis in these panels represents a perceptual dimension such as space, time, number, etc. Light bulb and loudspeaker icons represent visual stimulus and auditory stimulus, respectively. Eye and ear icons represent visual and auditory percepts, respectively. (a) Fusion. Three examples of conditions in which fusion often occurs. Left: when stimuli are congruent and veridically perceived. Middle: when discrepancy between auditory and visual stimuli is small, and percept corresponds to a point in between two stimuli. Right: when discrepancy between two stimuli is small, and one modality (in this example, vision) captures the other modality. (b) Partial integration. Left: when discrepancy between two stimuli is moderate, and the less reliable modality (in this example, vision) gets shifted toward the other modality but does not converge. Right: when discrepancy is moderate and both modalities get shifted toward each other but not enough to converge. (c) Segregation. When conflict between two stimuli is large, and the two stimuli do not affect each other.
FIGURE 12.2 Interaction between auditory and visual modalities as a function of conflict. (a) Visual bias (i.e., influence of sound on visual perception) as a function of discrepancy between number of flashes and beeps in temporal numerosity judgment task. (b) Auditory bias (i.e., influence of vision on auditory perception) as a function of spatial gap between the two in spatial localization task.
When the discrepancy is zero or small, the two modalities tend to be fused. When the conflict is moderate, partial integration may occur, and when the conflict is large, the two signals tend to be segregated (Figure 12.1, right). In both experiments, the interaction between the two modalities gradually decreased as the discrepancy between the two increased (Figure 12.2). What would happen if we had more than two sensory signals, for example, a visual, auditory, and tactile signal, as is most often the case in nature? We investigated this scenario using the numerosity judgment task (Wozny et al. 2008). We presented a variable number of flashes paired with a variable number of beeps and a variable number of taps, providing unisensory, bisensory, and trisensory conditions pseudorandomly interleaved. The task of the participants was to judge the number of flashes, beeps, and taps on each trial. This experiment provided a rich set of data that replicated the sound-induced flash illusion (Shams et al. 2000) and the touch-induced flash illusion (Violentyev et al. 2005), as well as many previously unreported illusions. In fact, in every condition in which there was a small discrepancy between two or three modalities, we observed an illusion. This finding demonstrates that interaction among these modalities is the rule rather than the exception, and that the previously reported sound-induced flash illusions are not "special" in the sense of being unusual or out of the ordinary; rather, they are consistent with a general pattern of cross-modal interactions that cuts across modalities and stimulus conditions. We wondered whether these changes in perceptual reports reflect a change in response criterion as opposed to a change in perception per se. We calculated the change in sensitivity (d′) between bisensory and unisensory conditions (and between trisensory and bisensory conditions) and found statistically significant changes in sensitivity as a result of the introduction of a second (or third) sensory signal in most of the cases, despite the very conservative statistical criterion used. In other words, the observed illusions (both fission and fusion) reflect cross-modal integration processes, as opposed to response bias.
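For readers unfamiliar with the signal detection measure used here, the minimal sketch below shows how such a d′ change can be computed. The hit and false-alarm proportions are invented for illustration and are not data from the experiments described above; the function simply applies the standard definition d′ = z(hits) − z(false alarms).

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

# Illustrative (made-up) proportions for discriminating one versus two flashes:
# "hit" = reporting two flashes when two were shown,
# "false alarm" = reporting two flashes when one was shown.
d_unisensory = d_prime(hit_rate=0.85, false_alarm_rate=0.15)   # flash-only trials
d_bisensory = d_prime(hit_rate=0.70, false_alarm_rate=0.35)    # flashes paired with two beeps
print(d_unisensory - d_bisensory)
# A reliable change in d' indicates a change in sensitivity rather than a mere
# shift in response criterion, which is the logic applied in the text above.
```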
12.6 PRINCIPLES GOVERNING CROSS-MODAL INTERACTIONS Is there anything surprising about the fact that there are a range of interactions between the senses? Let us examine that. Intuitively, it is reasonable for the brain to combine different sources of information to come up with the most informative guess about an object, if all the bits of information are about the same object. For example, if we are holding a mug in our hand, it makes sense that
we use both haptic and visual information to estimate the shape of the mug. It is also expected for the bits of information to be fairly consistent with each other if they arise from the same object. Therefore, it would make sense for the nervous system to fuse the sensory signals when there is little or no discrepancy between the signals. Similarly, as discussed earlier, it is reasonable for the nervous system not to combine the bits of information if they correspond to different objects. It is also expected for the bits of information to be highly disparate if they stem from different objects. Therefore, if we are holding a mug while watching TV, it would be best not to combine the visual and haptic information. Therefore, segregation also makes sense from a functional point of view. How about partial integration? Is there a situation in which partial integration would be beneficial? There is no intuitively obvious explanation for partial integration, as we do not encounter situations wherein two signals are only partially caused by the same object. Therefore, the phenomenon of partial integration is rather curious. Is there a single rule that can account for the entire range of cross-modal interactions including partial integration?
12.7 CAUSAL INFERENCE IN MULTISENSORY PERCEPTION
The traditional model of cue combination (Ghahramani 1995; Yuille and Bülthoff 1996; Landy et al. 1995), which has been the dominant model for many years, assumes that the sensory cues all originate from the same object (Figure 12.3a) and therefore they should all be fused to obtain an optimal estimate of the object property in question. In this model, it is assumed that the sensory signals are corrupted by independent noise and, therefore, are conditionally independent of each other. The optimal estimate of the source is then a linear combination of the two sensory cues. If a Gaussian distribution is assumed for the distribution of the sensory cues, and no a priori bias, this linear combination would become a weighted average of the two sensory estimates, with each estimate weighted by its precision (or inverse of variance). This model has been very successful in accounting for the integration of sensory cues in various tasks and various combinations of sensory modalities (e.g., Alais and Burr 2004; Ernst and Banks 2002; Ghahramani 1995; van Beers et al. 1999). Although this model can account well for behavior when the conflict between the two signals is small (i.e., for situations of fusion, for obvious reasons), it fails to account for the rest of the spectrum (i.e., partial integration and segregation).
FIGURE 12.3 Generative model of different models of cue combination. (a) Traditional model of cue combination, in which two signals are assumed to be caused by one source. (b) Causal inference model of cue combination, in which each signal has a respective cause, and causes may or may not be related. (c) Generalization of model in (b) to three signals. (d) Hierarchical causal inference model of cue combination. There are two explicit causal structures, one corresponding to common cause and one corresponding to independent causes, and variable C chooses between the two. (b, Adapted from Shams, L. et al., Neuroreport, 16, 1923–1927, 2005b; c, adapted from Wozny, D.R. et al., J. Vis., 8, 1–11, 2008; d, Körding, K. et al., PLoS ONE, 2, e943, 2007.)
To come up with a general model that can account for the entire range of interactions, we abandoned the assumption of a single source, and allowed each of the sensory cues to have a respective source. By allowing the two sources to be either dependent or independent, we allowed for both conditions of a common cause and conditions of independent causes for the sensory signals (Figure 12.3b). We assume that the two sensory signals (xA and x V) are conditionally independent of each other. This follows from the assumption that up to the point where the signals get integrated, the sensory signals in different modalities are processed in separate pathways and thus are corrupted by independent noise processes. As mentioned above, this is a common assumption. The additional assumption made here is that the auditory signal is independent of the visual source (sV) given the auditory source (sA), and likewise for visual signal. This is based on the observation that either the two signals are caused by the same object, in which case, the dependence of auditory signal on the visual source is entirely captured by its dependence on the auditory source, or they are caused by different objects, in which case, the auditory signal is entirely independent of the visual source (likewise for visual signal). In other words, this assumption follows from the observation that there is either a common source or independent sources. This general model of bisensory perception (Shams et al. 2005b) results in a very simple inference rule:
$$P(s_A, s_V \mid x_A, x_V) = \frac{P(x_A \mid s_A)\,P(x_V \mid s_V)\,P(s_A, s_V)}{P(x_A, x_V)} \qquad (12.1)$$
where the probability of the auditory and visual sources, sA and sV, given the sensory signals xA and x V is a normalized product of the auditory likelihood (i.e., the probability of getting a signal xA given that there is a source sA out there) and visual likelihood (i.e., the probability of getting a signal x V given that there is a source sV) and the prior probability of sources sA and s V occurring jointly. The joint prior probability P(sA,s V) represents the implicit knowledge that the perceptual system has accumulated over the course of a lifetime about the statistics of auditory–visual events in the environment. In effect, it captures the coupling between the two modalities, and therefore, how much the two modalities will interact in the process of inference. If the two signals (e.g., the number of flashes and beeps) have always been consistent in one’s experience, then the expectation is that they will be highly consistent in the future, and therefore, the joint prior matrix would be diagonal (only the identical values of number of flashes and beeps are allowed, and the rest will be zero). On the other hand, if in one’s experience, the number of flashes and beeps are completely independent of each other, then P(sA,sV) would be factorizable (e.g., a uniform distribution or an isotropic Gaussian distribution) indicating that the two events have nothing to do with each other, and can take on any values independently of each other. Therefore, by having nonzero values for both sA = sV and sA ≠ sV in this joint probability distribution, both common cause and independent cause scenarios are allowed, and the relative strength of these probabilities would determine the prior expectation of a common cause versus independent causes. Other recent models of multisensory integration have also used joint prior probabilities to capture the interaction between two modalities, for example, in haptic–visual numerosity judgment tasks (Bresciani et al. 2006) and auditory–visual rate perception (Roach et al. 2006). The model of Equation 12.1 is simple, general, and readily extendable to more complex situations. For example, the inference rule for trisensory perception (Figure 12.3c) would be as follows:
$$P(s_A, s_V, s_T \mid x_A, x_V, x_T) = \frac{P(x_A \mid s_A)\,P(x_V \mid s_V)\,P(x_T \mid s_T)\,P(s_A, s_V, s_T)}{P(x_A, x_V, x_T)} \qquad (12.2)$$
To test the trisensory perception model of Equation 12.2, we modeled the three-dimensional joint prior P(sA, sV, sT) with a multivariate Gaussian function, and each of the likelihood functions with a univariate Gaussian function. The means of the likelihoods were assumed to be unbiased (i.e., on
average at the veridical number), and the standard deviation of the likelihoods was estimated using data from unisensory conditions. It was also assumed that the mean and variance for the prior of the three modalities were equal, and the three covariances (for three pairs of modalities) were also equal.* This resulted in a total of three free parameters (mean, variance, and covariance of the prior). These parameters were fitted to the data from the trisensory numerosity judgment experiment discussed earlier. The model accounted for 95% of variance in the data (676 data points) using only three free parameters. To test whether the three parameters rendered the model too powerful and able to account for any data set, we scrambled the data and found that the model badly failed to account for the arbitrary data (R2 < .01). In summary, the Bayesian model of Figure 12.3c could provide a remarkable account for the myriad of two-way and three-way interactions observed in the data.
* These assumptions were made to minimize the number of free parameters and maximize the parsimony of the model. However, the assumptions were verified by fitting a model with nine parameters (allowing different values for the mean, variance, and covariance across modalities) to the data, and finding almost equal values for all three means, all three variances, and all three covariances.
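As a concrete illustration of how the inference rule in Equation 12.1 produces the sound-induced flash illusion, the sketch below evaluates the bisensory posterior on a discrete grid of candidate numbers of events, using Gaussian likelihoods and a joint prior that favors equal numbers of flashes and beeps. The grid, noise levels, and prior values are arbitrary choices made for illustration; they are not the fitted parameters reported above.

```python
import numpy as np

def bisensory_posterior(x_a, x_v, sigma_a, sigma_v, joint_prior, values):
    """Posterior over (sA, sV) on a discrete grid, following Equation 12.1:
    P(sA, sV | xA, xV) is proportional to P(xA | sA) P(xV | sV) P(sA, sV)."""
    like_a = np.exp(-(x_a - values) ** 2 / (2 * sigma_a ** 2))   # auditory likelihood over candidate sA
    like_v = np.exp(-(x_v - values) ** 2 / (2 * sigma_v ** 2))   # visual likelihood over candidate sV
    post = like_a[:, None] * like_v[None, :] * joint_prior       # rows: sA, columns: sV
    return post / post.sum()

values = np.arange(1, 5)                                  # candidate numbers of beeps/flashes (1..4)
# Unnormalized joint prior favouring sA == sV (common cause) but allowing independent causes.
joint_prior = np.full((4, 4), 0.02) + 0.2 * np.eye(4)

# One flash paired with two beeps; audition is the more reliable cue for number.
post = bisensory_posterior(x_a=2.0, x_v=1.0, sigma_a=0.4, sigma_v=0.8,
                           joint_prior=joint_prior, values=values)
s_a_hat = (values * post.sum(axis=1)).sum()   # posterior mean number of beeps
s_v_hat = (values * post.sum(axis=0)).sum()   # posterior mean number of flashes
print(round(s_a_hat, 2), round(s_v_hat, 2))
```

With these values, a single flash paired with two beeps yields a visual estimate pulled toward two (the fission illusion), while the auditory estimate stays close to two, mirroring the auditory dominance observed in the numerosity task.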
12.8 HIERARCHICAL BAYESIAN CAUSAL INFERENCE MODEL The model described above can account for the entire range of interactions. However, it does not directly make predictions about the perceived causal structure. In order to be able to make predictions about the perceived causal structure, one needs a hierarchical model in which there is a variable (variable C in Figure 12.3d) that chooses between the different causal structures. We describe this model in the context of the spatial localization task as an example. In this model, the probability of a common cause (i.e., C = 1) is simply computed using Bayes rule as follows:
$$p(C = 1 \mid x_V, x_A) = \frac{p(x_V, x_A \mid C = 1)\,p(C = 1)}{p(x_V, x_A)} \qquad (12.3)$$
According to this rule, the probability of a common cause is simply a product of two factors. The left term in the numerator—the likelihood that the two sensory signals occur if there is a common cause—is a function of how similar the two sensory signals are. The more dissimilar the two signals, the lower this probability will be. The right term in the numerator is the a priori expectation of a common cause, and is a function of prior experience (how often two signals are caused by the same source in general). The denominator again is a normalization factor. Given this probability of a common cause, the location of the auditory and visual stimulus can now be computed as follows:
$$\hat{s} = p(C = 1 \mid x_V, x_A)\,\hat{s}_{C=1} + p(C = 2 \mid x_V, x_A)\,\hat{s}_{C=2} \qquad (12.4)$$
where ŝ denotes the overall estimate of the location of the sound (or visual stimulus), and $\hat{s}_{C=1}$ and $\hat{s}_{C=2}$ denote the optimal estimates of location for the scenario of a common cause and the scenario of independent causes, respectively. The inference rule is interesting because it is a weighted average of two optimal estimates, and it is nonlinear in $x_A$ and $x_V$. What does this inference rule mean? Let us focus on the auditory estimation of location, for example, and assume Gaussian functions for the prior and likelihood functions over space. If the task of the observer is to judge the location of the sound, and the observer knows for certain that the auditory and visual stimuli were caused by two independent sources (e.g., a puppeteer talking and a puppet moving), then the optimal estimate of the location of the sound would be entirely based on the auditory
information and the prior: $\hat{s}_{A,C=2} = \frac{x_A/\sigma_A^2 + x_P/\sigma_P^2}{1/\sigma_A^2 + 1/\sigma_P^2}$, where $\sigma_A$ and $\sigma_P$ are the standard deviations of the auditory likelihood and the prior, respectively, and $x_P$ is the mean of the prior distribution over space. On the other hand, if the observer knows for certain that the auditory and visual stimuli were caused by the same object (e.g., a puppet talking and moving), then the optimal estimate of the location of the sound would take visual information into account: $\hat{s}_{A,C=1} = \frac{x_V/\sigma_V^2 + x_A/\sigma_A^2 + x_P/\sigma_P^2}{1/\sigma_V^2 + 1/\sigma_A^2 + 1/\sigma_P^2}$. In nature, the observer is hardly ever certain about the causal structure of the events in the environment, and in fact, it is the job of the nervous system to solve that problem. Therefore, in general, the nervous system would have to take both of these possibilities into account; thus, the overall optimal estimate of the location of the sound happens to be a weighted average of the two optimal estimates, each weighted by its respective probability as given in Equation 12.3. It can now be understood how partial integration could result from this optimal scheme of multisensory perception. It should be noted that Equation 12.4 is derived assuming a mean squared error cost function. This is a common assumption, and roughly speaking, it means that the nervous system tries to minimize the average magnitude of error. The mean squared error function is minimized if the mean of the posterior distribution is selected as the estimate. The estimate shown in Equation 12.4 corresponds to the mean of the posterior distribution, and as it is a weighted average of the estimates of the two causal structures (i.e., $\hat{s}_{A,C=2}$ and $\hat{s}_{A,C=1}$), it is referred to as "model averaging." If, on the other hand, the goal of the perceptual system is to minimize the number of times that an error is made, then the maximum of the posterior distribution would be the optimal estimate. In this scenario, the overall estimate of location would be the estimate corresponding to the causal structure with the higher probability, and thus, this strategy is referred to as "model selection." Although the model averaging strategy of Equation 12.4 provides estimates that are never entirely consistent with either one of the two possible scenarios (i.e., with what occurs in the environment), this strategy does minimize the magnitude of error on average (the mean squared error) more than any other strategy, and therefore, it is optimal given the cost function.
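The following sketch puts Equations 12.3 and 12.4 together for the auditory localization case, using the standard closed-form expressions for Gaussian likelihoods and a zero-mean Gaussian prior over space. The function name and the numerical values in the usage example are purely illustrative assumptions; they are not the fitted parameter values of the study described below.

```python
import numpy as np

def causal_inference_estimate(x_v, x_a, sigma_v, sigma_a, sigma_p, p_common):
    """Bayesian causal inference for auditory localization (Equations 12.3 and 12.4),
    assuming Gaussian likelihoods centred on x_v and x_a and a zero-mean Gaussian
    spatial prior with standard deviation sigma_p."""
    # Likelihood of the pair of signals under each causal structure (sources integrated out).
    var_c1 = sigma_v**2 * sigma_a**2 + sigma_v**2 * sigma_p**2 + sigma_a**2 * sigma_p**2
    like_c1 = np.exp(-((x_v - x_a)**2 * sigma_p**2 + x_v**2 * sigma_a**2 + x_a**2 * sigma_v**2)
                     / (2 * var_c1)) / (2 * np.pi * np.sqrt(var_c1))
    like_c2 = (np.exp(-x_v**2 / (2 * (sigma_v**2 + sigma_p**2)))
               / np.sqrt(2 * np.pi * (sigma_v**2 + sigma_p**2))
               * np.exp(-x_a**2 / (2 * (sigma_a**2 + sigma_p**2)))
               / np.sqrt(2 * np.pi * (sigma_a**2 + sigma_p**2)))
    # Equation 12.3: posterior probability of a common cause.
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))
    # Optimal auditory estimates under each causal structure (prior mean = 0).
    s_a_c1 = (x_v / sigma_v**2 + x_a / sigma_a**2) / (1 / sigma_v**2 + 1 / sigma_a**2 + 1 / sigma_p**2)
    s_a_c2 = (x_a / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_p**2)
    # Equation 12.4: model averaging.
    return post_c1 * s_a_c1 + (1 - post_c1) * s_a_c2, post_c1

# Illustrative parameter values (degrees of azimuth), not fitted values:
s_hat_a, p_c1 = causal_inference_estimate(x_v=10.0, x_a=-2.0,
                                          sigma_v=2.0, sigma_a=8.0,
                                          sigma_p=15.0, p_common=0.5)
print(round(p_c1, 2), round(s_hat_a, 1))
```

Running the example with a moderate audiovisual discrepancy gives an intermediate probability of a common cause and an auditory estimate that is shifted only partway toward the visual stimulus, which is the partial integration regime described above.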
12.9 RELATIONSHIP WITH NONHIERARCHICAL CAUSAL INFERENCE MODEL
The hierarchical causal inference model of Equation 12.3 can be thought of as a special form of the nonhierarchical causal inference model of Equation 12.1. By integrating out the hidden variable C, the hierarchical model can be recast as $p(s_A, s_V \mid x_A, x_V) = \frac{p(x_A \mid s_A)\,p(x_V \mid s_V)\,p(s_A, s_V)}{p(x_A, x_V)}$, where $p(s_A, s_V) = p(C = 1)\,p(s) + p(C = 2)\,p(s_A)\,p(s_V)$. In other words, the hierarchical model is a special form of the nonhierarchical model in which the joint prior is a mixture of two priors, a prior corresponding to the independent sources, and a prior corresponding to common cause. The main advantage of the hierarchical model over the nonhierarchical model is that it performs causal inference explicitly and allows making direct predictions about perceived causal structure (C).
12.10 HIERARCHICAL CAUSAL INFERENCE MODEL VERSUS HUMAN DATA We tested whether the hierarchical causal inference model can account for human auditory–visual spatial localization (Körding et al. 2007). We modeled the likelihood and prior over space using Gaussian functions. We assumed that the likelihood functions are, on average, centered around the veridical location. We also assumed that there is a bias for the center (straight ahead) location. There were four free parameters that were fitted to the data: the prior probability of a common cause, the standard deviation of the visual likelihood (i.e., the visual sensory noise), the standard deviation of auditory likelihoods (i.e., the auditory sensory noise), and the standard deviation of the prior over
space (i.e., the strength of the bias for center). Because the width of the Gaussian prior over space is a free parameter, if there is no such bias for center position, the parameter will take on a large value, practically rendering this distribution uniform, and thus, the bias largely nonexistent. The model accounted for 97% of variance in human observer data (1225 data points) using only four free parameters (Körding et al. 2007). This is a remarkable fit, and as before, is not due to the degrees of freedom of the model, as the model cannot account for arbitrary data using the same number of free parameters. Also, if we set the value of the four parameters using some common sense values or the published data from other studies, and compare the data with the predictions of the model with no free parameters, we can still account for the data similarly well. We tested whether model averaging (Equation 12.4) or model selection (see above) explains the observers’ data better, and found that observers’ responses were highly more consistent with model averaging than model selection. In our spatial localization experiment, we did not ask participants to report their perceived causal structure on each trial. However, Wallace and colleagues did ask their subjects to report whether they perceive a unified source for the auditory and visual stimuli on each trial (Wallace et al. 2004). The hierarchical causal inference model can account for their published data; both for the data on judgments of unity, and the spatial localizations and interactions between the two modalities (Körding et al. 2007). We compared this model with other models of cue combination on the spatial localization data set. The causal inference model accounts for the data substantially better than the traditional forced fusion model of integration, and better than two recent models of integration that do not assume forced fusion (Körding et al. 2007). One of these models was a model developed by Bresciani et al. (2006) that assumes a Gaussian ridge distribution as the joint prior, and the other one was a model developed by Roach et al. (2006) that assumes the sum of a uniform distribution and a Gaussian ridge as the joint prior. We tested the hierarchical causal inference model on the numerosity judgment data described earlier. The model accounts for 86% of variance in the data (576 data points) using only four free parameters (Beierholm 2007). We also compared auditory–visual interactions and visual–visual interactions in the numerosity judgment task, and found that both cross-modal and within-modality interactions could be explained using the causal inference model, with the main difference between the two being in the a priori expectation of a common cause (i.e., Pcommon). The prior probability of a common cause for visual–visual condition was higher than that of the auditory–visual condition (Beierholm 2007). Hospedales and Vijayakumar (2009) have also recently shown that an adaptation of the causal inference model for an oddity detection task accounts well for both within-modality and cross-modal oddity detection of observers. Consistent with our results, they found the prior probability of a common cause to be higher for the within-modality task compared with the cross-modality task. 
In summary, we found that the causal inference model accounts well for two complementary sets of data (spatial localization and numerosity judgment), it accounts well for data collected by another group, it outperforms the traditional and other contemporary models of cue combination (on the tested data set), and it provides a unifying account of within-modality and cross-modality integration.
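For readers who want a sense of what fitting the four free parameters involves, here is a deliberately crude sketch. It assumes the causal_inference_estimate function from the previous sketch is in scope, treats the mean sensory signal in each condition as equal to the true stimulus location, and fits the parameters by least squares to hypothetical mean auditory responses. The published fits were performed on full response distributions, so this is only meant to convey the overall shape of the procedure, not to reproduce it.

```python
import numpy as np
from scipy.optimize import minimize

# Assumes causal_inference_estimate() from the previous sketch is already defined.

def fit_causal_inference(true_v, true_a, mean_resp_a):
    """Least-squares fit of (p_common, sigma_v, sigma_a, sigma_p) to hypothetical
    mean auditory localization responses, one value per audiovisual condition."""
    def loss(params):
        p_common, sigma_v, sigma_a, sigma_p = params
        pred = np.array([causal_inference_estimate(xv, xa, sigma_v, sigma_a,
                                                   sigma_p, p_common)[0]
                         for xv, xa in zip(true_v, true_a)])
        return np.sum((pred - np.asarray(mean_resp_a)) ** 2)

    result = minimize(loss, x0=[0.5, 2.0, 8.0, 15.0], method="L-BFGS-B",
                      bounds=[(0.01, 0.99), (0.5, 30.0), (0.5, 30.0), (1.0, 60.0)])
    return result.x  # fitted (p_common, sigma_v, sigma_a, sigma_p)
```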
12.11 INDEPENDENCE OF PRIORS AND LIKELIHOODS These results altogether strongly suggest that human observers are Bayes-optimal in multisensory perceptual tasks. What does it exactly mean to be Bayes-optimal? The general understanding of Bayesian inference is that inference is based on two factors, likelihood and prior. Likelihood represents the sensory noise (in the environment or in the brain), whereas prior captures the statistics of the events in the environment, and therefore, the two quantities are independent of each other. Although this is the general interpretation of Bayesian inference, it is important to note that demonstrating that observers are Bayes-optimal under one condition does not necessarily imply that the
likelihoods and priors are independent of each other. It is quite possible that changing the likelihoods would result in a change in priors or vice versa. Given that we are able to estimate likelihoods and priors using the causal inference model, we can empirically investigate the question of independence of likelihoods and priors. Furthermore, it is possible that the Bayes-optimal performance is achieved without using Bayesian inference (Maloney and Mamassian 2009). For example, it has been described that an observer using a table-lookup mechanism can achieve near-optimal performance using reinforcement learning (Maloney and Mamassian 2009). Because the Bayes-optimal performance can be achieved by using different processes, it has been argued that comparing human observer performance with a Bayesian observer in one setting alone is not sufficient as evidence for Bayesian inference as a process model of human perception. For these reasons, Maloney and Mamassian (2009) have proposed transfer criteria as more powerful experimental tests of Bayesian decision theory as a process model of perception. The transfer criterion is to test whether the change in one component of decision process (i.e., likelihood, prior, or decision rule) leaves the other components unchanged. The idea is that if the perceptual system indeed engages in Bayesian inference, a change in likelihoods, for example, would not affect the priors. However, if the system uses another process such as a table-lookup then it would fail these kinds of transfer tests. We asked whether priors are independent of likelihoods (Beierholm et al. 2009). To address this question, we decided to induce a strong change in the likelihoods and examine whether this would lead to a change in priors. To induce a change in likelihoods, we manipulated the visual stimulus. We used the spatial localization task and tested participants under two visual conditions, one with a high-contrast visual stimulus (Gabor patch), and one with a low-contrast visual stimulus. The task, procedure, auditory stimulus, and all other variables were identical across the two conditions that were tested in two separate sessions. The two sessions were held 1 week apart, so that if the observers learn the statistics of the stimuli during the first session, the effect of this learning would disappear by the time of the second session. The change in visual contrast was drastic enough to cause the performance on visual-alone trials to be lower than that of the high-contrast condition by as much as 41%. The performance on auditory-alone trials did not change significantly because the auditory stimuli were unchanged. The model accounts for both sets of data very well (R2 = .97 for high contrast, and R2 = .84 for low-contrast session). Therefore, the performance of the participants appears to be Bayes-optimal in both the high-contrast and low-contrast conditions. Considering that the performances in the two sessions were drastically different (substantially worse in the low-contrast condition), and considering that the priors were estimated from the behavioral responses, there is no reason to believe that the priors in these two sessions would be equal (as they are derived from very different sets of data). Therefore, if the estimated priors do transpire to be equal between the two sessions, that would provide a strong evidence for independence of priors from likelihoods. 
If the priors are equal, then swapping them between the two sessions should not hurt the goodness of fit to the data. We tested this by using the priors estimated from the low-contrast data to predict the high-contrast data, and the priors estimated from the high-contrast data to predict the low-contrast data. The results were surprising: the goodness of fit remained almost as good (R2 = .97 and R2 = .81) as when using priors from the same data set (Beierholm et al. 2009). Next, we directly compared the estimated parameters of the likelihood and prior functions for the two sessions. The model was fitted to each individual subject's data, and the likelihood and prior parameters were estimated for each subject for each of the two sessions separately. Comparing the parameters across subjects (Figure 12.4) revealed a statistically significant (P < .0005) difference only for the visual likelihood (showing a higher degree of noise for the low-contrast condition). No other parameters (neither the auditory likelihood nor the two prior parameters) were statistically different between the two sessions. Despite a large difference between the two visual likelihoods (by >10 standard deviations), no change was detected in either the probability of a common cause or the prior over space. Therefore, these results suggest that priors are encoded independently of the likelihoods (Beierholm et al. 2009). These findings are consistent with those of a previous study showing that an induced change in a perceptual prior transfers qualitatively to other types of stimuli (Adams et al. 2004).
FIGURE 12.4 Mean prior and likelihood parameter values across participants in two experimental sessions differing only in contrast of visual stimulus. Black and gray denote values corresponding to session with high-contrast and low-contrast visual stimulus, respectively. Error bars correspond to standard error of mean. (From Beierholm, U. et al., J. Vis., 9, 1–9, 2009. With permission.)
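The comparison summarized in Figure 12.4 is essentially a set of paired tests on the per-subject parameter estimates from the two sessions. The sketch below runs that kind of analysis on simulated numbers generated solely for illustration; the sample size, means, and spreads are invented and bear no relation to the actual data behind the figure.

```python
import numpy as np
from scipy.stats import ttest_rel

# Simulated per-subject parameter estimates (rows: subjects; columns: sigma_V,
# sigma_A, sigma_P, P_common). Invented values purely to illustrate the analysis.
rng = np.random.default_rng(0)
high = rng.normal(loc=[2.0, 8.0, 12.0, 0.6], scale=[0.5, 1.5, 3.0, 0.1], size=(19, 4))
low = high + rng.normal(loc=[7.0, 0.0, 0.0, 0.0], scale=[1.0, 1.5, 3.0, 0.1], size=(19, 4))

for col, name in enumerate(["sigma_V", "sigma_A", "sigma_P", "P_common"]):
    t, p = ttest_rel(high[:, col], low[:, col])   # paired comparison across subjects
    print(f"{name}: t = {t:.2f}, p = {p:.4f}")
# Only sigma_V is constructed to differ between sessions here; the other columns
# differ only by noise, mirroring the pattern of results reported in Figure 12.4.
```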
12.12 CONCLUSIONS
Together with a wealth of other accumulating findings, our behavioral findings suggest that cross-modal interactions are ubiquitous, strong, and robust in human perceptual processing. Even visual perception, which has traditionally been believed to be the dominant modality and highly self-contained, can be strongly and radically influenced by cross-modal stimulation. Our ERP, MEG, and fMRI findings consistently show that visual processing is affected by sound at the earliest levels of cortical processing, namely in V1. This modulation reflects a cross-modal integration phenomenon as opposed to attentional modulation. Therefore, multisensory integration can occur even at these early stages of sensory processing, in areas that have traditionally been held to be unisensory. Cross-modal interactions depend on a number of factors, namely the temporal, spatial, and structural consistency between the stimuli. Depending on the degree of consistency between the two stimuli, a spectrum of interactions may result, ranging from complete integration, to partial integration, to complete segregation. The entire range of cross-modal interactions can be explained by a Bayesian model of causal inference wherein the inferred causal structure of the events in the environment depends on the degree of consistency between the signals as well as the prior knowledge/bias about the causal structure. Indeed, given that humans are surrounded by multiple objects and hence multiple sources of sensory stimulation, the problem of causal inference is a fundamental problem at the core of perception. The nervous system appears to have implemented the optimal solution to this problem, as the perception of human observers appears to be Bayes-optimal in multiple tasks, and the Bayesian causal inference model of multisensory perception presented here can account in a unified and coherent fashion for an entire range of interactions in a multitude of tasks. Not only does the performance of observers appear to be Bayes-optimal in multiple tasks, but the priors also appear to be independent of the likelihoods, consistent with the notion of priors encoding the statistics of objects and events in the environment independently of sensory representations.
REFERENCES
Adams, W.J., E.W. Graf, and M.O. Ernst. 2004. Experience can change the 'light-from-above' prior. Nature Neuroscience, 7, 1057–1058.
Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14, 257–262.
Angelaki, D.E., Y. Gu, and G.C. Deangelis. 2009. Multisensory integration: Psychophysics, neurophysiology, and computation. Current Opinion in Neurobiology, 19, 452–458.
Beierholm, U. 2007. Bayesian modeling of sensory cue combinations. PhD Thesis, California Institute of Technology.
Beierholm, U., S. Quartz, and L. Shams. 2009. Bayesian priors are encoded independently of likelihoods in human multisensory perception. Journal of Vision, 9, 1–9.
Bhattacharya, J., L. Shams, and S. Shimojo. 2002. Sound-induced illusory flash perception: Role of gamma band responses. Neuroreport, 13, 1727–1730.
Bresciani, J.P., F. Dammeier, and M.O. Ernst. 2006. Vision and touch are automatically integrated for the perception of sequences of events. Journal of Vision, 6, 554–564.
Calvert, G., P.C. Hansen, S.D. Iversen, and M.J. Brammer. 2001. Detection of audio-visual integration sites in humans by application of electro-physiological criteria to the BOLD effect. NeuroImage, 14, 427–438.
Clavagnier, S., A. Falchier, and H. Kennedy. 2004. Long-distance feedback projections to area V1: Implications for multisensory integration, spatial awareness, and visual consciousness. Cognitive Affective Behavioral Neuroscience, 4, 117–126.
Diederich, A., and H. Colonius. 2007. Modeling spatial effects in visual-tactile saccadic reaction time. Perception & Psychophysics, 69, 56–67.
Ernst, M.O., and M.S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience, 22, 5749–5759.
Gebhard, J.W., and G.H. Mowbray. 1959. On discriminating the rate of visual flicker and auditory flutter. American Journal of Psychology, 72, 521–528.
Ghahramani, Z. 1995. Computation and psychophysics of sensorimotor integration. Ph.D. Thesis, Massachusetts Institute of Technology.
Ghazanfar, A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences, 10, 278–285.
Hackett, T.A., J.F. Smiley, I. Ulbert, G. Karmos, P. Lakatos, L.A. De La Mothe, and C.E. Schroeder. 2007. Sources of somatosensory input to the caudal belt areas of auditory cortex. Perception, 36, 1419–1430.
Hospedales, T., and S. Vijayakumar. 2009. Multisensory oddity detection as Bayesian inference. PLoS ONE, 4, e4205.
Howard, I.P., and W.B. Templeton. 1966. Human Spatial Orientation. London: Wiley.
Körding, K., U. Beierholm, W.J. Ma, J.M. Tenenbaum, S. Quartz, and L. Shams. 2007. Causal inference in multisensory perception. PLoS ONE, 2, e943.
Landy, M.S., L.T. Maloney, E.B. Johnston, and M. Young. 1995. Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35, 389–412.
Maloney, L.T., and P. Mamassian. 2009. Bayesian decision theory as a model of human visual perception: Testing Bayesian transfer. Visual Neuroscience, 26, 147–155.
McDonald, J.J., W.A. Teder-Sälejärvi, and S.A. Hillyard. 2000. Involuntary orienting to sound improves visual perception. Nature, 407, 906–908.
McGurk, H., and J.W. McDonald. 1976. Hearing lips and seeing voices. Nature, 264, 746–748.
Pascual-Leone, A., and R. Hamilton. 2001. The metamodal organization of the brain. Progress in Brain Research, 134, 427–445.
Roach, N., J. Heron, and P. McGraw. 2006. Resolving multisensory conflict: A strategy for balancing the costs and benefits of audio-visual integration. Proceedings of the Royal Society B: Biological Sciences, 273, 2159–2168.
Rock, I., and J. Victor. 1964. Vision and touch: An experimentally created conflict between the two senses. Science, 143, 594–596.
Rockland, K.S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology, 50, 19–26.
Ronsse, R., C. Miall, and S.P. Swinnen. 2009. Multisensory integration in dynamical behaviors: Maximum likelihood estimation across bimanual skill learning. Journal of Neuroscience, 29, 8419–8428.
Scheier, C.R., R. Nijwahan, and S. Shimojo. 1999. Sound alters visual temporal resolution. In Investigative Ophthalmology and Visual Science, 40, S4169.
Sekuler, R., A.B. Sekuler, and R. Lau. 1997. Sound alters visual motion perception. Nature, 385, 308.
Shams, L., Y. Kamitani, and S. Shimojo. 2000. What you see is what you hear. Nature, 408, 788.
Shams, L., Y. Kamitani, S. Thompson, and S. Shimojo. 2001. Sound alters visual evoked potentials in humans. Neuroreport, 12, 3849–3852.
Shams, L., Y. Kamitani, and S. Shimojo. 2002. Visual illusion induced by sound. Cognitive Brain Research, 14, 147–152.
Shams, L., S. Iwaki, A. Chawla, and J. Bhattacharya. 2005a. Early modulation of visual cortex by sound: An MEG study. Neuroscience Letters, 378, 76–81.
Shams, L., W.J. Ma, and U. Beierholm. 2005b. Sound-induced flash illusion as an optimal percept. Neuroreport, 16, 1923–1927.
Spence, C., and J. Driver. 1997. Audiovisual links in exogenous covert spatial orienting. Perception & Psychophysics, 59, 1–22.
Spence, C., M.E. Nicholls, N. Gillespie, and J. Driver. 1998. Cross-modal links in exogenous covert spatial orienting between touch, audition, and vision. Perception & Psychophysics, 60, 544–557.
Stein, B.E., N. London, L.K. Wilkinson, and D.D. Price. 1996. Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497–506.
Van Beers, R.J., A.C. Sittig, and J.J. Denier van der Gon. 1999. Integration of proprioceptive and visual position information: An experimentally supported model. Journal of Neurophysiology, 81, 1355–1364.
Violentyev, A., S. Shimojo, and L. Shams. 2005. Touch-induced visual illusion. Neuroreport, 16, 1107–1110.
Walker, J.T., and K.J. Scott. 1981. Auditory–visual conflicts in the perceived duration of lights, tones, and gaps. Journal of Experimental Psychology: Human Perception and Performance, 7, 1327–1339.
Wallace, M.T., G.H. Roberson, W.D. Hairston, B.E. Stein, J.W. Vaughan, and J.A. Schirillo. 2004. Unifying multisensory signals across time and space. Experimental Brain Research, 158, 252–258.
Watkins, S., L. Shams, S. Tanaka, J.-D. Haynes, and G. Rees. 2006. Sound alters activity in human V1 in association with illusory visual perception. NeuroImage, 31, 1247–1256.
Watkins, S., L. Shams, O. Josephs, and G. Rees. 2007. Activity in human V1 follows multisensory perception. NeuroImage, 37, 572–578.
Wozny, D.R., U.R. Beierholm, and L. Shams. 2008. Human trimodal perception follows optimal statistical inference. Journal of Vision, 8, 1–11.
Yuille, A.L., and H.H. Bülthoff. 1996. Bayesian decision theory and psychophysics. In Perception as Bayesian Inference, ed. D.C. Knill and W. Richards. Cambridge, UK: Cambridge Univ. Press.
13 Characterization of Multisensory Integration with fMRI: Experimental Design, Statistical Analysis, and Interpretation
Uta Noppeney
CONTENTS
13.1 Functional Specialization: Mass-Univariate Statistical Approaches
  13.1.1 Conjunction Analyses
  13.1.2 Max and Mean Criteria
  13.1.3 Interaction Approaches
    13.1.3.1 Classical Interaction Design: 2 × 2 Factorial Design Manipulating Presence versus Absence of Sensory Inputs
    13.1.3.2 Interaction Design: 2 × 2 Factorial Design Manipulating Informativeness or Reliability of Sensory Inputs
    13.1.3.3 Elaborate Interaction Design: m × n Factorial Design (i.e., More than Two Levels)
    13.1.3.4 Interaction Analyses Constrained by Maximum Likelihood Estimation Model
    13.1.3.5 Combining Interaction Analyses with Max Criterion
  13.1.4 Congruency Manipulations
  13.1.5 fMRI Adaptation (or Repetition Suppression)
13.2 Multisensory Representations: Multivariate Decoding and Pattern Classifier Analyses
13.3 Functional Integration: Effective Connectivity Analyses
  13.3.1 Data-Driven Effective Connectivity Analysis: Psychophysiological Interactions and Granger Causality
  13.3.2 Hypothesis-Driven Effective Connectivity Analysis: Dynamic Causal Modeling
13.4 Conclusions and Future Directions
Acknowledgments
References
This chapter reviews the potential and limitations of functional magnetic resonance imaging (fMRI) in characterizing the neural processes underlying multisensory integration. The neural basis of multisensory integration can be characterized from two distinct perspectives. From the perspective of functional specialization, we aim to identify regions where information from different senses converges and/or is integrated. From the perspective of functional integration, we investigate how information from multiple sensory regions is integrated via interactions among brain regions. Combining these two perspectives, this chapter discusses experimental design, analysis approaches, and interpretational limitations of fMRI results. The first section describes univariate statistical analyses of fMRI data and emphasizes the interpretational ambiguities of various statistical criteria that are commonly used for the identification of multisensory integration sites. The second section explores the potential and limitations of multivariate and pattern classifier approaches in multisensory integration. The third section introduces effective connectivity analyses that investigate how multisensory integration emerges from distinct interactions among brain regions. The complementary strengths of data-driven and hypothesis-driven effective connectivity analyses will be discussed. We conclude by emphasizing that the combined potentials of these various analysis approaches may help us to overcome or at least ameliorate the interpretational ambiguities associated with each analysis when applied in isolation.
13.1 FUNCTIONAL SPECIALIZATION: MASS-UNIVARIATE STATISTICAL APPROACHES
Mass-univariate statistical analyses are used to identify regions where information from multiple senses converges or is integrated. Over the past decade, mass-univariate analyses have formed the mainstay of fMRI research in multisensory integration. In the following section, we will discuss the pros and cons of the various analyses and statistical criteria that have been applied in the fMRI literature.
13.1.1 Conjunction Analyses
Conjunction analyses explicitly test whether a voxel or an area responds to several unisensory inputs. For instance, a brain area is implicated in audiovisual convergence if it responds to both auditory and visual inputs presented in isolation. Conjunction analyses are well motivated by the neurophysiological findings that unisensory cortical domains are separated from one another by transitional multisensory zones (Wallace et al. 2004) and by the proposed patchy sensory organization of higher-order association cortices such as the superior temporal sulcus (STS; Seltzer et al. 1996; Beauchamp et al. 2004). Given the location of multisensory integration in transition zones between unisensory regions, it seems rational to infer multisensory properties from responsiveness to multiple unisensory inputs. However, whereas conjunction analyses can identify candidate multisensory regions that respond to inputs from multiple senses, even when presented alone (see Figure 13.1b), they cannot capture integration processes in which one unisensory (e.g., visual) input in itself does not elicit a significant response, but rather modulates the response elicited by another unisensory (e.g., auditory) input (see Figure 13.1c). In fact, at the single neuron level, recent neurophysiological studies have demonstrated that these sorts of modulatory multisensory interactions seem to be a rather common phenomenon in both higher-level regions such as STS (Barraclough et al. 2005; Avillac et al. 2007) and particularly in low-level, putatively unisensory regions (Allman et al. 2009; Meredith and Allman 2009; Dehner et al. 2004; Kayser et al. 2008). Conjunction approaches are blind to these modulatory interactions that can instead be revealed by interaction analyses (see below). Even though, based on neurophysiological results, regions that respond to multiple unisensory inputs are likely to be involved in multisensory integration, conjunction analyses cannot formally dissociate (1) genuine multisensory integration from (2) regional convergence with independent sensory neuronal populations. (1) In the case of true multisensory integration, multisensory neurons
would respond to unisensory inputs from multiple sensory modalities (e.g., AV neurons to A inputs and V inputs). (2) In the case of pure regional convergence, the blood oxygen level dependent (BOLD) response is generated by independent populations of either auditory neurons or visual neurons (e.g., A neurons to A and V neurons to V inputs). Given the low spatial resolution of fMRI, both cases produce a "conjunction" BOLD response profile, i.e., regional activation that is elicited by unisensory inputs from multiple senses. Hence, conjunction analyses cannot unambiguously identify multisensory integration.
From a statistical perspective, it is important to note that the term "conjunction analysis" has been used previously to refer to two distinct classes of statistical tests that were later termed (1) "global null conjunction analysis" (Friston et al. 1999, 2005) and (2) "conjunction null conjunction analysis" (Nichols et al. 2005). (1) A global null conjunction analysis generalizes the one-sided t-test to multiple dimensions (i.e., comparable to an F-test, but unidirectional) and enables inferences about k or more effects being present. Previous analyses based on minimum statistics have typically used the null hypothesis that k = 0. Hence, they tested whether one or more effects were present. In the context of multisensory integration, this sort of global null conjunction analysis tests whether "at least one" unisensory input significantly activates a particular region or voxel (with all unisensory inputs eliciting an effect greater than a particular minimum t value). (2) The more stringent conjunction null conjunction analysis (implemented in most software packages) explicitly tests whether a region is significantly activated by both classes of unisensory inputs. Hence, a conjunction null conjunction analysis forms a logical "and" operation of the two statistical comparisons. This second type of inference, i.e., a logical "and" operation, is needed when identifying multisensory convergence with the help of conjunction analyses. Nevertheless, because conjunction analyses were used primarily in the early stages of fMRI multisensory research, when this distinction was not yet clearly drawn, most of the previous research is actually based on the more liberal and, in this context, inappropriate global null conjunction analysis. For instance, initial studies identified integration sites of motion information by performing a global null conjunction analysis on motion effects in the visual, tactile, and auditory domains (Bremmer et al. 2001). Future studies are advised to use the more stringent conjunction null conjunction approach to identify regional multisensory convergence.
FIGURE 13.1 Conjunction design and analysis. (a) Experimental design. (1) Auditory: environmental sounds; (2) visual: pictures or video clips. Example stimuli are presented as visual images and corresponding sound spectrograms. (b and c) Data analysis and interpretation. (b) A region responding to auditory "and" visual inputs when presented in isolation is identified as multisensory in a conjunction analysis. (c) A region responding only to auditory but not visual inputs is identified as unisensory in a conjunction analysis. Therefore, conjunction analyses cannot capture modulatory interactions in which one sensory (e.g., visual) input in itself does not elicit a response, but significantly modulates the response to another sensory input (e.g., auditory). Bar graphs represent the effect for auditory (black) and visual (darker gray) stimuli, and the "multisensory" (lighter gray) effect as defined by a conjunction.
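To make the distinction between the two tests concrete, the toy sketch below thresholds simulated voxel-wise t-maps for the two unisensory contrasts. It is only an illustration of the logic, not the implementation used in SPM or any other package; the voxel count, degrees of freedom, and thresholds are invented, and independence of the two contrasts is assumed for the global-null p-value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_vox, df = 10000, 20                       # hypothetical voxel count and error degrees of freedom
t_aud = rng.standard_t(df, n_vox) + 3.0     # simulated t-map for A > baseline
t_vis = rng.standard_t(df, n_vox) + 1.0     # simulated t-map for V > baseline

t_crit = stats.t.ppf(1 - 0.001, df)         # single-contrast height threshold (p < 0.001 uncorrected)
min_t = np.minimum(t_aud, t_vis)            # minimum statistic across the two contrasts

# Conjunction-null test: BOTH unisensory effects must exceed the single-contrast
# threshold, i.e., a logical "and" over A > baseline and V > baseline.
conjunction_null = min_t > t_crit

# Global-null test: under the null that NO effect is present, the minimum of two
# independent t statistics exceeds m with probability (1 - F(m))**2, which yields
# a more lenient threshold and only supports "at least one effect" inferences.
p_min_global_null = (1 - stats.t.cdf(min_t, df)) ** 2
global_null = p_min_global_null < 0.001

print(conjunction_null.sum(), global_null.sum())

Because the global-null threshold on the minimum statistic is lower, it generally declares more voxels significant, which is precisely why it only licenses the weaker inference that at least one unisensory effect is present.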
13.1.2 Max and Mean Criteria
Although conjunction analyses look for commonalities in activations to unisensory inputs from multiple sensory modalities, fMRI studies based on the max criterion include unisensory and multisensory stimulation conditions. For statistical inference, the BOLD response evoked by a bisensory input is compared to the maximal BOLD response elicited by any of the two unisensory inputs. This max criterion is related, yet not identical, to the multisensory enhancement used in neurophysiological studies. fMRI studies quantify the absolute multisensory enhancement (e.g., AV – max(A,V); van Atteveldt et al. 2004). Neurophysiological studies usually evaluate the relative multisensory enhancement, i.e., the multisensory enhancement standardized by the maximal unisensory response, e.g., (AV – max(A,V))/max(A,V) (Stein and Meredith 1993; Stein et al. 2009). Despite the similarities in criterion, the interpretation and conclusions that can be drawn from neurophysiological and fMRI results differ. Although in neurophysiology, multisensory enhancement or depression in the activity of single neurons unambiguously indicates multisensory integration, multisensory BOLD enhancement does not compellingly prove multisensory integration within a region. For instance, if a region contains independent visual and auditory neuronal populations, the response to an audiovisual stimulus should be equal to the sum of the auditory and visual responses, and hence exceed the maximal unisensory response (Calvert et al. 2001). Hence, like the conjunction analysis, the max criterion cannot dissociate genuine multisensory integration from regional convergence with independent unisensory populations. Nevertheless, it may be useful to further characterize the response profile of multisensory regions identified in interaction analyses using the max criterion (see Section 13.1.3.5). In addition to the max criterion, some researchers have proposed or used a mean criterion, i.e., the response to the bisensory input should be greater than the mean response to the two unisensory inputs when presented in isolation (Beauchamp 2005). However, even in true unisensory (e.g., visual) regions, responses to audiovisual stimuli (equal to the visual response) are greater than the mean of the auditory and visual responses (equal to half the visual response). Hence, the mean criterion does not seem to be theoretically warranted and will therefore not be discussed further (Figure 13.2).
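The following toy calculation, with invented response estimates, shows why the mean criterion is too liberal: a region driven only by visual neurons passes the mean criterion but fails the max criterion.

# Invented condition estimates (relative to fixation) for a purely visual region
# that contains no auditory-responsive neurons at all.
A, V, AV, Fix = 0.0, 1.0, 1.0, 0.0          # the audiovisual response simply equals the visual response

max_criterion = (AV - Fix) > max(A - Fix, V - Fix)           # AV must exceed the best unisensory response
mean_criterion = (AV - Fix) > ((A - Fix) + (V - Fix)) / 2.0  # AV must exceed the unisensory average

abs_enhancement = (AV - Fix) - max(A - Fix, V - Fix)         # fMRI-style absolute enhancement
rel_enhancement = abs_enhancement / max(A - Fix, V - Fix)    # neurophysiology-style relative enhancement

print(max_criterion, mean_criterion, abs_enhancement, rel_enhancement)
# False, True, 0.0, 0.0: the mean criterion mislabels this unisensory region as multisensory.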
FIGURE 13.2 Max and mean criteria. (a) Experimental design. (1) Auditory: environmental sounds; (2) visual: pictures or video clips; (3) audiovisual: sounds + concurrent pictures. Example stimuli are presented as visual images and corresponding sound spectrograms. (b–d) Data analysis and interpretation. (b) A region where the audiovisual response is equal to the sum of the auditory and visual responses is identified as potentially multisensory. However, this activation profile could equally well emerge in a region with independent auditory and visual neuronal populations. (c and d) A "unisensory" region responding equally to auditory and audiovisual inputs but not to visual inputs is identified as unisensory by the max criterion (c), but as multisensory by the mean criterion (d). Bar graphs represent the effect for auditory (black), visual (darker gray), and audiovisual (lighter gray) stimuli, and the "multisensory" (gray) effect as defined by the max (multisensory enhancement) or mean criteria.
13.1.3 Interaction Approaches
As demonstrated in the discussion of the conjunction and max criterion approaches, the limited spatial resolution of the BOLD response precludes dissociation of true multisensory integration from regional convergence when the bisensory response is equal to the sum of the two unisensory responses. Given this fundamental problem of independent unisensory neuronal populations within a particular region, more stringent methodological approaches have therefore posed response additivity as the null hypothesis and identified multisensory integration through response nonlinearities, i.e., the interaction between, for example, visual and auditory inputs (Calvert et al. 2001; Calvert 2001).
13.1.3.1 Classical Interaction Design: 2 × 2 Factorial Design Manipulating Presence versus Absence of Sensory Inputs
In a 2 × 2 factorial design, multisensory integration is classically identified through the interaction between presence and absence of input from two sensory modalities, e.g., (A – fixation) ≠
(AV – V). For example, the interaction approach investigates whether the response to an auditory stimulus depends on the presence versus the absence of a visual stimulus. To relate the interaction approach to the classical neurophysiological criterion of superadditivity, we can rewrite this formula as (AV – fixation) ≠ (A – fixation) + (V – fixation) ↔ (AV + fixation) ≠ (A + V). In other words, the response to the bisensory stimulus is different from the sum of two unisensory stimuli when presented alone (with each stimulus evoked response being normalized relative to, e.g., prestimulus baseline activity; Stanford et al. 2005; Perrault et al. 2005). A positive interaction identifies regions where the bisensory response exceeds the sum of the unisensory responses—hence referred to as
a superadditive response. Similarly, subadditive (and even suppressive) effects can be identified by negative interactions. Although previous fMRI research has largely ignored and discarded subadditive interactions for methodological reasons (Beauchamp 2005), recent neurophysiological studies have clearly revealed the relevance of different, i.e., superadditive and subadditive, interaction profiles for multisensory integration (Stanford et al. 2005; Laurienti et al. 2005; Stanford and Stein 2007; Sugihara et al. 2006; Avillac et al. 2007). This emphasizes the need to develop methodological approaches in fMRI that enable the interpretation of subadditive interactions. A BOLD response profile consistent with a significant superadditive or subadditive interaction cannot be attributed to the summation of independent auditory and visual responses within a region and hence implicates a region in multisensory integration. Furthermore, in contradistinction to the conjunction analysis, the interaction approach does not necessitate that a multisensory region responds to unisensory input from multiple sensory modalities. Therefore, it can also capture the modulatory interactions in which auditory input modulates the processing of visual input even though the auditory input does not elicit a response when presented alone. However, this classical interaction design gives rise to four major drawbacks. First, by definition, the interaction term can only identify nonlinear combinations of modality-specific inputs, leaving out additive multisensory integration effects that have been observed at the single neuron level. Second, for the interaction term to be valid and unbiased, the use of "fixation" (the absence of auditory and visual information) precludes that subjects perform a task on the stimuli (Beauchamp 2005). This is because task-related activations are absent during the "fixation" condition, leading to an overestimation of the summed unisensory relative to the bisensory fMRI responses in the interaction term. Yet, even in the absence of a task, the interaction term may be unbalanced with respect to processes that are induced by stimuli but not during the fixation condition. For instance, stimulus-induced exogenous attention is likely to be enhanced for (A + V) relative to (AV + fixation). Third, subadditive interactions may arise from nonlinearities or ceiling effects not only in the neuronal but also in the BOLD response, rendering the interpretation ambiguous. Fourth, during the recognition of complex environmental stimuli such as speech, objects, or actions, multisensory interactions could emerge at multiple processing levels, ranging from the integration of low-level spatiotemporal to higher-level object-related perceptual information. These different types of integration processes are all included in the statistical comparison (i.e., interaction) when using a "fixation" condition (Werner and Noppeney 2010c). Hence, a selective dissociation of integration at multiple processing stages such as spatiotemporal and object-related information is not possible (Figure 13.3).
13.1.3.2 Interaction Design: 2 × 2 Factorial Design Manipulating Informativeness or Reliability of Sensory Inputs
Some of the drawbacks of the classical interaction design can, in part, be addressed in a 2 × 2 factorial design that manipulates (1) visual informativeness (intact = Vi, noise = Vn) and (2) auditory informativeness (intact = Ai, noise = An).
Even though the audiovisual noise stimulus does not provide visual or auditory object information, pure noise stimuli can be treated as a "degraded object stimulus" by subjects (Gosselin and Schyns 2003). Hence, in contrast to the classical interaction that manipulates the presence versus the absence of inputs, subjects can perform a task on the "noise" stimulus, rendering the interaction AiVi + VnAn ≠ AiVn + ViAn matched with respect to stimulus-evoked attention and response selection processes, at least to a certain degree. Obviously, conditions cannot be matched entirely with respect to task demands. However, performance differences in a multisensory integration study should generally not be considered a confound, but rather an interesting property of multisensory integration. Indeed, it is an important question how neural processes mediate multisensory benefits. Furthermore, as auditory and visual inputs are provided in all conditions, the audiovisual interaction focuses selectively on the integration of higher-order object features rather than low-level spatiotemporal information (Figure 13.4). Hence, this design is a first step toward dissociating multisensory integration at multiple processing stages (Werner and Noppeney 2010a).
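As a concrete illustration of how these interaction terms are evaluated, the sketch below computes the superadditivity contrast from a set of invented condition parameter estimates, first for the presence/absence design and then for the informativeness design. The beta values and condition labels are hypothetical and are used only to show the arithmetic.

import numpy as np

# Classical 2 x 2 presence/absence design: invented condition betas from a first-level GLM.
betas = {"A": 0.5, "V": 0.6, "AV": 0.9, "Fix": 0.0}
msi = (betas["AV"] + betas["Fix"]) - (betas["A"] + betas["V"])   # interaction (superadditivity) term
# msi > 0 indicates a superadditive profile, msi < 0 a subadditive (possibly suppressive) profile.

# The same interaction expressed as a contrast vector over the condition order [A, V, AV, Fix]:
contrast = np.array([-1.0, -1.0, 1.0, 1.0])
msi_from_contrast = contrast @ np.array([betas[c] for c in ["A", "V", "AV", "Fix"]])

# Informativeness design (intact "i" versus noise "n"), where every condition contains a stimulus and a task:
betas_inf = {"AiVi": 1.0, "AiVn": 0.7, "AnVi": 0.6, "AnVn": 0.2}
msi_inf = (betas_inf["AiVi"] + betas_inf["AnVn"]) - (betas_inf["AiVn"] + betas_inf["AnVi"])

print(round(msi, 2), round(msi_from_contrast, 2), round(msi_inf, 2))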
FIGURE 13.3 Classical interaction design: 2 × 2 factorial design manipulating presence versus absence of sensory inputs. (a) Experimental design: 2 × 2 factorial design with the factors (1) auditory: present versus absent; (2) visual: present versus absent. Example stimuli are presented as visual images and corresponding sound spectrograms. (b–d) Data analysis and interpretation. Three activation profiles are illustrated. (b) Superadditive interaction as indexed by a positive MSI effect. (c) Subadditive interaction as indexed by a negative interaction term in context of audiovisual enhancement. (d) Subadditive interaction as indexed by a negative interaction term in context of audiovisual suppression. Please note that subadditive (yet not suppressive) interactions can also result from nonlinearities in BOLD response. Bar graphs represent effect for auditory (black), visual (darker gray), and audiovisual (lighter gray) stimuli, and “multisensory” (gray) effect as defined by audiovisual interaction (AV + Fix) – (A + V). To facilitate understanding, two additional bars are inserted indicating sums that enter into interaction, i.e., AV + Fix and A + V.
FIGURE 13.4 Interaction design: 2 × 2 factorial design manipulating reliability of sensory inputs. (a) Experimental design. 2 × 2 factorial design with the factors (1) auditory: reliable versus unreliable; (2) visual: reliable versus unreliable. Example stimuli are presented as visual images and corresponding sound spectrograms. Please note that manipulating stimulus reliability rather than presence evades the problem of fixation condition. (b) Data analysis and interpretation. One activation profile is illustrated as an example: superadditive interaction as indexed by a positive MSI effect.
13.1.3.3 Elaborate Interaction Design: m × n Factorial Design (i.e., More than Two Levels)
The drawbacks of the classical interaction design can be ameliorated further if the factorial design includes more than two levels. For instance, in a 3 × 3 factorial design, auditory and visual modalities may include three levels of sensory input: (1) sensory intact = Vi or Ai, (2) sensory degraded = Vd or Ad, or (3) sensory absent (Figure 13.5). This more elaborate interaction design enables the dissociation of audiovisual integration at multiple stages of information processing (Werner and Noppeney 2010b). The interaction approach can thus open up the potential for a fine-grained characterization of the neural processes underlying the integration of different types of audiovisual information. In addition to enabling the estimation of interactions, it also allows us to compare interactions across different levels. For instance, in a 3 × 3 factorial design, we can investigate whether an additive response combination for degraded stimuli turns into subadditive response combinations for intact stimuli by comparing superadditivity for degraded stimuli with superadditivity for intact stimuli (formally: AdVd + fixation – Vd – Ad > AiVi + fixation – Vi – Ai, which reduces to AdVd – Vd – Ad > AiVi – Vi – Ai). Thus, an additive integration profile at one particular sensory input level becomes an interesting finding when it is statistically different from the integration profile (e.g., subadditive) at a different input level. In this way, the interaction approach that is initially predicated on response nonlinearities is rendered sensitive to additive combinations of unisensory responses. Testing for changes in superadditivity (or subadditivity) across different stimulus levels can also be used as a test for the principle of inverse effectiveness. According to the principle of inverse effectiveness, superadditivity is expected to decrease with stimulus efficacy as defined by, for instance, stimulus intensity or informativeness. A more superadditive or less subadditive integration profile would be expected for weak signal intensities (Stein and Stanford 2008). Finally, it should be emphasized that this
more complex inverse effectiveness contrast does not depend on the "fixation" condition, as that is included on both sides of the inequality (and eliminated from the contrast). Thus, the inverse effectiveness contrast is an elegant way to circumvent the problems associated with the fixation condition mentioned above (Stevenson et al. 2009; Stevenson and James 2009; Werner and Noppeney 2010b; also, for a related approach in which audiovisual interactions are compared between intelligible and nonintelligible stimuli, see Lee and Noppeney 2010).
FIGURE 13.5 "Elaborate" interaction design with more than two levels. (a) Experimental design: 3 × 3 factorial design with factors (1) auditory: (i) auditory intact = Ai, (ii) auditory degraded = Ad, and (iii) auditory absent = Aa; (2) visual: (i) visual intact = Vi, (ii) visual degraded = Vd, and (iii) visual absent = Va. Example stimuli are presented as visual images and corresponding sound spectrograms. (b–d) Data analysis and interpretation. This more elaborate design enables computation of (b) the interaction for intact stimuli (MSIi), (c) the interaction for degraded stimuli (MSId), and (d) the inverse effectiveness contrast, i.e., MSId – MSIi = (AdVd – Vd – Ad) – (AiVi – Vi – Ai), which does not depend on the fixation condition.
13.1.3.4 Interaction Analyses Constrained by Maximum Likelihood Estimation Model
A more elaborate interaction design also accommodates more sophisticated analyses developed from the maximum likelihood framework. Numerous psychophysics studies have shown that humans integrate information from multiple senses in a Bayes optimal fashion by forming a weighted average of the independent sensory estimates (maximum likelihood estimation, MLE; Ernst and Banks 2002; Knill and Saunders 2003). This multisensory percept is Bayes optimal in that it yields the most reliable percept (n.b., reliability is the inverse of variance). Combining fMRI and an elaborate interaction design, we can investigate the neural basis of Bayes optimal multisensory integration at the macroscopic scale as provided by the BOLD response. First, we can investigate whether regional activations are modulated by the relative reliabilities of the unisensory estimates as predicted by the MLE model. For instance, in visuo–tactile integration, we would expect the activation in the somatosensory cortex during visuo–tactile stimulation to increase when the reliability of the visual input is reduced and higher weight is attributed to the tactile input (Helbig et al. 2010). Second, we can investigate whether differential activations (i.e., bisensory–unisensory) in higher-order association cortices, for instance, reflect the increase in reliability during bisensory stimulation as predicted by the MLE model. This reliability increase for bisensory stimulation should be maximal when the reliabilities of the two unisensory inputs are equal. By cleverly manipulating the reliabilities of the two sensory inputs, we can thus independently test the two main MLE predictions within the same interaction paradigm: (1) the contributions of the sensory modalities to multisensory processing depend on the reliability of the unisensory estimates and (2) the reliability of the multisensory estimate is greater than the reliability of each unisensory estimate (a short numerical sketch of these predictions follows at the end of Section 13.1.3.5).
13.1.3.5 Combining Interaction Analyses with Max Criterion
Interaction analyses can be used to refute the possibility of independent unisensory neuronal populations in a region. Nevertheless, a significant interaction is still open to many different functional interpretations. Further insights need to be gained from the activation profile of the unisensory and bisensory conditions that formed the interaction contrast. More formally, the activation profiles of superadditive and subadditive interactions can be further characterized according to the max criterion (for a related approach, see Avillac et al. 2007; Perrault et al. 2005; Werner and Noppeney 2010c). For instance, a subadditive interaction in which the audiovisual response is greater than the maximal unisensory response may simply reflect nonlinearities in the BOLD response (e.g., saturation effects) and needs to be interpreted with caution.
In contrast, a subadditive interaction in which the audiovisual response is smaller than the maximal unisensory response cannot easily be attributed to such nonlinearities in the BOLD response. Instead, suppressive interactions indicate that one sensory input modulates responses to the other sensory input (Sugihara et al. 2006). Finally, a subadditive interaction with equivalent responses for auditory, visual, and audiovisual conditions is most parsimoniously explained by amodal functional properties of a particular brain region. Rather than genuinely integrating inputs from multiple sensory modalities, an amodal region may be located further “upstream” and be involved in higher-order processing of already integrated inputs. For instance, in audiovisual speech integration, a region involved in amodal semantic processing may be equally activated via visual, auditory, or audiovisual inputs. These examples demonstrate that a significant interaction is not the end, but rather the starting point of analysis and interpretation. To reach conclusive interpretations, a careful characterization of the activation profile is required.
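The following numerical sketch spells out the two MLE predictions from Section 13.1.3.4 for a visuo–tactile example, assuming independent Gaussian sensory estimates; the noise levels and stimulus values are invented purely for illustration.

import numpy as np

sigma_v, sigma_t = 2.0, 4.0                  # invented visual and tactile noise levels (standard deviations)
r_v, r_t = 1 / sigma_v**2, 1 / sigma_t**2    # reliabilities are inverse variances

# Prediction 1: each cue is weighted by its relative reliability.
w_v = r_v / (r_v + r_t)
w_t = r_t / (r_v + r_t)
visual_estimate, tactile_estimate = 10.0, 12.0
fused_estimate = w_v * visual_estimate + w_t * tactile_estimate

# Prediction 2: the bisensory reliability is the sum of the unisensory reliabilities,
# so it always exceeds either one, and the relative benefit is largest when they are equal.
r_fused = r_v + r_t
assert r_fused > max(r_v, r_t)

print(round(w_v, 2), round(w_t, 2), round(fused_estimate, 2), round(1 / np.sqrt(r_fused), 2))

Degrading the visual input (increasing sigma_v) shifts the weights toward the tactile estimate, which is the pattern one would look for in, e.g., somatosensory activations under the reliability-manipulation designs described above.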
13.1.4 Congruency Manipulations
Congruency manipulations are based on the rationale that if a region distinguishes between congruent and incongruent component pairs, it needs to have access to both sensory inputs. Congruency manipulations can be used to focus selectively on different aspects of information integration. For instance, audiovisual stimuli can be rendered incongruent in terms of space (Fairhall and Macaluso 2009; Busse et al. 2005; Bonath et al. 2007), time (Noesselt et al. 2007; Lewis and Noppeney 2010), phonology (van Atteveldt et al. 2007a; Noppeney et al. 2008), or semantics (Doehrmann and Naumer 2008; Hein et al. 2007; Noppeney et al. 2008, 2010; Sadaghiani et al. 2009; Adam and Noppeney 2010). Thus, congruency manipulations seem ideally suited to dissociate multisensory integration at multiple processing stages. However, the interpretation of congruency results is impeded by the fact that incongruencies are usually artifactual and contradict natural environmental statistics. At the behavioral level, it is well known that multisensory integration breaks down and no unified multisensory percept is formed when the senses disagree. However, it is currently unknown how the human brain responds when it encounters discrepancies between the senses. Most of the previous fMRI research has adopted the view that integration processes are reduced for incongruent sensory inputs (Calvert et al. 2000; van Atteveldt et al. 2004; Doehrmann and Naumer 2008). Hence, comparing congruent to incongruent conditions was thought to reveal multisensory integration regions. However, the brain may also unsuccessfully attempt to integrate the discrepant sensory inputs. In this case, activations associated with multisensory integration may actually be enhanced for unfamiliar incongruent (rather than familiar congruent) sensory inputs. A similar argument has been put forward in the language processing domain, where activations associated with lexical retrieval were found to be enhanced for pseudowords relative to familiar words, even though pseudowords are supposedly not endowed with a semantic representation (Price et al. 1996). Finally, within the framework of predictive coding, the brain may act as a prediction device and generate a prediction error signal when presented with unpredictable incongruent sensory inputs. Again, in this case, increased activations would be expected for incongruent rather than congruent sensory inputs in brain areas that are involved in processing the specific stimulus attributes that define the incongruency (e.g., temporal, spatial, semantic, etc.). As fMRI activations are known to be very susceptible to top-down modulation and cognitive set, these inherent interpretational ambiguities limit the role of incongruency manipulations in the investigation of multisensory integration, particularly for fMRI (rather than neurophysiological) studies. In fact, a brief review of the literature seems to suggest that congruency manipulations strongly depend on the particular cognitive set and experimental paradigm. Under passive listening/viewing conditions, increased activations have been reported primarily for congruent relative to incongruent conditions (Calvert et al. 2000; van Atteveldt et al. 2004). In contrast, in selective attention paradigms, where subjects attend to one sensory modality and ignore sensory inputs from other modalities, the opposite pattern has been reported, i.e., increased activations are observed for incongruent relative to congruent inputs (Noppeney et al.
2008, 2010; Sadaghiani et al. 2009). Finally, when subjects perform a congruency judgment that requires access to and comparison of the two independent unisensory percepts, and hence precludes natural audiovisual integration, differences between congruent and incongruent stimulus pairs are attenuated (van Atteveldt et al. 2007b). This complex pattern of fMRI activations suggests that incongruency does not simply prevent the brain from integrating sensory inputs, but elicits a range of other cognitive effects and top-down modulations that need to be taken into account when interpreting fMRI results.
13.1.5 fMRI Adaptation (or Repetition Suppression)
fMRI adaptation (used here synonymously with repetition suppression) refers to the phenomenon that prior processing of stimuli (or stimulus attributes) decreases activation elicited by processing
subsequent stimuli with identical attributes. Repetition suppression has frequently been interpreted as the fMRI analogue of neuronal response suppression, i.e., a decrease in neuronal firing rate as recorded in nonhuman primates (Desimone 1996). Despite current uncertainties about its underlying neural mechanisms, fMRI repetition suppression has been widely used as a tool for dissociating and mapping the various stages of sensory and cognitive processing. These fMRI experiments are based on the rationale that the sensitivity of a brain region to variations in stimulus attributes determines the degree of repetition suppression: the more a brain region is engaged in processing and hence sensitive to a particular stimulus feature, the more it will adapt to stimuli that are identical with respect to this feature, even though they might vary with respect to other dimensions (Grill-Spector and Malach 2001; Grill-Spector et al. 2006). Repetition suppression can thus be used to define the response selectivity and invariance of neuronal populations within a region. Initial fMRI adaptation paradigms have used simple block designs, i.e., they presented alternating blocks of "same (adaptation)" versus "different (no adaptation)" stimuli. However, arrangement of the stimuli in blocks introduces a strong attentional confound that renders the interpretation of the adaptation effect difficult (even when attempts are made to maintain attention in a control task). More recent studies have therefore used randomized fMRI adaptation paradigms that reduce attentional top-down modulation at least to a certain degree. In addition to attentional confounds, task effects (e.g., response priming) need to be very tightly controlled in adaptation paradigms (for further discussion, see Henson and Rugg 2003; Henson 2003). In the field of multisensory integration, fMRI adaptation may be used to identify "amodal" neural representations. Thus, despite the changes in sensory modality, a multisensory or amodal region should show fMRI adaptation when presented with identical stimuli in different sensory modalities. For instance, by presenting identical words subsequently in a written and a spoken format, this cross-modal adaptation effect was used to identify amodal or multisensory phonological representations (Noppeney et al. 2008; Hasson et al. 2007). fMRI adaptation paradigms may also be combined with the outlined interaction approach. Here, a 2 × 2 factorial design would manipulate the repetition of (1) visual and (2) auditory features. A region that integrates visual and auditory features is then expected to show an interaction between the auditory and visual repetition effects, e.g., increased visual adaptation if the auditory feature is also repeated (Tal and Amedi 2009). This experimental approach has recently been used to study form and motion integration within the visual domain (Sarkheil et al. 2008). Most commonly, fMRI adaptation is used to provide insights into subvoxel neuronal representation. This motivation is based on the so-called fatigue model, which proposes that the fMRI adaptation effect is attributable to a "fatigue" (as indexed by decreased activity) of the neurons initially responding to a specific stimulus (Grill-Spector and Malach 2001). For instance, let us assume that a voxel contains populations of A and B neurons and responds equally to stimuli A and B, so that a standard paradigm would not be able to reveal selectivity for stimulus A. Yet, repetitive presentation of stimulus A will only fatigue the A-responsive neurons. Therefore, subsequent presentation of stimulus B will lead to a rebound response of the "fresh" B neurons. Thus, it was argued that fMRI adaptation can increase the spatial resolution to a subvoxel level. Along similar lines, fMRI adaptation could potentially be used to dissociate unisensory and multisensory neuronal populations. In the case of independent populations of visual and auditory neurons (no multisensory neurons), after adaptation to a specific visual stimulus, a rebound in activation should be observed when the same stimulus is presented in the auditory modality. This activation increase should be comparable to the rebound observed when presented with a new unrelated stimulus. In contrast, if a region contains multisensory neurons, it will adapt when presented with the same stimulus irrespective of sensory modality. Thus, within the fatigue framework, fMRI adaptation may help us to dissociate unisensory and multisensory neuronal populations that evade standard analyses. However, it is likely that voxels containing visual and auditory neurons will also include audiovisual neurons. This mixture of multiple neuronal populations within a voxel may produce a more complex adaptation profile than illustrated in our toy example. Furthermore, given the diversity of multisensory enhancement and depression profiles for concurrently presented sensory inputs,
the adaptation profile for asynchronously presented inputs from multiple modalities is not yet well characterized; it may depend on several factors such as the temporal relationship, stimulus intensity, and a voxel's responsiveness. Even in the "simple" unisensory case, the interpretation of fMRI adaptation results is impeded by our lack of understanding of the underlying neuronal mechanisms as well as the relationship between the decreased BOLD activation and neuronal response suppression (for review and discussion, see Henson and Rugg 2003; Henson 2003). In fact, multiple models and theories have been advanced to explain repetition suppression. (1) According to the fMRI adaptation approach (the "fatigue" model mentioned above), the number of neurons that are important for stimulus representation and processing remains constant, but these neurons show reductions in their firing rates for repeated stimuli (Grill-Spector and Malach 2001). (2) Repetition suppression has been attributed to a sharpening of the cortical stimulus representations, whereby neurons that are not essential for stimulus processing respond less for successive stimulus presentations (Wiggs and Martin 1998). (3) In neural network models, repetition suppression is thought to be mediated by synaptic changes that decrease the settling time of an attractor neural network (Becker et al. 1997; Stark and McClelland 2000). (4) Finally, hierarchical models of predictive coding have proposed that response suppression reflects reduced prediction error: the brain learns to predict the stimulus attributes on successive exposures to identical stimuli, and the firing rates of stimulus-evoked error units are suppressed by top-down predictions mediated by backward connections from higher-level cortical areas (Friston 2005). The predictive coding model raises questions about the relationship between cross-modal congruency and adaptation effects. Both fMRI adaptation and congruency designs manipulate the "congruency" between two stimuli. The two approaches primarily differ in the (a)synchrony between the two sensory inputs. For instance, spoken words and the corresponding facial movements would be presented synchronously in a classical congruency paradigm and sequentially in an adaptation paradigm. The different latencies of the sensory inputs may induce distinct neural mechanisms for congruency and/or adaptation effects. Yet, events in the natural environment often produce temporal asynchronies between sensory signals. For instance, facial movements usually precede the auditory speech signal. Furthermore, the asynchrony between visual and auditory signals depends on the distance between signal source and observer because of differences in the velocity of light and sound. Finally, the neural processing latencies for signals from different sensory modalities depend on the particular brain regions and stimuli, which will lead, in turn, to variations in the width and asymmetry of temporal integration windows as a function of stimulus and region. Collectively, the variability in latency and temporal integration window suggests a continuum between "synchronous" congruency effects and "asynchronous" adaptation effects that may rely on distinct and shared neural mechanisms (Figure 13.6).
FIGURE 13.6 Cross-modal fMRI adaptation paradigm and BOLD predictions. The figure illustrates BOLD predictions for different stimulus pairs with (1) the stimulus and/or (2) the sensory modality being the same or different for the two presentations. Please note that this simplistic toy example serves only to explain fundamental principles rather than characterizing the complexity of multisensory adaptation profiles (see text for further discussion). (a) Same stimulus, same sensory modality: a decreased BOLD response is expected in unisensory, multisensory, and amodal areas. (b) Same stimulus, different sensory modality: a decreased BOLD response is expected for higher-order "amodal" regions and not for unisensory regions. Given the complex interaction profiles for concurrently presented sensory inputs, the prediction for multisensory regions is unclear. (c and d) Different stimulus, same sensory modality (c) and different stimulus, different sensory modality (d): no fMRI adaptation is expected in unisensory, multisensory, or amodal regions.
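The logic summarized in Figure 13.6 can be written down as two simple difference scores on the condition estimates for the second stimulus of each pair; the numbers below are invented purely to illustrate the pattern expected for a region carrying an amodal representation.

# Invented mean responses to the second stimulus of each pair type in Figure 13.6.
betas = {
    ("same_stim", "same_modality"): 0.3,    # within-modality repetition: strong adaptation
    ("same_stim", "diff_modality"): 0.5,    # cross-modal repetition
    ("diff_stim", "same_modality"): 0.8,    # no repetition, same modality
    ("diff_stim", "diff_modality"): 0.8,    # no repetition, different modality
}

# Within-modality repetition suppression, expected in unisensory and multisensory regions.
within_modality_adaptation = betas[("diff_stim", "same_modality")] - betas[("same_stim", "same_modality")]

# Cross-modal repetition suppression: only a representation shared across modalities
# can register that the same stimulus was repeated in a different sensory modality.
crossmodal_adaptation = betas[("diff_stim", "diff_modality")] - betas[("same_stim", "diff_modality")]

print(round(within_modality_adaptation, 2), round(crossmodal_adaptation, 2))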
13.2 MULTISENSORY REPRESENTATIONS: MULTIVARIATE DECODING AND PATTERN CLASSIFIER ANALYSES
All methodological approaches discussed thus far were predicated on encoding models using mass-univariate statistics. In other words, these approaches investigated how external variables or stimulus functions cause and are thus encoded by brain activations in a regionally specific fashion. This is a mass-univariate approach because a general linear model with the experimental variables as predictors is estimated independently for each voxel time course, followed by statistical inference (n.b., statistical dependencies are usually taken into account at the stage of statistical inference, using, e.g., Gaussian random field theory; Friston et al. 1995). Over the past 5 years, multivariate decoding models and pattern classifiers have progressively been used in functional imaging studies. In contrast to encoding models that infer a mapping from experimental variables to brain activations, these decoding models infer a mapping from brain activations to cognitive states. There are two main approaches: (1) canonical correlation analyses (and related models such as linear discriminant analyses, etc.) infer a mapping from data features (voxel activations) to cognitive states using classical multivariate statistics (based on Wilks' lambda). Recently, an alternative Bayesian method, multivariate Bayesian decoding, has been proposed that uses a parametric empirical or hierarchical Bayesian model to infer the mapping from voxel activations to a target variable (Friston et al. 2008). (2) Pattern classifiers (e.g., using support vector machines) implicitly infer a mapping between voxel patterns and cognitive states via cross-validation schemes and classification performance on novel unlabeled feature vectors (voxel activation patterns). To this end, the data are split into two (or multiple) sets. In a cross-validation scheme, the classifier is trained on set 1 and its generalization performance is tested on set 2 (for a review, see Haynes and Rees 2006; Pereira et al. 2009). Linear classifiers are often used in functional imaging, as the voxel weights after training provide direct insights into the contribution of different voxels to the classification performance. Thus, even if the classifier is applied to the entire brain, the voxel weights
may indicate regional functional specialization. Furthermore, multivariate decoding approaches can also be applied locally (at each location in the brain) using searchlight procedures (Nandy and Cordes 2003; Kriegeskorte et al. 2006). Because multivariate decoding and pattern classifiers extract the discriminative signal from multiple voxels, they can be more sensitive than univariate encoding approaches and provide additional insights into the underlying distributed neural representations. By carefully designing training and test sets, pattern classifiers can also characterize the invariance of the neural representations within a region. Within the field of multisensory integration, future studies may, for instance, identify amodal representations by investigating whether a pattern classifier that is trained on visual stimuli generalizes to auditory stimuli. In addition, pattern classifiers trained on different categories of multisensory stimuli could be used to provide a more fine-grained account of multisensory representations in low-level, putatively unisensory, and higher-order multisensory areas.
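A minimal sketch of such a cross-modal generalization test is given below, using scikit-learn on simulated region-of-interest patterns. The data, trial counts, and the choice of a linear support vector machine are illustrative assumptions rather than a description of any published pipeline.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 50
category_pattern = rng.standard_normal(n_voxels)      # shared "amodal" activation pattern for category 1

def simulate_runs(n_trials):
    # Simulate single-trial ROI patterns for two stimulus categories (0 and 1).
    labels = rng.integers(0, 2, n_trials)
    patterns = labels[:, None] * category_pattern + rng.standard_normal((n_trials, n_voxels))
    return patterns, labels

X_vis, y_vis = simulate_runs(n_trials)    # "visual" runs used for training
X_aud, y_aud = simulate_runs(n_trials)    # "auditory" runs used for testing; they share the category pattern by construction

clf = SVC(kernel="linear", C=1.0).fit(X_vis, y_vis)
print("cross-modal decoding accuracy:", clf.score(X_aud, y_aud))
# Above-chance transfer is taken as evidence for a representation shared across modalities;
# a real analysis would add proper cross-validation, permutation statistics, and confound control.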
13.3 FUNCTIONAL INTEGRATION: EFFECTIVE CONNECTIVITY ANALYSES
From the perspective of functional integration, effective connectivity analyses can be used to investigate how information from multiple senses is integrated via distinct interactions among brain regions. In contrast with functional connectivity analyses that simply characterize statistical dependencies between time series in different voxels or regions, effective connectivity analyses investigate the influence that one region exerts on another region. The aim of these analyses is to estimate and make inferences about the coupling among brain areas and how this coupling is influenced by experimental context (e.g., cognitive set, task). We will limit our discussion to approaches that have already been applied in the field of multisensory integration. From the experimenter's perspective, the models are organized according to data-driven and hypothesis-driven approaches for effective connectivity, even though this is only one of many differences and possible classifications.
13.3.1 Data-Driven Effective Connectivity Analysis: Psychophysiological Interactions and Granger Causality
Early studies have used simple regression models to infer a context-dependent change in effective connectivity between brain regions. In psychophysiological interaction analyses, the activation time courses in each voxel within the brain are regressed on the time course in a particular seed voxel under two contexts (Friston et al. 1997). A change in coupling is inferred from a change in regression slopes under the two contexts. Based on a psychophysiological interaction analysis, for instance, visuo–tactile interactions in the lingual gyrus were suggested to be induced by increased connectivity from the parietal cortex (Macaluso et al. 2000). Similarly, a psychophysiological interaction analysis was used to demonstrate increased coupling between the left prefrontal cortex and the inferior temporal gyrus in blind, relative to sighted, subjects as a result of cross-modal plasticity (Noppeney et al. 2003). More recent approaches aim to infer directed connectivity based on Granger causality, that is, temporal precedence. A time series X is said to Granger-cause Y if the history of X (i.e., the lagged values of X) provides statistically significant information about future values of Y, after taking into account the known history of Y. Inferences of Granger causality are based on multivariate autoregressive models or directed information transfer (a measure derived from mutual information; Roebroeck et al. 2005; Goebel et al. 2003; Harrison et al. 2003; Hinrichs et al. 2006). It is important to note that Granger causality does not necessarily imply true causality because a single underlying process may cause both signals X and Y, yet with different lags. Furthermore, temporal differences between regions in hemodynamic time series that result from variations in vascular architecture and hemodynamic response functions may be misinterpreted as causal influences. The second problem can be partly controlled by comparing Granger causality across two conditions and prior deconvolution to obtain an estimate of the underlying neuronal
248
The Neural Bases of Multisensory Processes
signals (Roebroeck et al. 2009; David et al. 2008). As a primarily data-driven approach, the analysis estimates the Granger causal influences of a seed region on all other voxels in the brain. Because this analysis approach does not require an a priori selection of regions of interest, it may be very useful to generate hypotheses that may then be further evaluated on new data in a more constrained framework. Recently, Granger causality has been used to investigate and reveal top-down influences from the STS on auditory cortex/planum temporale in the context of letter–speech sound congruency (multivariate autoregressive models; van Atteveldt et al. 2009) and temporal synchrony manipulations (directed information transfer; Noesselt et al. 2007). For instance, van Atteveldt et al. (2009) have suggested that activation increases for congruent relative to incongruent letter–sound pairs may be mediated via increased connectivity from the STS. Similarly, Granger causality has been used to investigate the influence of somatosensory areas on the lateral occipital complex during shape discrimination (Deshpande et al. 2010; Peltier et al. 2007).
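To make the psychophysiological interaction idea tangible, the sketch below fits a single PPI regression on simulated time series. It is a deliberately simplified illustration: standard implementations (e.g., the PPI machinery in SPM) form the interaction at the neuronal level after hemodynamic deconvolution and include additional confound regressors, whereas here the interaction term is built directly from BOLD-level signals and all values are simulated.

import numpy as np

rng = np.random.default_rng(2)
n_scans = 200
psych = np.tile(np.repeat([0.0, 1.0], 10), 10)       # block regressor: context "off"/"on"
seed = rng.standard_normal(n_scans)                  # seed-region time course

# Simulated target region that is coupled to the seed only when the context is "on".
target = 0.2 * seed + 0.6 * psych * seed + 0.1 * rng.standard_normal(n_scans)

# Design matrix: intercept, psychological regressor, seed time course, and their product (the PPI term).
X = np.column_stack([np.ones(n_scans), psych, seed, psych * seed])
betas, *_ = np.linalg.lstsq(X, target, rcond=None)

print("context-dependent coupling (PPI) estimate:", round(betas[3], 2))   # close to the simulated 0.6

A significant weight on the interaction regressor, over and above the main effects of context and seed activity, is what licenses the inference that coupling between the two regions changes with experimental context.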
13.3.2 Hypothesis-Driven Effective Connectivity Analysis: Dynamic Causal Modeling The basic idea of dynamic causal modeling (DCM) is to construct a reasonably realistic model of interacting brain regions that form the key players of the functional system under investigation (Friston et al. 2003). DCM treats the brain as a dynamic input–state–output system. The inputs correspond to conventional stimulus functions encoding experimental manipulations. The state variables are neuronal activities, and the outputs are the regional hemodynamic responses measured with fMRI. The idea is to model changes in the states, which cannot be observed directly, using the known inputs and outputs. Critically, changes in the states of one region depend on the states (i.e., activity) of others. This dependency is parameterized by effective connectivity. There are three types of parameters in a DCM: (1) input parameters that describe how much brain regions respond to experimental stimuli, (2) intrinsic parameters that characterize effective connectivity among regions, and (3) modulatory parameters that characterize changes in effective connectivity caused by experimental manipulation. This third set of parameters, the modulatory effects, allows us to explain context-sensitive activations by changes in coupling among brain areas. Importantly, this coupling (effective connectivity) is expressed at the level of neuronal states. DCM uses a forward model, relating neuronal activity to fMRI data, which can be inverted during the model fitting process. Put simply, the forward model is used to predict outputs using the inputs. During model fitting, the parameters are adjusted so that the predicted and observed outputs match. Thus, DCM differs from the (auto)regressive-like models discussed in the previous section in three important respects: (1) it is a hypothesis-driven approach that requires a priori selection of regions and specification of the model space in terms of potential connectivity structures, (2) the neuronal responses are driven by experimentally designed inputs rather than endogenous noise, and (3) the regional interactions emerge at the neuronal level and are transformed into the observable BOLD response using a biophysically plausible hemodynamic forward model. DCM can be used to make two sorts of inferences. First, we can compare multiple models that embody hypotheses about functional neural architectures; using Bayesian model selection, we can infer the optimal model given the data (Penny et al. 2004; Stephan et al. 2009). Second, given the optimal model, we can make inferences about the connectivity parameters (Friston et al. 2003). For instance, we can compare the strength of forward and backward connections or test whether attention modulates the connectivity between sensory areas. In the field of multisensory integration, DCM has been used to investigate whether incongruency effects emerge via forward or backward connectivity. Comparing DCMs in which audiovisual incongruency modulates either the forward or the backward connectivity, we suggested that increased activation for incongruent relative to congruent stimulus pairs is mediated via enhanced forward connectivity from low-level auditory areas to STS and IPS (Noppeney et al. 2008). More recently, we used DCM to address the question of whether audiovisual interactions in low-level auditory areas (superior temporal gyrus; Driver and Noesselt 2008; Schroeder and Foxe 2005) are mediated via direct connectivity from visual occipital areas or
indirect pathways via the STS. Partitioning the model space into “direct,” “indirect,” or “indirect + direct” models suggested that visual input may influence auditory processing in the superior temporal gyrus via direct and indirect connectivity from visual cortices (Lewis and Noppeney 2010; Noppeney et al. 2010; Werner and Noppeney 2010a; Figure 13.7).

FIGURE 13.7 Candidate dynamic causal models. (a) “Direct” influence DCM: audiovisual costimulation modulates direct connectivity between auditory and visual regions. (b) “Indirect” influence DCM: audiovisual costimulation modulates indirect connectivity between auditory and visual regions. STG, superior temporal gyrus; CaS, calcarine sulcus; A, auditory input; V, visual input; AV, audiovisual input.
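For readers less familiar with the formalism, the neuronal part of a DCM is the bilinear state equation dz/dt = (A + Σj uj B(j)) z + C u, in which A holds the fixed (intrinsic) connectivity, each B(j) holds the change in connectivity induced by modulatory input uj, and C holds the driving inputs. The fragment below is a minimal simulation sketch of such a system for a three-region network loosely patterned on the "direct influence" model of Figure 13.7; all connection strengths, input timings, and the integration step are invented for illustration. A full DCM analysis would additionally pass the neuronal states through the hemodynamic forward model and invert the complete model against measured BOLD data, which is not shown here.

```python
import numpy as np

# Regions: 0 = CaS (visual), 1 = STG (auditory), 2 = STS
# Inputs:  u[0] = visual stimulus, u[1] = auditory stimulus, u[2] = audiovisual costimulation
A = np.array([[-1.0,  0.0,  0.2],    # fixed (intrinsic) connectivity; negative self-connections
              [ 0.0, -1.0,  0.2],
              [ 0.3,  0.3, -1.0]])
B_av = np.array([[0.0, 0.0, 0.0],    # modulation by the audiovisual context:
                 [0.4, 0.0, 0.0],    # here it strengthens the CaS -> STG ("direct") connection
                 [0.0, 0.0, 0.0]])
C = np.array([[1.0, 0.0, 0.0],       # driving inputs: V drives CaS, A drives STG
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])

def simulate(u, dt=0.01):
    """Euler integration of dz/dt = (A + u_av * B_av) z + C u."""
    z = np.zeros(3)
    states = []
    for u_t in u:                      # u has shape (T, 3)
        J = A + u_t[2] * B_av
        z = z + dt * (J @ z + C @ u_t)
        states.append(z.copy())
    return np.array(states)

# 20 s of stimulation: an audiovisual block switches the modulatory input on
T = 2000
u = np.zeros((T, 3))
u[500:1500, :2] = 1.0                  # visual and auditory stimuli on
u[500:1500, 2] = 1.0                   # audiovisual context (modulatory input)
z = simulate(u)
print(z[-1])                           # neuronal states at the end of the block
```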
13.4 CONCLUSIONS AND FUTURE DIRECTIONS Multisensory integration has been characterized with fMRI using a variety of experimental designs and statistical analysis approaches. When applied in isolation, each approach provides only limited insights and can lead to misinterpretations. A more comprehensive picture may emerge by combining the strengths of multiple methodological approaches. For instance, pattern classifiers and fMRI adaptation may be used jointly to provide insights into subvoxel neuronal representations and to dissociate unisensory and multisensory neuronal populations. Amodal neural representations may then be identified if classification performance and fMRI adaptation generalize across stimuli from different sensory modalities. Increased spatial resolution at higher field strengths will enable us to characterize the response properties of individual regions more thoroughly. To go beyond structure–function mapping, we also need to establish the effective connectivity between regions using neurophysiologically plausible observation models. Understanding the neural mechanisms of multisensory integration will require an integrative approach combining computational modeling and the complementary strengths of fMRI, EEG/MEG, and lesion studies.
ACKNOWLEDGMENTS We thank Sebastian Werner, Richard Lewis, and Johannes Tünnerhoff for helpful comments on a previous version of this manuscript and JT for his enormous help with preparing the figures.
REFERENCES Adam, R., U. Noppeney. 2010. Prior auditory information shapes visual category-selectivity in ventral occipitotemporal cortex. NeuroImage 52:1592–1602. Allman, B.L., L.P., Keniston, and M.A. Meredith. 2009. Not just for bimodal neurons anymore: The contribution of unimodal neurons to cortical multisensory processing. Brain Topography 21:157–167. Avillac, M., H.S. Ben, and J.R. Duhamel. 2007. Multisensory integration in the ventral intraparietal area of the macaque monkey. Journal of Neuroscience 27:1922–1932. Barraclough, N.E., D. Xiao, C.I. Baker, M.W. Oram, and D.I. Perrett. 2005. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive Neuroscience 17:377–391.
Beauchamp, M.S. 2005. Statistical criteria in FMRI studies of multisensory integration. Neuroinformatics 3:93–113. Beauchamp, M.S., B.D. Argall, J. Bodurka, J.H. Duyn, and A. Martin. 2004. Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience 7:1190–1192. Becker, S., M. Moscovitch, M. Behrmann, and S. Joordens. 1997. Long-term semantic priming: a computational account and empirical evidence. Journal of Experimental Psychology. Learning, Memory, and Cognition 23:1059–1082. Bonath, B., T. Noesselt, A. Martinez, J. Mishra, K. Schwiecker, H.J. Heinze, and S.A. Hillyard. 2007. Neural basis of the ventriloquist illusion. Current Biology 17:1697–1703. Bremmer, F., A. Schlack, N.J. Shah et al. 2001. Polymodal motion processing in posterior parietal and premotor cortex: A human fMRI study strongly implies equivalencies between humans and monkeys. Neuron 29:287–296. Busse, L., K.C. Roberts, R.E. Crist, D.H. Weissman, and M.G. Woldorff. 2005. The spread of attention across modalities and space in a multisensory object. Proceedings of the National Academy of Sciences of the United States of America 102:18751–18756. Calvert, G.A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies. Cerebral Cortex 11:1110–1123. Calvert, G.A., R. Campbell, and M.J. Brammer. 2000. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology 10:649–657. Calvert, G.A., P.C. Hansen, S.D. Iversen, and M.J. Brammer. 2001. Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. NeuroImage 14: 427–438. David, O., I. Guillemain, S. Saillet et al. 2008. Identifying neural drivers with functional MRI: An electrophysiological validation. PLoS Biology 6:2683–2697. Dehner, L.R., L.P. Keniston, H.R. Clemo, and M.A. Meredith 2004. Cross-modal circuitry between auditory and somatosensory areas of the cat anterior ectosylvian sulcal cortex: A ‘new’ inhibitory form of multisensory convergence. Cerebral Cortex 14:387–403. Deshpande, G., X. Hu, S. Lacey, R. Stilla, and K. Sathian. 2010. Object familiarity modulates effective connectivity during haptic shape perception. NeuroImage 49:1991–2000. Desimone, R. 1996. Neural mechanisms for visual memory and their role in attention. Proceedings of the National Academy of Sciences of the United States of America 93:13494–13499. Doehrmann, O., and M.J. Naumer. 2008. Semantics and the multisensory brain: how meaning modulates processes of audio-visual integration. Brain Research 1242:136–150. Driver, J., and T. Noesselt 2008. Multisensory interplay reveals crossmodal influences on ‘sensory-specific’ brain regions, neural responses, and judgments. Neuron 57:11–23. Ernst, M.O., and M.S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415:429–433. Fairhall, S.L., and E. Macaluso. 2009. Spatial attention can modulate audiovisual integration at multiple cortical and subcortical sites. European Journal of Neuroscience 29:1247–1257. Friston, K. 2005. A theory of cortical responses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 360:815–836. Friston, K., C. Chu, J. Mourao-Miranda, O. Hulme, G. Rees, W. Penny, and J. Ashburner. 2008. Bayesian decoding of brain images. NeuroImage 39:181–205. Friston, K.J., C. Buechel, G.R. Fink, J. Morris, E. Rolls, and R.J. Dolan. 1997. 
Psychophysiological and modulatory interactions in neuroimaging. NeuroImage 6:218–229. Friston, K.J., L. Harrison, and W. Penny. 2003. Dynamic causal modelling. NeuroImage 19:1273–1302. Friston, K.J., A. Holmes, K.J. Worsley, J.B. Poline, C.D. Frith, and R. Frackowiak. 1995. Statistical parametric mapping: A general linear approach. Human Brain Mapping 2:189–210. Friston, K.J., A.P. Holmes, C.J. Price, C. Buchel, and K.J. Worsley. 1999. Multisubject fMRI studies and conjunction analyses. NeuroImage 10:385–396. Friston, K.J., W.D. Penny, and D.E. Glaser. 2005. Conjunction revisited. NeuroImage 25:661–667. Goebel, R., A. Roebroeck, D.S. Kim, and E. Formisano. 2003. Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magnetic Resonance Imaging 21:1251–1261. Gosselin, F., and P.G. Schyns. 2003. Superstitious perceptions reveal properties of internal representations. Psychological Science 14:505–509. Grill-Spector, K., and R. Malach. 2001. fMR-adaptation: A tool for studying the functional properties of human cortical neurons. Acta Psychologica 107:293–321.
Grill-Spector, K., R. Henson, and A. Martin. 2006. Repetition and the brain: neural models of stimulus-specific effects. Trends in Cognitive Sciences 10:14–23. Harrison, L., W.D. Penny, and K. Friston. 2003. Multivariate autoregressive modeling of fMRI time series. NeuroImage 19:1477–1491. Hasson, U., J.I. Skipper, H.C. Nusbaum, and S.L. Small. 2007. Abstract coding of audiovisual speech: Beyond sensory representation. Neuron 56:1116–1126. Haynes, J.D., and G. Rees. 2006. Decoding mental states from brain activity in humans. Nature Reviews. Neuroscience 7:523–534. Hein, G., O. Doehrmann, N.G. Muller, J. Kaiser, L. Muckli, and M.J. Naumer. 2007. Object familiarity and semantic congruency modulate responses in cortical audiovisual integration areas. Journal of Neuroscience 27:7881–7887. Helbig, H.B., M.O. Ernst, E. Ricciardi, P. Pietrini, A. Thielscher, K.M. Mayer, J. Schultz, and U. Noppeney. 2010. Reliability of visual information modulates tactile shape processing in primary somatosensory cortices (Submitted for publication). Henson, R.N. 2003. Neuroimaging studies of priming. Progress in Neurobiology 70:53–81. Henson, R.N., and M.D. Rugg. 2003. Neural response suppression, haemodynamic repetition effects, and behavioural priming. Neuropsychologia 41:263–270. Hinrichs, H., H.J. Heinze, and M.A. Schoenfeld. 2006. Causal visual interactions as revealed by an information theoretic measure and fMRI. NeuroImage 31:1051–1060. Kayser, C., C.I. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18:1560–1574. Knill, D.C., and J.A. Saunders. 2003. Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Research 43:2539–2558. Kriegeskorte, N., R. Goebel, and P. Bandettini. 2006. Information-based functional brain mapping. Proceedings of the National Academy of Sciences of the United States of America 103:3863–3868. Laurienti, P.J., T.J. Perrault, T.R. Stanford, M.T. Wallace, and B.E. Stein. 2005. On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research 166:289–297. Lee, H., and U. Noppeney. Physical and perceptual factors shape the neural mechanisms that integrate audiovisual signals in speech comprehension (submitted for publication). Lewis, R., and U. Noppeney. 2010. Audiovisual synchrony improves motion discrimination via enhanced connectivity between early visual and auditory areas. Journal of Neuroscience 30:12329–12339. Macaluso, E., C.D. Frith, and J. Driver. 2000. Modulation of human visual cortex by crossmodal spatial attention. Science 289:1206–1208. Meredith, M.A., and B.L. Allman. 2009. Subthreshold multisensory processing in cat auditory cortex. Neuroreport 20:126–131. Nandy, R.R., and D. Cordes. 2003. Novel nonparametric approach to canonical correlation analysis with applications to low CNR functional MRI data. Magnetic Resonance in Medicine 50:354–365. Nichols, T., M. Brett, J. Andersson, T. Wager, and J.B. Poline. 2005 Valid conjunction inference with the minimum statistic. NeuroImage 25:653–660. Noesselt, T., J.W. Rieger, M.A. Schoenfeld et al. 2007. Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience 27:11431–11441. Noppeney, U., K. Friston, and C. Price. 2003. Effects of visual deprivation on the organisation of the semantic system. Brain 126:1620–1627. Noppeney, U., O. Josephs, J. Hocking, C.J. 
Price, and K.J. Friston. 2008. The effect of prior visual information on recognition of speech and sounds. Cerebral Cortex 18:598–609. Noppeney, U., D. Ostwald, S. Werner. 2010. Perceptual decisions formed by accumulation of audiovisual evidence in prefrontal cortex. Journal of Neuroscience 30:7434–7446. Peltier, S., R. Stilla, E. Mariola, S. LaConte, X. Hu, and K. Sathian. 2007. Activity and effective connectivity of parietal and occipital cortical regions during haptic shape perception. Neuropsychologia 45:476–483. Penny, W.D., K.E. Stephan, A. Mechelli, and K.J. Friston. 2004. Comparing dynamic causal models. NeuroImage 22:1157–1172. Pereira, F., T. Mitchell, and M. Botvinick. 2009. Machine learning classifiers and fMRI: A tutorial overview. NeuroImage 45:S199–S209. Perrault Jr., T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2005. Superior colliculus neurons use distinct operational modes in the integration of multisensory stimuli. Journal of Neurophysiology 93: 2575–2586.
Price, C.J., R.J. Wise, and R.S. Frackowiak. 1996. Demonstrating the implicit processing of visually presented words and pseudowords. Cerebral Cortex 6:62–70. Roebroeck, A., E. Formisano, and R. Goebel. 2005. Mapping directed influence over the brain using Granger causality and fMRI. NeuroImage 25:230–242. Roebroeck, A., E. Formisano, and R. Goebel. 2009. The identification of interacting networks in the brain using fMRI: Model selection, causality and deconvolution. NeuroImage. Sadaghiani, S., J.X. Maier, and U. Noppeney. 2009. Natural, metaphoric, and linguistic auditory direction signals have distinct influences on visual motion processing. Journal of Neuroscience 29:6490–6499. Sarkheil, P., Q.C. Vuong, H.H. Bulthoff, and U. Noppeney. 2008. The integration of higher order form and motion by the human brain. NeuroImage 42:1529–1536. Schroeder, C.E., and J. Foxe. 2005. Multisensory contributions to low-level, ‘unisensory’ processing. Current Opinion in Neurobiology 15:454–458. Seltzer, B., M.G. Cola, C. Gutierrez, M. Massee, C. Weldon, and C.G. Cusick. 1996. Overlapping and nonoverlapping cortical projections to cortex of the superior temporal sulcus in the rhesus monkey: Double anterograde tracer studies. Journal of Comparative Neurology 370:173–190. Stanford, T.R., and B.E. Stein. 2007. Superadditivity in multisensory integration: Putting the computation in context. Neuroreport 18:787–792. Stanford, T.R., S. Quessy, B.E. Stein. 2005. Evaluating the operations underlying multisensory integration in the cat superior colliculus. Journal of Neuroscience 25:6499–6508. Stark, C.E., and J.L. McClelland. 2000. Repetition priming of words, pseudowords, and nonwords. Journal of Experimental Psychology. Learning, Memory, and Cognition 26:945–972. Stein, B.E., and M.A. Meredith. 1993. The Merging of the Senses. Cambridge, MA: MIT Press. Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: current issues from the perspective of the single neuron. Nature Reviews. Neuroscience 9:255–266. Stein, B.E., T.R. Stanford, R. Ramachandran, T.J. Perrault Jr., and B.A. Rowland. 2009. Challenges in quantifying multisensory integration: alternative criteria, models, and inverse effectiveness. Experimental Brain Research 198(2–3):131–126. Stephan, K.E., W.D. Penny, J. Daunizeau, R.J. Moran, and K.J. Friston. 2009. Bayesian model selection for group studies. NeuroImage 46(4):1004–1017. Erratum in NeuroImage 48(1):311. Stevenson, R.A., and T.W. James. 2009. Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage 44:1210–1223. Stevenson, R.A., S. Kim, and T.W. James. 2009. An additive-factors design to disambiguate neuronal and areal convergence: Measuring multisensory interactions between audio, visual, and haptic sensory streams using fMRI. Experimental Brain Research 198(2–3):183–194 Sugihara, T., M.D. Diltz, B.B. Averbeck, and L.M. Romanski. 2006. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26:11138–11147. Tal, N., and A. Amedi. 2009. Multisensory visual-tactile object related network in humans: insights gained using a novel crossmodal adaptation approach. Experimental Brain Research 198:165–182. van Atteveldt, N., E. Formisano, R. Goebel, and L. Blomert. 2004. Integration of letters and speech sounds in the human brain. Neuron 43:271–282. van Atteveldt, N.M., E. Formisano, L. Blomert, and R. Goebel. 2007a. 
The effect of temporal asynchrony on the multisensory integration of letters and speech sounds. Cerebral Cortex 17:962–974. van Atteveldt, N.M., E. Formisano, R. Goebel, and L. Blomert. 2007b. Top-down task effects overrule automatic multisensory responses to letter-sound pairs in auditory association cortex. NeuroImage 36:1345–1360. van Atteveldt, N., A. Roebroeck, and R. Goebel. 2009. Interaction of speech and script in human auditory cortex: Insights from neuro-imaging and effective connectivity. Hearing Research 258(1–2):152–164 Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation. Proceedings of the National Academy of Sciences of the United States of America 101:2167–2172. Werner, S., and U. Noppeney. 2010a. Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. Journal of Neuroscience 30:2662–2675. Werner, S., and U Noppeney. 2010b. Superadditive responses in superior temporal sulcus predict audiovisual benefits in object categorization. Cerebral Cortex 20:1829–1842. Werner, S., and U. Noppeney. 2010c. The contributions of transient and sustained response codes to audiovisual integration. Cerebral Cortex 21(4):920–931. Wiggs, C.L., and A. Martin. 1998. Properties and mechanisms of perceptual priming. Current Opinion in Neurobiology 8:227–233.
14 Modeling Multisensory Processes in Saccadic Responses: Time-Window-of-Integration Model
Adele Diederich and Hans Colonius
CONTENTS
14.1 Summary 253
14.2 Multisensory Processes Measured through Response Time 254
14.3 TWIN Modeling 255
  14.3.1 Basic Assumptions 255
  14.3.2 Quantifying Multisensory Integration in the TWIN Model 257
  14.3.3 Some General Predictions of TWIN 257
14.4 TWIN Models for Specific Paradigms: Assumptions and Predictions 258
  14.4.1 Measuring Cross-Modal Effects in Focused Attention and Redundant Target Paradigms 258
  14.4.2 TWIN Model for the FAP 259
    14.4.2.1 TWIN Predictions for the FAP 260
  14.4.3 TWIN Model for RTP 263
    14.4.3.1 TWIN Predictions for RTP 264
  14.4.4 Focused Attention versus RTP 265
14.5 TWIN Model for Focused Attention: Including a Warning Mechanism 266
  14.5.1 TWIN Predictions for FAP with Warning 268
14.6 Conclusions: Open Questions and Future Directions 270
Appendix A 271
A.1 Deriving the Probability of Interaction in TWIN 271
  A.1.1 Focused Attention Paradigm 271
  A.1.2 Redundant Target Paradigm 272
  A.1.3 Focused Attention and Warning 273
References 274
14.1 SUMMARY Multisensory research within experimental psychology has led to the emergence of a number of lawful relations between response speed and various empirical conditions of the experimental setup (spatiotemporal stimulus configuration, intensity, number of modalities involved, type of instruction, and so forth). This chapter presents a conceptual framework to account for the effects of
cross-modal stimulation on response speed. Although our framework applies to measures of cross-modal response speed in general, here we focus on modeling saccadic reaction time as a measure of orientation performance toward cross-modal stimuli. The central postulate is the existence of a critical “time-window-of-integration” (TWIN) controlling the combination of information from different modalities. It is demonstrated that a few basic assumptions about this timing mechanism imply a remarkable number of empirically testable predictions. After introducing a general version of the TWIN model framework, we present various specifications and extensions of the original model that are geared toward more specific experimental paradigms. Our emphasis will be on predictions and empirical testability of these model versions, but for experimental data, we refer the reader to the original literature.
14.2 MULTISENSORY PROCESSES MEASURED THROUGH RESPONSE TIME For more than 150 years, response time (RT) has been used in experimental psychology as a ubiquitous measure to investigate hypotheses about the mental and motor processes involved in simple cognitive tasks (Van Zandt 2002). Interpreting RT data, in the context of some specific experimental paradigm, is subtle and requires a high level of technical skill. Fortunately, over the years, many sophisticated mathematical and statistical methods for response time analysis and corresponding processing models have been developed (Luce 1986; Schweickert et al., in press). One reason for the sustained popularity of RT as a measure of mental processes may be the simple fact that these processes always have to unfold over time. A similar rationale, of course, is valid for other methods developed to investigate mental processes, such as electrophysiological and related brain-imaging techniques, and it may be one reason why we are currently witnessing some transfer of concepts and techniques from RT analysis into these domains (e.g., Sternberg 2001). Here, we focus on the early, dynamic aspects of simultaneously processing cross-modal stimuli—combinations of vision, audition, and touch—as they are revealed by a quantitative stochastic analysis of response times. One of the first psychological studies on cross-modal interaction using RT to measure the effect of combining stimuli from different modalities and of varying their intensities is the classic article by Todd (1912). A central finding, supported by subsequent research, is that the occurrence of cross-modal effects critically depends on the temporal arrangement of the stimulus configuration. For example, the speedup of response time to a visual stimulus resulting from presenting an accessory auditory stimulus typically becomes most pronounced when the visual stimulus precedes the auditory by an interval that equals the difference in RT between response to the visual alone and the auditory alone (Hershenson 1962). The rising interest in multisensory research in experimental psychology over the past 20 years has led to the emergence of a number of lawful relations between response speed, on the one hand, and properties of the experimental setting, such as (1) spatiotemporal stimulus configuration, (2) stimulus intensity levels, (3) number of modalities involved, (4) type of instruction, and (5) semantic congruity, on the other. In the following, rather than reviewing the abundance of empirical results, we present a modeling framework within which a number of specific quantitative models have been developed and tested. Although such models can certainly not reflect the full complexity of the underlying multisensory processes, their predictions are sufficiently specific to be rigorously tested through experiments. For a long time, the ubiquitous mode of assessing response speed has been to measure the time it takes to press a button, or to release it, by moving a finger or foot. With the advance of modern eye movement registration techniques, the measurement of gaze shifts has become an important additional technique to assess multisensory effects. In particular, saccadic reaction time, i.e., the time from the presentation of a target stimulus to the beginning of the eye movement, is ideally suited for studying both the temporal and spatial rules of multisensory integration.
Although participants can be asked to move their eyes to either visual, auditory, or somatosensory targets, the saccadic RT characteristics will be specific to each modality because the ocular system is geared to the visual system. For example, it is well known that saccades to visual targets have a higher level of accuracy than
those to auditory or somatosensory stimuli. Note also that, because the superior colliculus is an important site of oculomotor control (e.g., Munoz and Wurtz 1995), measuring saccadic responses is an obvious choice for studying the behavioral consequences of multisensory integration.
14.3 TWIN MODELING We introduce a conceptual framework to account for the effects of cross-modal stimulation as measured by changes in response speed.* The central postulate is the existence of a critical TWIN controlling the integration of information from different modalities. The starting idea is simply that a visual and an auditory stimulus must not be presented too far away from each other in time for bimodal integration to occur. As we will show, this seemingly innocuous assumption has a number of nontrivial consequences that any multisensory integration model of response speed has to satisfy. Most prominently, it imposes a process consisting of—at least—two serial stages: one early stage, before the outcome of the time window check has occurred, and a later one, in which the outcome of the check may affect further processing. Although the TWIN framework applies to measures of cross-modal response speed in general, the focus is on modeling saccadic reaction time. First, a general version of the TWIN model and its predictions, introduced by Colonius and Diederich (2004), will be described. Subsequently, we present various extensions of the original model that are geared toward more specific experimental paradigms. Our emphasis will again be on the predictions and empirical testability of these model versions, but because of space limitations, no experimental data will be presented here.
* See Section 14.6 for possible extensions to other measures of performance.
14.3.1 Basic Assumptions A classic explanation for a speedup of responses to cross-modal stimuli is that subjects are merely responding to the first stimulus detected. Taking these detection times to be random variables and glossing over some technical details, observed reaction time would then become the minimum of the reaction times to the visual, auditory, or tactile signal, leading to a purely statistical facilitation effect (also known as probability summation) in response speed (Raab 1962). Over time, numerous studies have shown that this race model was not sufficient to explain the observed speedup in saccadic reaction time (Harrington and Peck 1998; Hughes et al. 1994, 1998; Corneil and Munoz 1996; Arndt and Colonius 2003). Using Miller’s inequality as a benchmark test (cf. Colonius and Diederich 2006; Miller 1982), saccadic responses to bimodal stimuli have been found to be faster than predicted by statistical facilitation, in particular, when the stimuli were spatially aligned. Moreover, in the race model, there is no natural explanation for the decrease in facilitation observed with variations in many cross-modal stimulus properties, e.g., increasing spatial disparity between the stimuli. Nevertheless, the initial anatomic separation of the afferent pathways for different sensory modalities suggests that an early stage of peripheral processing exists, during which no intermodal interaction may occur. For example, a study by Whitchurch and Takahashi (2006) collecting (head) saccadic reaction times in the barn owl lends support to the notion of a race between early visual and auditory processes depending on the relative intensity levels of the stimuli. In particular, their data suggest that the faster modality initiates the saccade, whereas the slower modality remains available to refine saccade trajectory. Thus, there are good reasons for retaining the construct of an—albeit very peripheral—race mechanism. Even under invariant experimental conditions, observed responses typically vary from one trial to the next, presumably because of an inherent variability of the underlying neural processes in both ascending and descending pathways. In analogy to the classic race model, this is taken into account in the TWIN framework by assuming any processing duration to be a random variable. In particular, the peripheral processing times for visual, auditory, and somatosensory stimuli are
assumed to be stochastically independent random variables. This leads to the first postulate of the TWIN model: (B1) First Stage Assumption: The first stage consists in a (stochastically independent) race among the peripheral processes in the visual, auditory, and/or somatosensory pathways triggered by a cross-modal stimulus complex.
The existence of a critical “spatiotemporal window” for multisensory integration to occur has been suggested by several authors, based on both neurophysiological and behavioral findings in humans, monkey, and cat (e.g., Bell et al. 2005; Meredith 2002; Corneil et al. 2002; Meredith et al. 1987; see Navarra et al. 2005 for a recent behavioral study). This integration may manifest itself in the form of an increased firing rate of a multisensory neuron (relative to unimodal stimulation), an acceleration of saccadic reaction time (Frens et al. 1995; Diederich et al. 2003), an effective audiovisual speech integration (Van Wassenhove et al. 2007), or in an improved or degraded judgment of temporal order of bimodal stimulus pairs (cf. Spence and Squire 2003). One of the basic tenets of the TWIN framework, however, is the priority of temporal proximity over any other type of proximity: rather than assuming a joint spatiotemporal window of integration permitting interaction to occur only for both spatially and temporally neighboring stimuli, the TWIN model allows for cross-modal interaction to occur, for example, even for spatially rather distant stimuli of different modalities as long as they fall within the time window. (B2) TWIN Assumption: Multisensory integration occurs only if the peripheral processes of the first stage all terminate within a given temporal interval, the TWIN.
In other words, a visual and an auditory stimulus may occur at the same spatial location, or the lip movements of a speaker may be perfectly consistent with the utterance; yet no intersensory interaction effect will be possible if the data from the two sensory channels are registered too distant from each other in time. Thus, the window acts like a filter determining whether afferent information delivered from different sensory organs is registered close enough in time to allow for multisensory integration. Note that passing the filter is a necessary, but not sufficient, condition for multisensory integration to occur. The reason is that the amount of multisensory integration also depends on other aspects of the stimulus set, such as the spatial configuration of the stimuli. For example, response depression may occur with nearly simultaneous but distant stimuli, making it easier for the organism to focus attention on the more important event. In other cases, multisensory integration may fail to occur—despite near-simultaneity of the unisensory events—because the a priori probability for a cross-modal event is very small (e.g., Körding et al. 2007). Although the priority of temporal proximity seems to afford more flexibility for an organism in a complex environment, the next assumption delimits the role of temporal proximity to the first processing stage: (B3) Assumption of Temporal Separability: The amount of interaction manifesting itself in an increase or decrease of second stage processing time is a function of cross-modal stimulus features, but it does not depend on the presentation asynchrony (stimulus onset asynchrony, SOA) of the stimuli.
This assumption is based on a distinction between intra- and cross-modal stimulus properties, where the properties may refer to both subjective and physical properties. Cross-modal properties are defined when stimuli of more than one modality are present, such as spatial distance of target to nontarget, or subjective similarity between stimuli of different modalities. Intramodal properties, on the other hand, refer to properties definable for a single stimulus, regardless of whether this property is definable in all modalities (such as intensity) or in only one modality (such as wavelength for color or frequency for pitch). Intramodal properties can affect the outcome of the race in the first stage and, thereby, the probability of an interaction. Cross-modal properties may affect the amount of cross-modal interaction occurring in the second stage. Note that cross-modal features cannot influence first stage processing time because the stimuli are still being processed in separate pathways.
(B4) Second Stage Assumption: The second stage comprises all processes after the first stage including preparation and execution of a response.
The assumption of only two stages is certainly an oversimplification. Note, however, that the second stage is defined here by default: it includes all subsequent, possibly overlapping, processes that are not part of the peripheral processes in the first stage (for a similar approach, see Van Opstal and Munoz 2004). Thus, the TWIN model retains the classic notion of a race mechanism as an explanation for cross-modal interaction but restricts it to the very first stage of stimulus processing.
14.3.2 Quantifying Multisensory Integration in the TWIN Model To derive empirically testable predictions from the TWIN framework, its assumptions must be put into more precise form. According to the two-stage assumption, total saccadic reaction time in the cross-modal condition can be written as a sum of two nonnegative random variables defined on a common probability space:
RTcross-modal = S1 + S2,
(14.1)
where S1 and S2 refer to first and second stage processing time, respectively (a base time would also be subsumed under S2). Let I denote the event that multisensory integration occurs, having probability P(I). For the expected reaction time in the cross-modal condition, it then follows that

E[RTcross-modal] = E[S1] + E[S2]
= E[S1] + P(I) · E[S2 | I] + (1 − P(I)) · E[S2 | Ic]
= E[S1] + E[S2 | Ic] − P(I) · (E[S2 | Ic] − E[S2 | I]),
where E[S2|I] and E[S2|Ic] denote the expected second stage processing time conditioned on interaction occurring (I) or not occurring (Ic), respectively. Putting Δ ≡ E[S2|Ic] – E[S2|I], this becomes
E[RTcross-modal] = E[S1] + E[S2|Ic] – P(I) · Δ.
(14.2)
That is, mean RT to cross-modal stimuli equals the mean first stage processing time, plus the mean second stage processing time when no interaction occurs, minus the term P(I) · Δ, which is a measure of the expected amount of intersensory interaction in the second stage, with positive Δ values corresponding to facilitation and negative values corresponding to inhibition. This factorization of expected intersensory interaction into the probability of interaction P(I) and the amount and sign of interaction (Δ) is an important feature of the TWIN model. According to Assumptions B1 to B4, the first factor, P(I), depends on the temporal configuration of the stimuli (SOA), whereas the second factor, Δ, depends on nontemporal aspects, in particular their spatial configuration. Note that this separation of temporal and nontemporal factors is in accordance with the definition of the window of integration: the incidence of multisensory integration hinges on the stimuli occurring in temporal proximity, whereas the amount and sign of interaction (Δ) is modulated by nontemporal aspects, such as semantic congruity or spatial proximity, reaching, in the latter case, from enhancement for neighboring stimuli to possible inhibition for distant stimuli (cf. Diederich and Colonius 2007b).
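As a purely illustrative numerical example (the values are hypothetical, not estimates from data): with E[S1] = 60 ms, E[S2 | Ic] = 100 ms, P(I) = 0.5, and Δ = 20 ms, Equation 14.2 gives E[RTcross-modal] = 60 + 100 − 0.5 · 20 = 150 ms, that is, a 10-ms facilitation relative to the 160 ms expected when no integration occurs; a manipulation that halves P(I) (e.g., a less favorable SOA) halves the facilitation without any change in Δ.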
14.3.3 Some General Predictions of TWIN In the next section, more specific assumptions on first stage processing time, S1, and probability of interaction P(I) will be introduced to derive detailed quantitative predictions for specific
experimental cross-modal paradigms. Nonetheless, even at the general level of the framework introduced thus far, a number of qualitative empirical predictions of TWIN are possible.
SOA effects. The amount of cross-modal interaction should depend on the SOA between the stimuli because the probability of integration, P(I), changes with SOA. Let us assume that two stimuli from different modalities differ considerably in their peripheral processing times. If the faster stimulus is delayed (in terms of SOA) so that the arrival times of both stimuli have a high probability of falling into the window of integration, then the amount of cross-modal interaction should be largest for that value of SOA (see, e.g., Frens et al. 1995; Colonius and Arndt 2001).
Intensity effects. Stimuli of high intensity have relatively fast peripheral processing times. Therefore, for example, if a stimulus from one modality has a high intensity compared to a stimulus from the other modality, the chance that both peripheral processes terminate within the time window will be small, assuming simultaneous stimulus presentations. The resulting low value of P(I) is in line with the empirical observation that a very strong signal will effectively rule out any further reduction of saccadic RT by adding a stimulus from another modality (e.g., Corneil et al. 2002).
Cross-modal effects. The amount of multisensory integration (Δ) and its sign (facilitation or inhibition) occurring in the second stage depend on cross-modal features of the stimulus set, for example, spatial disparity and laterality (laterality here refers to whether all stimuli appear in the same hemisphere). Cross-modal features cannot have an influence on first stage processing time because the modalities are being processed in separate pathways. Conversely, because parameter Δ does not depend on SOA, it cannot change its sign as a function of SOA and, therefore, the model cannot simultaneously predict facilitation to occur for some SOA values and inhibition for others. Some empirical evidence against this prediction has been observed (Diederich and Colonius 2008).
In the classic race model, the addition of a stimulus from a modality not yet present will increase (or, at least, not decrease) the amount of response facilitation. This follows from the fact that—even without assuming stochastic independence—the probability of the fastest of several processes terminating processing before time t will increase with the number of “racers” (e.g., Colonius and Vorberg 1994). In the case of TWIN, both facilitation and inhibition are possible under certain conditions as follows:
Number of modalities effect. The addition of a stimulus from a modality not yet present will increase (or, at least, not decrease) the expected amount of interaction if the added stimulus is not “too fast” and the time window is not “too small.” The latter restrictions are meant to guarantee that the added stimulus will fall into the time window, thereby increasing the probability of interaction to occur.
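The race argument behind the last two predictions is easy to verify numerically. The following minimal sketch (Python; the exponential distributions and mean values are arbitrary illustrative choices, not estimates from any experiment) shows that the probability of the fastest of several independent peripheral processes finishing before a fixed criterion can only grow as further "racers" are added.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_min_before(t, means, n=100_000):
    """P(min of independent exponential 'racers' < t), estimated by simulation."""
    samples = np.column_stack([rng.exponential(m, n) for m in means])
    return (samples.min(axis=1) < t).mean()

# Hypothetical mean peripheral processing times (ms) for visual, auditory, tactile channels
for racers in ([50.0], [50.0, 30.0], [50.0, 30.0, 40.0]):
    print(len(racers), "modalities:", round(p_min_before(60.0, racers), 3))
```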
14.4 TWIN MODELS FOR SPECIFIC PARADIGMS: ASSUMPTIONS AND PREDICTIONS In a cross-modal experimental paradigm, the individual modalities may either be treated as being on an equal footing, or one modality may be singled out as a target modality, whereas stimuli from the remaining modalities may be ignored by the participant as nontargets. Cross-modal effects are assessed in different ways, depending on task instruction. As shown below, the TWIN model can take these different paradigms into account simply by modifying the conditions that lead to an opening of the time window.
14.4.1 Measuring Cross-Modal Effects in Focused Attention and Redundant Target Paradigms In the redundant target paradigm (RTP; also known as the divided attention paradigm), stimuli from different modalities are presented simultaneously or with a certain SOA, and the participant is instructed to respond to the stimulus detected first. Typically, the time to respond in the cross-
modal condition is faster than in either of the unimodal conditions. In the focused attention paradigm (FAP), cross-modal stimulus sets are presented in the same manner, but now participants are instructed to respond only to the onset of a stimulus from a specifically defined target modality, such as the visual, and to ignore the remaining nontarget stimulus (the tactile or the auditory). In the latter setting, when a stimulus of a nontarget modality, for example, a tone, appears before the visual target at some spatial disparity, there is no overt response to the tone if the participant is following the task instructions. Nevertheless, the nontarget stimulus has been shown to modulate the saccadic response to the target: depending on the exact spatiotemporal configuration of target and nontarget, the effect can be a speedup or an inhibition of saccadic RT (see, e.g., Amlôt et al. 2003; Diederich and Colonius 2007b), and the saccadic trajectory can be affected as well (Doyle and Walker 2002). Some striking similarities to human data have been found in a detection task utilizing both paradigms. Stein et al. (1988) trained cats to orient to visual or auditory stimuli, or both. In one paradigm, the target was a visual stimulus (a dimly illuminating LED) and the animal learned that although an auditory stimulus (a brief, low-intensity broadband noise) would be presented periodically, responses to it would never be rewarded, and the cats learned to “ignore” it (FAP). Visual– auditory stimuli were always presented spatially coincident, but their location varied from trial to trial. The weak visual stimulus was difficult to detect and the cats’ performance was <50% correct detection. However, combining the visual stimulus with the neutral auditory stimulus markedly enhanced performance, regardless of their position. A similar result was obtained when animals learned that both stimuli were potential targets (RTP). In a separate experiment in which the visual and the (neutral) auditory stimuli were spatially disparate, however, performance was significantly worse than when the visual stimulus was presented alone (cf. Stein et al. 2004). A common method to assess the amount of cross-modal interaction is to use a measure that relates mean RT in cross-modal conditions to that in the unimodal condition. The following definitions quantify the percentage of RT enhancement in analogy to a measure proposed for measuring multisensory enhancement in neural responses (cf. Meredith and Stein 1986; Anastasio et al. 2000; Colonius and Diederich 2002; Diederich and Colonius 2004a, 2004b). For visual, auditory, and visual–auditory stimuli with observed mean (saccadic or manual) reaction time, RTV, RTA, and RTVA, respectively, and SOA = τ, the multisensory response enhancement (MRE) for the redundant target task is defined as
MRERTP = [min(RTV, RTA + τ) − RTVA,τ] / min(RTV, RTA + τ) · 100,
(14.3)
where RTVA,τ refers to observed mean RT to the bimodal stimulus with SOA = τ. For the focused attention task, MRE is defined as
MREFAP = [(RTV − RTVA,τ) / RTV] · 100,
(14.4)
assuming vision as target modality.
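Equations 14.3 and 14.4 translate directly into code. The sketch below (Python; the mean RTs plugged in at the end are invented for illustration) computes both enhancement measures.

```python
def mre_rtp(rt_v, rt_a, rt_va, tau):
    """Multisensory response enhancement (%) in the redundant target paradigm (Eq. 14.3)."""
    baseline = min(rt_v, rt_a + tau)
    return (baseline - rt_va) / baseline * 100.0

def mre_fap(rt_v, rt_va):
    """Multisensory response enhancement (%) in the focused attention paradigm,
    with vision as the target modality (Eq. 14.4)."""
    return (rt_v - rt_va) / rt_v * 100.0

# Hypothetical mean saccadic RTs (ms): visual alone, auditory alone, bimodal at SOA = 50 ms
print(mre_rtp(150.0, 120.0, 140.0, 50.0))  # 10/150 * 100, roughly 6.7% enhancement
print(mre_fap(150.0, 138.0))               # 12/150 * 100 = 8.0% enhancement
```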
14.4.2 TWIN Model for the FAP TWIN is adapted to the focused attention task by replacing the original TWIN Assumption B2 with (B2-FAP) TWIN Assumption: In the FAP, cross-modal interaction occurs only if (1) a nontarget stimulus wins the race in the first stage, opening the TWIN such that (2) the termination of the target peripheral process falls in the window. The duration of the time window is a constant.
The idea here is that the winning nontarget will keep the saccadic system in a state of heightened reactivity such that the upcoming target stimulus, if it falls into the time window, will trigger cross-modal interaction. At the neural level, this may correspond to a gradual inhibition of fixation neurons (in the superior colliculus) and/or omnipause neurons (in the midline pontine brain stem). In the case of the target being the winner, no discernible effect on saccadic RT is predicted, just as in the unimodal situation.
IFAP = {A + τ < V < A + τ + ω}.
Thus, the probability of integration to occur, P(IFAP), is a function of both τ and ω, and it can be determined numerically once the distribution functions of A and V have been specified. Expected reaction time in the bimodal condition then is (cf. Equation 14.2)
c E[RTVA,τ ] = E[V ] + E[ S2 | I FAP ] − P( I FAP ) ⋅ ∆.
(14.5)
No interaction is possible in the unimodal condition. Thus, the expected reaction time for the visual (target) stimulus condition is
c E[RTV ] = E[V ] + E[ S2 | I FAP ].
(14.6)
Note that in the focused attention task, the first stage duration is defined as the time it takes to process the (visual) target stimulus, E[V]. Cross-modal interaction (CI) is defined as difference between mean RT to the unimodal and cross-modal stimuli, i.e.,
CI ≡ E[RTV] – E[RTVA,τ] = P(IFAP) · Δ.
(14.7)
Thus, the separation of temporal and nontemporal factors expressed in the above equation for the observable CI is directly inherited from Assumptions B4 and B2-FAP. 14.4.2.1 TWIN Predictions for the FAP The integration Assumption B2-FAP permits further specification of TWIN’s general predictions of Section 14.3.3. From a model testing point of view, it is a clear strength of the TWIN framework that it allows for numerous qualitative predictions without having to specify the probability distributions for the random processing times. Thus, a violation of any one of these predictions cannot be attributed to an inappropriate choice of the distributions but may point to a more fundamental inadequacy of one or, possibly, several model assumptions. For a quantitative fit to an observed set of data, however, some distributional assumptions are required. In the parametric version of TWIN, all peripheral processing times are assumed to be exponentially distributed (cf. Colonius and Diederich 2004b). This choice is made mainly for computational simplicity: calculating the probability of integration, P(IFAP), is straightforward, and the exponential distribution is characterized by a single quantity, the intensity parameter λ (see Appendix A). As long as predictions are limited to the level of means, no specific assumptions about the distribution of processing times in the second stage are necessary (but see Section 14.6). Next, we demonstrate how the focused attention context leads to more specific empirically testable predictions of TWIN. Predictions relying on the parametric TWIN version are postponed to the
final part of this section. If not specifically mentioned otherwise, we always assume nonnegative Δ values in the following elaborations.
SOA effects. When the nontarget is presented very late relative to the target (large positive SOA), its chance of winning the race against the target and thus opening the window of integration becomes very small. When it is presented rather early (large negative SOA), it is likely to win the race and to open the window, but the window may be closed by the time the target arrives. Again, the probability of integration, P(IFAP), is small. Therefore, the largest probability of integration is expected for some midrange SOA values. Although P(IFAP) is unobservable, it should leave its mark on a well-known observable measure, i.e., MRE. In fact, MREFAP, defined in Equation 14.4 as a function of SOA, should have the same form as P(IFAP), scaled only by some constant:

MREFAP = [(RTV − RTVA,τ) / RTV] · 100
= [P(IFAP) · Δ / RTV] · 100
= P(IFAP) · Δ · const.
(14.8)
Intensity effects. Increasing the intensity of the visual stimulus will speed up visual peripheral processing (up to some minimum level), thereby increasing the chance for the visual target to win the race. Thus, the probability that the window of integration opens decreases, predicting less multisensory integration. Increasing the intensity of the nontarget auditory stimulus, on the other hand, leads to the opposite prediction: the auditory stimulus will have a better chance to win the race and to open the window of integration, hence predicting more multisensory integration to occur on average. Two further distinctions can be made. For large negative SOA, i.e., when the auditory nontarget arrives very early, further increasing the auditory intensity makes it more likely for the TWIN to close before the target arrives and therefore results in a lower P(IFAP) value. For smaller negative SOA, however, i.e., when the nontarget is presented shortly before the target, increasing the auditory intensity improves its chances to win against the target and to open the window. Given the complexity of these intensity effects, however, more specific quantitative predictions will require some distributional assumptions for the first stage processing times (see below). Alternatively, it may be feasible to adapt the “double factorial paradigm” developed by Townsend and Nozawa (1995) to analyze predictions when the effects of both targets and nontargets presented at two different intensity levels are observed.
Cross-modal effects. If target and nontarget are presented in two distinct cross-modal conditions, one would expect parameter Δ to take on two different values. For example, for two spatial conditions, ipsilateral and contralateral, the values could be Δi and Δc, respectively. Subtracting the corresponding cross-modal interaction terms then gives (cf. Equation 14.7)
CIi – CIc = P(IFAP) · (Δi – Δc),
(14.9)
an expression that should again yield the same qualitative behavior, as a function of SOA, as P(IFAP). In a similar vein, one can capitalize on the factorization of expected cross-modal interaction if some additional experimental factor affecting Δ, but not P(IFAP), is available. In Colonius et al. (2009), an auditory background masker stimulus, presented at increasing intensity levels, was hypothesized to simultaneously increase Δc and decrease Δi. The ratio of CIs in both configurations,
CIi / CIc = [P(IFAP) · Δi] / [P(IFAP) · Δc] = Δi / Δc,
(14.10)
should then remain invariant across SOA values, with a separate value for each level of the masker.
Number of nontargets effects. For cross-modal interaction to occur in the focused attention task, it is necessary that the nontarget process wins the race in the first stage. With two or more nontargets entering the race, the probability of one of them winning against the target process increases and, therefore, the probability of opening the window of integration increases with the number of nontargets present. In this case, there are even two different ways of utilizing the factorization of CI, both requiring the existence of two cross-modal conditions with two different Δ parameters (spatial or other). The first test is analogous to the previous one. Because the number of nontargets affects P(IFAP) only, the ratio in Equation 14.10 should be the same whether it is computed from conditions with one or two nontargets. The second test results from taking the ratio of CI based on one nontarget, CI1, over CI based on two nontargets, CI2. Because Δ should not be affected by the number of nontargets, the ratio
CI1 / CI2 = [P1(IFAP) · Δ] / [P2(IFAP) · Δ] = P1(IFAP) / P2(IFAP),
(14.11)
where P1 and P2 refer to the probability of opening the window under one or two nontargets, respectively, should be the same, no matter from which one of the two cross-modal conditions it was computed. In the study of Diederich and Colonius (2007a), neither of these tests revealed evidence against these TWIN predictions.

SOA and intensity effects predicted by a parametric TWIN version. Assuming exponential distributions for the peripheral processing times, the intensity parameter for the visual modality is set to 1/λV = 50 (ms) and to 1/λA = 10, 30, 70, or 90 (ms) for the auditory nontarget. Quantitative predictions of TWIN for focused attention are shown in the left of Figure 14.1. Panels 1 and 2 show mean RT and P(IFAP) as a function of SOA for the various intensities of the auditory nontarget. Note that two intensities result in faster mean RT, whereas the other two result in slower mean RT, compared to mean unimodal RT to the visual target. Here, the parameter for second stage processing time when no integration occurs, μ, was set to 100 ms. The TWIN was set to 200 ms. The parameter for multisensory integration was set to Δi = 20 ms for bimodal stimuli presented ipsilaterally, implying a facilitation effect. Note that neither λV nor μ is directly observable, but the sum of the peripheral and central processing time for the visual target stimulus constitutes a prediction for unimodal mean saccadic RT:
E[RTV] = 1/λV + μ,
which, for the present example, is 50 ms + 100 ms = 150 ms. The dashed line and the dotted line show the bimodal RT predictions for the auditory nontargets with the highest and lowest intensity, respectively. No fits to empirical data sets are presented here, but good support of TWIN has been found thus far (see, e.g., Diederich and Colonius 2007a, 2007b, 2008; Diederich et al. 2008). Close correspondence between data and model prediction, however, is not the only aspect to consider. Importantly, the pattern of parameter values estimated for a given experimental setting should suggest a meaningful interpretation. For example, increasing stimulus intensities are reflected in a decrease of the corresponding 1/λ parameters, assuming higher intensities to lead to faster peripheral processing times (at least, within certain limits). Furthermore, in the study with an auditory background masker (Colonius et al. 2009), the cross-modal interaction parameter (Δ) was a decreasing or increasing function of masker level for the contralateral or ipsilateral condition, respectively, as predicted.
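To make these parametric predictions concrete, the following minimal sketch (Python with NumPy) evaluates the closed-form expressions for P(IFAP) under the exponential assumption (they are derived in Appendix A) and the resulting mean cross-modal RT, 1/λV + μ − P(IFAP) · Δ, for the parameter values used in the example above. It is an illustration of the model's qualitative behavior, not the authors' own code.

```python
import numpy as np

def p_int_fap(tau, omega, lam_v, lam_a):
    """P(I_FAP) = Pr(A + tau < V < A + tau + omega) for exponential V and A.
    The three closed-form cases follow Appendix A.1.1."""
    s = lam_v + lam_a
    if tau + omega < 0:      # nontarget window closes before the target can arrive
        return (lam_v / s) * np.exp(lam_a * tau) * (np.exp(lam_a * omega) - 1.0)
    elif tau < 0:            # window may straddle target onset
        return (lam_a * (1 - np.exp(-lam_v * (omega + tau)))
                + lam_v * (1 - np.exp(lam_a * tau))) / s
    else:                    # target onset precedes (or coincides with) the nontarget
        return (lam_a / s) * (np.exp(-lam_v * tau) - np.exp(-lam_v * (omega + tau)))

# Example parameters from the text (all in ms): 1/lam_V = 50, mu = 100,
# window width omega = 200, integration effect delta = 20.
mu, omega, delta = 100.0, 200.0, 20.0
lam_v = 1 / 50.0
for mean_a in (10.0, 30.0, 70.0, 90.0):          # auditory intensity levels (1/lam_A)
    lam_a = 1 / mean_a
    for tau in (-300, -200, -100, 0, 100):       # SOA values
        p = p_int_fap(tau, omega, lam_v, lam_a)
        rt_bimodal = 1 / lam_v + mu - p * delta  # cf. Equation 14.2 specialized to the FAP
        print(f"1/lam_A={mean_a:3.0f}  SOA={tau:5d}  P(I)={p:.3f}  E[RT]={rt_bimodal:6.1f}")
```

The unimodal prediction for comparison is 1/λV + μ = 150 ms, so any P(IFAP) > 0 produces bimodal facilitation in this example.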
FIGURE 14.1 TWIN predictions for FAP (left panels) and RTP (right panels). Parameters in both paradigms were chosen to be identical. Mean RT for visual stimulus is 150 ms (1/λV = 50, μ = 100). Peripheral processing times for auditory stimuli are 1/λA = 10 ms (dashed line), 1/λA = 30 ms (solid), 1/λA = 70 ms (dash-dotted), and 1/λA = 90 ms (dotted). Interaction parameter is Δ = 20 ms.
14.4.3 TWIN Model for RTP

TWIN is adapted to the redundant target task by replacing the original TWIN Assumption B2 by

(B2-RTP) TWIN Assumption: In the RTP, (1) the window of integration is opened by whichever stimulus wins the race in the first stage and (2) cross-modal interaction occurs if the termination of the peripheral process of a stimulus of another modality falls within the window. The duration of the time window is a constant.
Obviously, if stimuli from more than two modalities are presented, the question of a possible additional effect on cross-modal interaction arises. There is both behavioral and neurophysiological evidence for trimodal interaction (e.g., Diederich and Colonius 2004b; Stein and Meredith 1993), but data from saccadic eye movement recordings do not yet seem to be conclusive enough to justify further elaboration of Assumption B2-RTP.
To compute the probability of interaction in the RTP, P(IRTP), we assume that a visual and an auditory stimulus are presented with an SOA equal to τ. Then, either the visual stimulus wins, V < A + τ, or the auditory stimulus wins, A + τ < V; so, in either case, min(V, A + τ) < max(V, A + τ) and, by Assumption B2-RTP,
IRTP = {max(V, A + τ) < min(V, A + τ) + ω}.
Thus, the probability of integration to occur is a function of both τ and ω, as before. Expected reaction time in the cross-modal condition is computed as (see Equation 14.2)
E[RTVA,τ] = E[min(V, A + τ)] + E[S2 | IcRTP] − P(IRTP) · Δ.
(14.12)
In the RTP, first stage duration is determined by the termination time of the winner. This is an important difference to the focused attention situation in which first stage duration is defined by the time it takes to process the (visual) target stimulus. Even for a zero probability of interaction, expected reaction time in the bimodal condition is smaller than, or equal to, either of the unimodal stimulus conditions. These are
E[RTV] = E[V] + E[S2 | IcRTP]

(14.13)

and

E[RTA] = E[A] + E[S2 | IcRTP],

(14.14)
because in the redundant target version of TWIN, the race in the first stage produces a statistical facilitation effect equivalent to the one in the classic race model. Thus, a possible cross-modal enhancement observed in a redundant target task may be because of multisensory integration or statistical facilitation, or both. Moreover, a possible cross-modal inhibition effect may be weakened by the simultaneous presence of statistical facilitation in the first stage. Predictions for the redundant target case are less straightforward than for focused attention because the factorization of crossmodal interaction (CI) in the latter is no longer valid. Nevertheless, some general predictions can be made assuming, as before, a multisensory facilitation effect, i.e., Δ > 0. 14.4.3.1 TWIN Predictions for RTP In this paradigm, both stimuli are on an equal footing and, therefore, negative SOA values need not be introduced. Each SOA value now indicates the time between the stimulus presented first and the one presented second, regardless of modality. SOA effects. The probability of cross-modal interaction decreases with increasing SOA: the later the second stimulus is presented, the less likely it is to win the race and to open the window of integration; alternatively, if the window has already been opened by the first stimulus, the less likely it is to fall into that window with increasing SOA. For large enough SOA values, mean saccadic RT in the cross-modal condition approaches the mean for the stimulus presented first. To fix ideas, we now assume, without loss of generality, that a visual stimulus of constant intensity is presented first and that an auditory stimulus is presented second, or simultaneous with the visual, and at different intensities. Predictions then depend on the relative intensity difference between both stimuli. Note that the unimodal means constitute upper bounds for bimodal mean RT. Intensity effects. For a visual stimulus presented first, increasing the intensity of the auditory stimulus (presented second) increases the amount of facilitation.
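The role of statistical facilitation in the RTP first stage can be illustrated with a small Monte Carlo sketch (Python/NumPy). The parameter values are hypothetical; the simulation simply draws exponential peripheral processing times, applies Assumption B2-RTP, and evaluates Equations 14.12 through 14.14.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
lam_v, lam_a = 1 / 50.0, 1 / 30.0      # hypothetical mean peripheral times: 50 ms (V), 30 ms (A)
mu, omega, delta = 100.0, 200.0, 20.0  # second-stage mean, window width, integration effect (ms)

for tau in (0, 50, 100, 200):          # SOA: auditory onset tau ms after the visual
    V = rng.exponential(1 / lam_v, n)
    A = rng.exponential(1 / lam_a, n) + tau
    winner = np.minimum(V, A)                        # first stage ends with the winner (B2-RTP)
    integrate = np.maximum(V, A) < winner + omega    # loser terminates inside the open window
    p_int = integrate.mean()
    rt_bimodal = winner.mean() + mu - p_int * delta  # Equation 14.12
    print(f"SOA={tau:3d}  E[min]={winner.mean():5.1f}  "
          f"P(I_RTP)={p_int:.3f}  E[RT_VA]={rt_bimodal:6.1f}")

# Unimodal predictions for comparison (Equations 14.13 and 14.14):
print("E[RT_V] =", 1 / lam_v + mu, " E[RT_A] =", 1 / lam_a + mu)
```

Even with delta set to zero, the mean of the first-stage winner lies below the faster unimodal mean, which is the statistical facilitation referred to above.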
SOA and intensity effects predicted by a parametric TWIN version. Figure 14.1 (right panels) shows the quantitative predictions of TWIN for SOA and intensity variations under exponential distributions for the peripheral processing times. Parameters are the same as for the FAP predictions (left panels). Panels 1 and 2 show mean RT and P(I) as a function of SOA for various intensity levels (λ parameters) of the auditory stimulus. Both panels exhibit the predicted monotonicity in SOA and intensity. The third panel, depicting MRE, reveals some nonmonotonic behavior in both SOA and intensity. Without going into numerical details, this nonmonotonicity of MRE can be seen to be because of a subtle interaction between two mechanisms, both being involved in the generation of MRE: (1) statistical facilitation occurring in the first stage and (2) opening of the time window. The former is maximal if presentation of the stimulus processed faster is delayed by an SOA equal to the difference in mean RT in the unimodal stimulus conditions, that is when peripheral processing times are in physiological synchrony; for example, if mean RT to an auditory stimulus is 110 ms and mean RT to a visual stimulus is 150 ms, the maximal amount of statistical facilitation is expected when the auditory stimulus is presented 150 ms – 110 ms = 40 ms after the visual stimulus. The SOA value being “optimal” for statistical facilitation, however, need not be the one producing the highest probability of opening the time window that was shown to be decreasing with SOA. Moreover, the nonmonotonicity in intensity becomes plausible if one realizes that variation in intensity results in a change in mean processing time analogous to an SOA effect: for example, lowering auditory stimulus intensity has an effect on statistical facilitation and the probability of opening the time window that is comparable to increasing SOA.
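A brief simulation makes this interplay visible: the SOA that maximizes the statistical facilitation produced by the race is roughly the difference between the mean peripheral processing times, whereas the probability of opening the window keeps decreasing with SOA. The parameter values below are illustrative only; "race gain" is measured here relative to the faster of the two stimuli presented alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300_000
lam_v, lam_a = 1 / 50.0, 1 / 10.0   # hypothetical means: 1/lam_V = 50 ms, 1/lam_A = 10 ms
omega = 200.0                       # window width (ms)

V = rng.exponential(1 / lam_v, n)
A0 = rng.exponential(1 / lam_a, n)
for tau in (0, 20, 40, 60, 100, 200):       # auditory stimulus delayed by tau ms
    A = A0 + tau
    winner = np.minimum(V, A)
    race_gain = min(1 / lam_v, 1 / lam_a + tau) - winner.mean()   # statistical facilitation
    p_int = (np.maximum(V, A) < winner + omega).mean()            # window catches the loser
    print(f"SOA={tau:3d}  race gain={race_gain:5.1f} ms  P(I_RTP)={p_int:.3f}")
```

With these values the race gain peaks near SOA = 40 ms (the difference of the peripheral means), while P(IRTP) declines monotonically, which is the source of the nonmonotonic MRE described above.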
14.4.4 Focused Attention versus RTP

Top-down versus bottom-up. The distinction between RTP and FAP is not only an interesting experimental variation as such but it may also provide an important theoretical aspect. In fact, because physically identical stimuli can be presented under the same spatiotemporal configuration in both paradigms, any differences observed in the corresponding reaction times would have to be because of the instructions being different, thereby pointing to a possible separation of top-down from bottom-up processes in the underlying multisensory integration mechanism.

Probability of integration. Moreover, comparing both paradigms yields some additional insight into the mechanics of TWIN. Note that under equivalent stimulus conditions, IFAP ⊂ IRTP; this relation follows from the observation that
IFAP = IRTP ∩ {A + τ is the winner of the race}.
It means that any realization of the peripheral processing times that leads to an opening of the time window under the focused attention instruction also leads to the same event under the redundant target instruction. Thus, the probability of integration under redundant target instructions cannot be smaller than that under focused attention instruction: P(IFAP) ≤ P(IRTP), given identical stimulus conditions (see also Figure 14.1). Inverse effectiveness. It is instructive to consider the effect of varying stimulus intensity in both paradigms when both stimuli are presented simultaneously (SOA = 0) and at intensity levels producing the same mean peripheral speed, i.e., with the same intensity parameters, λV = λA. Assuming exponential distributions, Figure 14.2 depicts the probability of integration (upper panels) and MRE (lower panels) as a function of time window width (ω) for both paradigms and with each curve presenting a specific intensity level. The probability of integration increases monotonically from zero (for ω = 0) toward 0.5 for the focused attention, and toward 1 for the RTP. For the former, the probability of integration cannot surpass 0.5 because, for any given window width, the target process has the same chance of winning as the nontarget process under the given λ parameters. For both paradigms, P(I), as a function of ω, is ordered with respect to intensity level: it increases monotonically
FIGURE 14.2 TWIN predictions for FAP (left panels) and RTP (right panels) as a function of time window width (ω) at SOA = 0. Upper panels depict probability of integration P(I), whereas lower panels show MRE. Each curve corresponds to a specific intensity parameter of stimuli. Peripheral processing times for auditory and visual stimuli are 1/λA = 1/λV equal to 30 ms (dashed line), 50 ms (solid), 70 ms (dash-dotted), and 90 ms (black dotted). Mean second stage processing time is μ = 100 ms). Interaction parameter is Δ = 20 ms.
with the mean processing time of both stimuli* (upper panels of Figure 14.2). The same ordering is found for MRE in the FAP; somewhat surprisingly, however, the ordering is reversed for MRE in the RTP: increasing intensity implies less enhancement, i.e., it exhibits the "inverse effectiveness" property often reported in empirical studies (Stein and Meredith 1993; Rowland and Stein 2008). Similar to the above discussion of intensity effects for RTP, this is because of an interaction generated by increasing intensity: it weakens statistical facilitation in first stage processing but simultaneously increases the probability of integration.

* This is because of a property of the exponential distribution: mean and SD are identical.
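The relation IFAP ⊂ IRTP and the two asymptotes described above (0.5 for the FAP and 1 for the RTP when λV = λA and SOA = 0) are easy to check numerically. The following sketch, with illustrative parameters, estimates both probabilities of integration as a function of window width.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300_000
for mean_peri in (30.0, 50.0, 90.0):          # equal visual and auditory intensities (1/lam)
    V = rng.exponential(mean_peri, n)
    A = rng.exponential(mean_peri, n)         # SOA = 0
    for omega in (50.0, 100.0, 200.0, 300.0):
        p_fap = ((A < V) & (V < A + omega)).mean()   # nontarget must win AND target fall in window
        p_rtp = (np.abs(V - A) < omega).mean()       # either winner may open the window
        print(f"1/lam={mean_peri:3.0f}  omega={omega:3.0f}  "
              f"P(I_FAP)={p_fap:.3f}  P(I_RTP)={p_rtp:.3f}")
```

For every parameter combination the estimate of P(IFAP) stays below that of P(IRTP), as required by the subset relation discussed above.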
14.5 TWIN MODEL FOR FOCUSED ATTENTION: INCLUDING A WARNING MECHANISM

Although estimates for the TWIN vary somewhat across subjects and task specifics, a 200-ms width showed up in several studies (e.g., Eimer 2001; Sinclair and Hammond 2009). In a focused attention task, when the nontarget occurs at an early point in time (i.e., 200 ms or more before the target), a substantial decrease of RT compared to the unimodal condition has been observed by Diederich
and Colonius (2007a). This decrease, however, no longer depended on whether target and nontarget appeared at ipsilateral or contralateral positions, thus supporting the hypothesis that the nontarget plays the role of a spatially unspecific alerting cue, or warning signal, for the upcoming target whenever the SOA is large enough. The hypothesis of increased cross-modal processing triggered by an alerting cue had already been advanced by Nickerson (1973), who called it "preparation enhancement." In the eye movement literature, the effects of a warning signal have been studied primarily in the context of explaining the "gap effect," i.e., the latency to initiate a saccade to an eccentric target is reduced by extinguishing the fixation stimulus approximately 200 ms before target onset (Reuter-Lorenz et al. 1991; Klein and Kingstone 1993). An early study on the effect of auditory or visual warning signals on saccade latency, but without considering multisensory integration effects, was conducted by Ross and Ross (1981).

Here, the dual role of the nontarget—inducing multisensory integration that is governed by the above-mentioned spatiotemporal rules, on the one hand, and acting as a spatially unspecific crossmodal warning cue, on the other—will be taken into account by an extension of TWIN that yields an estimate of the relative contribution of either mechanism for any specific SOA value.

(W) Assumption on warning mechanism: If the nontarget wins the processing race in the first stage by a margin wide enough for the TWIN to be closed again before the arrival of the target, then subsequent processing will be facilitated or inhibited ("warning effect") without dependence on the spatial configuration of the stimuli.*
The time margin by which the nontarget may win against the target will be called the head start, denoted by γ. The assumption stipulates that the head start is at least as large as the width of the time window for a warning effect to occur. That is, the warning mechanism of the nontarget is triggered whenever the nontarget wins the race by a head start γ ≥ ω ≥ 0. Taking, for concreteness, the auditory as nontarget modality, occurrence of a warning effect corresponds to the event:
W = {A + τ + γ < V}.
The probability of warning to occur, P(W), is a function of both τ and γ. Because γ ≥ ω ≥ 0 this precludes the simultaneous occurrence of both warning and multisensory interaction within one and the same trial and, therefore, P(I ∩ W) = 0 (because no confusion can arise, we write I for IFAP throughout this section). The actual value of the head start criterion is a parameter to be estimated in fitting the model under Assumption W. The expected saccadic reaction time in the cross-modal condition in the TWIN model with warning assumption can then be shown to be

E[RTcross-modal] = E[S1] + E[S2] = E[S1] + E[S2|Ic ∩ Wc] − P(I) · {E[S2|Ic ∩ Wc] − E[S2|I]} − P(W) · {E[S2|Ic ∩ Wc] − E[S2|W]},
* In the study of Diederich and Colonius 2008, an alternative version of this assumption was considered as well (version B). If the nontarget wins the processing race in the first stage by a wide enough margin, then subsequent processing will in part be facilitated or inhibited without dependence on the spatial configuration of the stimuli. This version is less restrictive: All that is needed for the nontarget to act as a warning signal is a “large enough” headstart against the target in the race and P(I ∩ W) can be larger than 0. Assuming that the effects on RT of the two events I and W, integration and warning, combine additively, it can then be shown that the cross-modal interaction prediction of this model version is captured by the same equation as under the original version, i.e., Equation 14.17 below. The only difference is in the order restriction for the parameters, γ ≥ ω. Up to now, no empirical evidence has been found in favor of one of the two versions over the other.
where E[S2|I], E[S2|W], and E[S2|Ic ∩ Wc] denote the expected second stage processing time conditioned on interaction occurring (I), warning occurring (W), or neither of them occurring (Ic ∩ Wc), respectively (Ic, Wc stand for the complement of events I, W). Setting

Δ ≡ E[S2|Ic ∩ Wc] − E[S2|I]

κ ≡ E[S2|Ic ∩ Wc] − E[S2|W]
where κ denotes the amount of the warning effect (in milliseconds), this becomes
E[RTcross-modal] = E[S1] + E[S2|Ic ∩ Wc] – P(I) · Δ – P(W) · κ.
(14.15)
In the unimodal condition, neither integration nor warning are possible. Thus,
E[RTunimodal] = E[S1] + E[S2|Ic ∩ Wc],
(14.16)
and we arrive at a simple expression for the combined effect of multisensory integration and warning, cross-modal interaction (CI),
CI ≡ E[RTunimodal] – E[RTcross-modal] = P(I) · Δ + P(W) · κ.
(14.17)
Recall that the basic assumptions of TWIN imply that for a given spatial configuration and nontarget modality, there are no sign reversals or changes in magnitude of Δ across all SOA values. The same holds for κ. Note, however, that Δ and κ can separately take on positive or negative values (or zero) depending on whether multisensory integration and warning have a facilitative or inhibitory effect. Furthermore, as with the probability of integration P(I), the probability of warning P(W) does change with SOA.
14.5.1 TWIN Predictions for FAP with Warning

The occurrence of a warning effect depends on intramodal characteristics of the target and the nontarget, such as modality or intensity. Assuming that increasing stimulus intensity goes along with decreased reaction time (for auditory stimuli, see, e.g., Frens et al. 1995; Arndt and Colonius 2003; for tactile stimuli, see Diederich and Colonius 2004b), TWIN makes specific predictions regarding the effect of nontarget intensity variation.

Intensity effects. An intense (auditory) nontarget may have a higher chance to win the race with a head start compared to a weak nontarget. In general, increasing the intensity of the nontarget (1) increases the probability of it functioning as a warning signal, and (2) makes it more likely for the nontarget to win the peripheral race against the target process.

SOA effects. The probability of warning P(W) decreases monotonically with SOA: the later the nontarget is presented, the smaller its chances to win the race against the target with some head start γ. This differs from the nonmonotonic relationship predicted between P(IFAP) and SOA (see above). It is interesting to note that the difference in how P(I) and P(W) should depend on SOA is, in principle, empirically testable without any distributional assumptions by manipulating the conditions of the experiment. Specifically, if target and nontarget are presented in two distinct spatial conditions, for example, ipsilateral and contralateral, one would expect Δ to take on two different values, Δi and Δc, whereas P(W) · κ, the expected nonspatial warning effect, should remain the same under both conditions. Subtracting the corresponding cross-modal interaction terms then gives, after canceling the warning effect terms (Equation 14.17),
CIi – CIc = P(I) · (Δi – Δc).
(14.18)
This expression is an observable function of SOA and, because the factor Δi – Δc does not depend on SOA by Assumption B3, it should exhibit the same functional form as P(I): increasing and then decreasing (see Figure 14.1, middle left panel).

Context effects. The magnitude of the warning effect may be influenced by the experimental design. Specifically, if nontargets from different modalities are presented in two distinct presentation modes, e.g., blocking or mixing the modality of the auditory and tactile nontargets within an experimental block of trials, such that supposedly no changes in the expected amount of multisensory integration occur, then subtraction of the corresponding CI values yields, after canceling the integration effect terms,

CIblocked – CImixed = P(W) · (κmixed – κblocked),
(14.19)
a quantity that should decrease monotonically with SOA because P(W) does. The extension of the model to include warning effects has been probed for both auditory and tactile nontargets. Concerning the warning assumptions, no clear superiority of version A over version B was found in the data. For detailed results on all of the tests described above, we refer the reader to Diederich and Colonius (2008).

SOA and intensity: quantitative predictions. To illustrate the predictions of TWIN with warning for mean SRT, we choose the following set of parameters. As before, the intensity parameter for the visual modality is set to 1/λV = 50 (ms) and to 1/λA = 10, 30, 70, or 90 (ms) for the (auditory) nontarget, the parameter for second stage processing time when no integration and no warning occurs, μ ≡ E[S2|Ic ∩ Wc], is set to 100 ms, and the TWIN to 200 ms. The parameter for multisensory integration is set to Δi = 20 ms for bimodal stimuli presented ipsilaterally, and κ is set to 5 ms (Figure 14.3).

FIGURE 14.3 TWIN predictions for FAP when only warning occurs (left panels) and when both integration and warning occur (right panels). Parameters are chosen as before: 1/λV = 50 and μ = 100, resulting in a mean RT for visual stimulus of 150 ms. Peripheral processing times for auditory stimuli are 1/λA = 10 ms (dashed line), 1/λA = 30 ms (solid), 1/λA = 70 ms (dash-dotted), and 1/λA = 90 ms (black dotted).
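A small simulation of the extended model, using the parameter set just listed (and, since the head start criterion itself is not specified here, assuming the minimal value γ = ω), illustrates how P(I), P(W), and the combined cross-modal interaction CI = P(I) · Δ + P(W) · κ of Equation 14.17 behave across SOA.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300_000
lam_v = 1 / 50.0
mu, omega, delta, kappa = 100.0, 200.0, 20.0, 5.0
gamma = omega              # head-start criterion; gamma >= omega, value assumed here

V = rng.exponential(1 / lam_v, n)
for mean_a in (10.0, 90.0):                       # a fast and a slow auditory nontarget
    A = rng.exponential(mean_a, n)
    for tau in (-400, -300, -200, -100, 0, 100):  # negative SOA: nontarget leads
        At = A + tau
        integ = (At < V) & (V < At + omega)       # nontarget opens window, target falls inside
        warn = At + gamma < V                     # nontarget wins by at least the head start
        ci = integ.mean() * delta + warn.mean() * kappa   # Equation 14.17
        rt = 1 / lam_v + mu - ci                          # Equations 14.15 and 14.16
        print(f"1/lam_A={mean_a:3.0f} SOA={tau:5d} P(I)={integ.mean():.3f} "
              f"P(W)={warn.mean():.3f} CI={ci:5.2f} E[RT]={rt:6.1f}")
```

Because γ = ω, the events I and W are disjoint in every simulated trial, consistent with P(I ∩ W) = 0 under the original version of Assumption W.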
14.6 CONCLUSIONS: OPEN QUESTIONS AND FUTURE DIRECTIONS

The main contribution of the TWIN framework thus far is to provide an estimate of the multisensory integration effect—and, for the extended model, also of a possible warning effect—that is "contaminated" neither by a specific SOA nor by intramodal stimulus properties such as intensity. This is achieved through factorizing* expected cross-modal interaction into the probability of interaction in a given trial, P(I), times the amount of interaction Δ (cf. Equation 14.2), the latter being measured in milliseconds.

Some potential extensions of the TWIN framework are discussed next. Although the functional dependence of P(I) on SOA and stimulus parameters is made explicit in the rules governing the opening and closing of the time window, the TWIN model framework as such does not stipulate a mechanism for determining the actual amount of interaction. By Assumption B4, Δ depends on cross-modal features like, for example, spatial distance between the stimuli of different modalities, and by systematically varying the spatial configuration, some insight into the functional dependence can be gained (e.g., Diederich and Colonius 2007b). Given the diversity of intersensory interaction effects, however, it would be presumptuous to aim at a single universal mechanism for predicting the amount of Δ. This does not preclude incorporating multisensory integration mechanisms into the TWIN framework within a specific context such as a spatial orienting task. Such an approach, which includes stipulating distributional properties of second stage processing time in a given situation, would bring along the possibility of a stronger quantitative model test, namely at the level of the entire observable reaction time distribution rather than at the level of means only.

In line with the framework of modeling multisensory integration as (nearly) optimal decision making (Körding et al. 2007), we have recently suggested a decision rule that determines an optimal window width as a function of (1) the prior odds in favor of a common multisensory source, (2) the likelihood of arrival time differences, and (3) the payoff for making correct or wrong decisions (Colonius and Diederich 2010).

Another direction is to extend the TWIN framework to account for additional experimental paradigms. For example, in many studies, a subject's task is not simply to detect the target but to perform a speeded discrimination task between two stimuli (Driver and Spence 2004). Modeling this task implies not only a prediction of reaction time but also of the frequency of a correct or incorrect discrimination response. Traditionally, such data have been accommodated by assuming an evidence accumulation mechanism sequentially sampling information from the stimulus display favoring either response option A or B, for example, and stopping as soon as a criterion threshold for one or the other alternative has been reached. A popular subclass of these models are the diffusion models, which have been considered models of multisensory integration early on (Diederich 1995, 2008). At this point, however, it is an open question how this approach can be reconciled with the TWIN framework.
* Strictly speaking, this only holds for the focused attention version of TWIN; for the redundant target version, an estimate of the amount of statistical facilitation is required and can be attained empirically (cf. Colonius and Diederich 2006).
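As a toy illustration of the optimal-window idea mentioned above, one can ask which window width maximizes expected payoff when "integrate" is chosen whenever the arrival-time difference falls inside the window. The distributions, prior, and payoff values below are invented for the illustration and are not the quantities derived by Colonius and Diederich (2010).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
prior_common = 0.6                      # assumed prior probability of a common source
gain_hit, cost_false = 1.0, 1.5         # payoff for correct integration, cost of wrong integration

# Assumed arrival-time differences: tightly clustered for a common source, broad otherwise.
d_common = rng.normal(0.0, 40.0, n)
d_separate = rng.uniform(-500.0, 500.0, n)

best = None
for omega in range(10, 401, 10):
    p_hit = (np.abs(d_common) < omega).mean()
    p_false = (np.abs(d_separate) < omega).mean()
    payoff = prior_common * gain_hit * p_hit - (1 - prior_common) * cost_false * p_false
    if best is None or payoff > best[1]:
        best = (omega, payoff)
print("window width maximizing expected payoff:", best[0], "ms")
```

Wider priors on a common source or cheaper false integrations push the resulting window wider, which is the qualitative trade-off the cited decision rule formalizes.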
One of the most intriguing neurophysiological findings has been the suppression of multisensory integration ability of superior colliculus neurons by a temporary suspension of corticotectal inputs from the anterior ectosylvian sulcus and the lateral suprasylvian sulcus (Clemo and Stein 1986; Jiang et al. 2001). A concomitant effect on multisensory orientation behavior observed in the cat (Jiang et al. 2002) suggests the existence of more general cortical influences on multisensory integration. Currently, there is no explicit provision of a top-down mechanism in the TWIN framework. Note, however, that the influence of task instruction (FAP vs. RTP) is implicitly incorporated in TWIN because the probability of integration is supposed to be computed differently under otherwise identical stimulus conditions (cf. Section 14.4.4). It is a challenge for future development to demonstrate that the explicit incorporation of top-down processes can be reconciled with the two-stage structure of the TWIN framework.
APPENDIX A

A.1 DERIVING THE PROBABILITY OF INTERACTION IN TWIN

The peripheral processing times V for the visual and A for the auditory stimulus have an exponential distribution with parameters λV and λA, respectively. That is,

fV(t) = λV e^{−λV t},  fA(t) = λA e^{−λA t}

for t ≥ 0, and fV(t) = fA(t) ≡ 0 for t < 0. The corresponding distribution functions are referred to as FV(t) and FA(t).
A.1.1 Focused Attention Paradigm

The visual stimulus is the target and the auditory stimulus is the nontarget. By definition,

P(IFAP) = Pr(A + τ < V < A + τ + ω) = ∫_0^{∞} fA(x) {FV(x + τ + ω) − FV(x + τ)} dx,

where τ denotes the SOA value and ω is the width of the integration window. Computing the integral expression requires that we distinguish between three cases for the sign of τ + ω:

(1) τ < τ + ω < 0:

P(IFAP) = ∫_{−τ−ω}^{−τ} λA e^{−λA x} {1 − e^{−λV(x+τ+ω)}} dx + ∫_{−τ}^{∞} λA e^{−λA x} {e^{−λV(x+τ)} − e^{−λV(x+τ+ω)}} dx
= [λV/(λV + λA)] e^{λA τ} (−1 + e^{λA ω});
(2) τ < 0 < τ + ω:

P(IFAP) = ∫_0^{−τ} λA e^{−λA x} {1 − e^{−λV(x+τ+ω)}} dx + ∫_{−τ}^{∞} λA e^{−λA x} {e^{−λV(x+τ)} − e^{−λV(x+τ+ω)}} dx
= [1/(λV + λA)] {λA (1 − e^{−λV(ω+τ)}) + λV (1 − e^{λA τ})};

(3) 0 < τ < τ + ω:

P(IFAP) = ∫_0^{∞} λA e^{−λA x} {e^{−λV(x+τ)} − e^{−λV(x+τ+ω)}} dx
= [λA/(λV + λA)] {e^{−λV τ} − e^{−λV(ω+τ)}}.

The mean RT for cross-modal stimuli is

E[RTVA,τ] = E[V] + E[S2 | IcFAP] − P(IFAP) · Δ = 1/λV + μ − P(IFAP) · Δ,

and the mean RT for the visual target is

E[RTV] = 1/λV + μ,
where 1/λV, the mean of the exponential distribution, is the mean RT of the first stage and μ is the mean RT of the second stage when no interaction occurs.
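The three closed-form cases can be cross-checked against a direct simulation of the defining event {A + τ < V < A + τ + ω}; a short Python sketch with arbitrary illustrative parameters:

```python
import numpy as np

def p_int_fap(tau, omega, lam_v, lam_a):
    """Closed-form P(I_FAP) assembled from the three cases above."""
    s = lam_v + lam_a
    if tau + omega < 0:
        return (lam_v / s) * np.exp(lam_a * tau) * (np.exp(lam_a * omega) - 1.0)
    if tau < 0:
        return (lam_a * (1 - np.exp(-lam_v * (omega + tau)))
                + lam_v * (1 - np.exp(lam_a * tau))) / s
    return (lam_a / s) * (np.exp(-lam_v * tau) - np.exp(-lam_v * (omega + tau)))

rng = np.random.default_rng(4)
n = 1_000_000
lam_v, lam_a, omega = 1 / 50.0, 1 / 30.0, 200.0
V = rng.exponential(1 / lam_v, n)
A = rng.exponential(1 / lam_a, n)
for tau in (-300, -150, -50, 0, 50, 150):
    sim = ((A + tau < V) & (V < A + tau + omega)).mean()
    print(f"tau={tau:4d}  closed form={p_int_fap(tau, omega, lam_v, lam_a):.4f}  simulated={sim:.4f}")
```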
A.1.2 Redundant Target Paradigm

The visual stimulus is presented first and the auditory stimulus second. By definition,

P(IRTP) = Pr{max(V, A + τ) < min(V, A + τ) + ω}.

If the visual stimulus wins:

(1) 0 ≤ τ ≤ ω:

P(IRTPV) = ∫_0^{τ} λV e^{−λV x} (1 − e^{−λA(x+ω−τ)}) dx + ∫_{τ}^{∞} λV e^{−λV x} {(1 − e^{−λA(x+ω−τ)}) − (1 − e^{−λA(x−τ)})} dx
= [1/(λV + λA)] {λV (1 − e^{λA(−ω+τ)}) + λA (1 − e^{−λV τ})};
(2) 0 < ω ≤ τ:

P(IRTPV) = ∫_{τ−ω}^{τ} λV e^{−λV x} (1 − e^{−λA(x+ω−τ)}) dx + ∫_{τ}^{∞} λV e^{−λV x} {(1 − e^{−λA(x+ω−τ)}) − (1 − e^{−λA(x−τ)})} dx
= [λA/(λV + λA)] {e^{−λV τ} · (−1 + e^{λV ω})}.

If the auditory stimulus wins, 0 < τ ≤ τ + ω and

P(IRTPA) = ∫_0^{∞} λA e^{−λA x} {e^{−λV(x+τ)} − e^{−λV(x+τ+ω)}} dx
= [λA/(λV + λA)] {e^{−λV τ} − e^{−λV(ω+τ)}}.

The probability that the visual or the auditory stimulus wins is therefore

P(IRTP) = P(IRTPV) + P(IRTPA).

The mean RT for cross-modal stimuli is

E[RTVA,τ] = E[min(V, A + τ)] + E[S2 | IcRTP] − P(IRTP) · Δ
= 1/λV − e^{−λV τ} · (1/λV − 1/(λV + λA)) + μ − P(IRTP) · Δ,

and the mean RT for the visual and auditory stimulus is

E[RTV] = 1/λV + μ

and

E[RTA] = 1/λA + μ.
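Analogously, the closed-form P(IRTP) = P(IRTPV) + P(IRTPA) derived above can be verified by simulating the event {max(V, A + τ) < min(V, A + τ) + ω}, again with arbitrary illustrative parameters:

```python
import numpy as np

def p_int_rtp(tau, omega, lam_v, lam_a):
    """Closed-form P(I_RTP) for tau >= 0 (visual presented first)."""
    s = lam_v + lam_a
    if tau <= omega:
        p_v = (lam_v * (1 - np.exp(-lam_a * (omega - tau)))
               + lam_a * (1 - np.exp(-lam_v * tau))) / s
    else:
        p_v = (lam_a / s) * np.exp(-lam_v * tau) * (np.exp(lam_v * omega) - 1.0)
    p_a = (lam_a / s) * (np.exp(-lam_v * tau) - np.exp(-lam_v * (omega + tau)))
    return p_v + p_a

rng = np.random.default_rng(5)
n = 1_000_000
lam_v, lam_a, omega = 1 / 50.0, 1 / 30.0, 100.0
V = rng.exponential(1 / lam_v, n)
A = rng.exponential(1 / lam_a, n)
for tau in (0, 50, 100, 150, 300):
    At = A + tau
    sim = (np.maximum(V, At) < np.minimum(V, At) + omega).mean()
    print(f"tau={tau:4d}  closed form={p_int_rtp(tau, omega, lam_v, lam_a):.4f}  simulated={sim:.4f}")
```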
A.1.3 Focused Attention and Warning

By definition,

P(W) = Pr(A + τ + γA < V) = ∫_0^{∞} fA(x) {1 − FV(x + τ + γA)} dx = 1 − ∫_0^{∞} fA(x) FV(x + τ + γA) dx.
Again, we need to consider different cases:

(1) τ + γA < 0:

P(W) = 1 − ∫_{−τ−γA}^{∞} λA e^{−λA x} {1 − e^{−λV(x+τ+γA)}} dx
= 1 − [λV/(λV + λA)] e^{λA(τ+γA)};

(2) τ + γA ≥ 0:

P(W) = 1 − ∫_0^{∞} λA e^{−λA x} {1 − e^{−λV(x+τ+γA)}} dx
= [λA/(λV + λA)] e^{−λV(τ+γA)}.

The mean RT for cross-modal stimuli is

E[RTVA,τ] = E[V] + E[S2 | Ic ∩ Wc] − P(IFAP) · Δ − P(W) · κ = 1/λV + μ − P(IFAP) · Δ − P(W) · κ,
where 1/λV is the mean RT of the first stage, μ is the mean RT of the second stage when no interaction occurs, P(IFAP) · Δ is the expected amount of intersensory interaction, and P(W) · κ is the expected amount of warning.
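Finally, the two cases for P(W) can be checked in the same way by simulating the event {A + τ + γA < V}; parameter values below are illustrative:

```python
import numpy as np

def p_warn(tau, gamma_a, lam_v, lam_a):
    """Closed-form P(W) = Pr(A + tau + gamma_A < V) for exponential A and V."""
    s = lam_v + lam_a
    if tau + gamma_a < 0:
        return 1.0 - (lam_v / s) * np.exp(lam_a * (tau + gamma_a))
    return (lam_a / s) * np.exp(-lam_v * (tau + gamma_a))

rng = np.random.default_rng(6)
n = 1_000_000
lam_v, lam_a, gamma_a = 1 / 50.0, 1 / 30.0, 200.0
V = rng.exponential(1 / lam_v, n)
A = rng.exponential(1 / lam_a, n)
for tau in (-500, -300, -200, -100, 0):
    sim = (A + tau + gamma_a < V).mean()
    print(f"tau={tau:5d}  closed form={p_warn(tau, gamma_a, lam_v, lam_a):.4f}  simulated={sim:.4f}")
```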
REFERENCES Amlôt, R., R. Walker, J. Driver, and C. Spence. 2003. Multimodal visual-somatosensory integration in saccade generation. Neuropsychologia 41:1–15. Anastasio, T.J., P.E. Patton, and K. Belkacem-Boussaid. 2000. Using Bayes’ rule to model multisensory enhancement in the superior colliculus. Neural Computation 12:1165–1187. Arndt, A., and H. Colonius. 2003. Two separate stages in crossmodal saccadic integration: Evidence from varying intensity of an auditory accessory stimulus. Experimental Brain Research 150:417–426. Bell, A.H., A. Meredith, A.J. Van Opstal, and D.P. Munoz. 2005. Crossmodal integration in the primate superior colliculus underlying the preparation and initiation of saccadic eye movements. Journal of Neurophysiology 93:3659–3673. Clemo, H.R., and B.E. Stein. 1986. Effects of cooling somatosensory corticotectal influences in cat. Journal of Neurophysiology 55:1352–1368. Colonius, H., and P. Arndt. 2001. A two-stage model for visual-auditory interaction in saccadic latencies. Perception & Psychophysics, 63:126–147. Colonius, H., and A. Diederich. 2002. A maximum-likelihood approach to modeling multisensory enhancement. In Advances in Neural Information Processing Systems 14, T.G. Ditterich, S. Becker, and Z. Ghahramani (eds.). Cambridge, MA: MIT Press. Colonius, H., and A. Diederich. 2004. Multisensory interaction in saccadic reaction time: A time-window-ofintegration model. Journal of Cognitive Neuroscience 16:1000–1009. Colonius, H., and A. Diederich. 2006. Race model inequality: Interpreting a geometric measure of the amount of violation. Psychological Review 113(1):148–154. Colonius, H., and A. Diederich. 2010. The optimal time window of visual–auditory integration: A reaction time analysis. Frontiers in Integrative Neuroscience, 4:11. doi:10.3389/fnint.2010.00011.
Colonius, H., and D. Vorberg. 1994. Distribution inequalities for parallel models with unlimited capacity. Journal of Mathematical Psychology 38:35–58. Colonius, H., A. Diederich, and R. Steenken. 2009. Time-window-of-integration (TWIN) model for saccadic reaction time: Effect of auditory masker level on visual-auditory spatial interaction in elevation. Brain Topography 21:177–184. Corneil, B.D., and D.P. Munoz. 1996. The influence of auditory and visual distractors on human orienting gaze shifts. Journal of Neuroscience 16:8193–8207. Corneil, B.D., M. Van Wanrooij, D.P. Munoz, A.J. Van Opstal. 2002. Auditory-visual interactions subserving goal-directed saccades in a complex scene. Journal of Neurophysiology 88:438–454. Diederich, A. 1995. Intersensory facilitation of reaction time: Evaluation of counter and diffusion coactivation models. Journal of Mathematical Psychology 39:197–215. Diederich, A. 2008. A further test on sequential sampling models accounting for payoff effects on response bias in perceptual decision tasks. Perception & Psychophysics 70(2):229–256. Diederich, A., and H. Colonius. 2004a. Modeling the time course of multisensory interaction in manual and saccadic responses. In Handbook of multisensory processes, ed. G. Calvert, C. Spence, and B.E. Stein, 395–408. Cambridge, MA: MIT Press. Diederich, A., and H. Colonius. 2004b. Bimodal and trimodal multisensory enhancement: Effects of stimulus onset and intensity on reaction time. Perception & Psychophysics 66(8):1388–1404. Diederich, A., and H. Colonius. 2007a. Why two “distractors” are better than one: Modeling the effect of nontarget auditory and tactile stimuli on visual saccadic reaction time. Experimental Brain Research 179:43–54. Diederich, A., and H. Colonius. 2007b. Modeling spatial effects in visual–tactile saccadic reaction time. Perception & Psychophysics 69(1):56–67. Diederich, A., and H. Colonius. 2008. Crossmodal interaction in saccadic reaction time: Separating multisensory from warning effects in the time window of integration model. Experimental Brain Research 186:1–22. Diederich, A., H. Colonius, D. Bockhorst, and S. Tabeling. 2003. Visual–tactile spatial interaction in saccade generation. Experimental Brain Research 148:328–337. Diederich, A., H. Colonius, and A. Schomburg. 2008. Assessing age-related multisensory enhancement with the time-window-of-integration model. Neuropsychologia 46:2556–2562. Doyle, M.C., and R. Walker. 2002. Multisensory interactions in saccade target selection: Curved saccade trajectories Experimental Brain Research 142:116–130. Driver, J., and C. Spence. 2004. Crossmodal spatial attention: Evidence from human performance. In Crossmodal space and crossmodal attention, ed. C. Spence and J. Driver, 179–220. Oxford: Oxford Univ. Press. Eimer, M. 2001. Crossmodal links in spatial attention between vision, audition, and touch: Evidence from event-related brain potentials. Neuropsychologia 39:1292–1303. Frens, M.A., A.J. Van Opstal, and R.F. Van der Willigen. 1995. Spatial and temporal factors determine auditory– visual interactions in human saccadic eye movements. Perception & Psychophysics 57:802–816. Harrington, L.K., and C.K. Peck. 1998. Spatial disparity affects visual–auditory interactions in human sensorimotor processing. Experimental Brain Research 122:247–252. Hershenson, M. 1962. Reaction time as a measure of intersensory facilitation. Journal of Experimental Psychology 63:289–293. Hughes, H.C., P.-A. Reuter-Lorenz, G. Nozawa, and R. Fendrich. 1994. 
Visual–auditory interactions in sensorimotor processing: Saccades versus manual responses. Journal of Experimental Psychology: Human Perception and Performance 20:131–153. Hughes, H.C., M.D. Nelson, and D.M. Aronchick. 1998. Spatial characteristics of visual–auditory summation in human saccades. Vision Research 38:3955–3963. Jiang, W., M.T. Wallace, H. Jiang, J.W. Vaughan, and B.E. Stein. 2001. Two cortical areas mediate multisensory integration in superior colliculus neurons. Journal of Neurophysiology 85:506–522. Jiang, W., H. Jiang, and B.E. Stein. 2002. Two cortical areas facilitate multisensory orientation behaviour. Journal of Cognitive Neuroscience 14:1240–1255. Körding, K.P., U. Beierholm, W.J. Ma, S. Quartz, J.B. Tenenbaum et al. 2007. Causal inference in multisensory perception. PLoS ONE 2(9):e943, doi:10.1371/journal.pone.0000943. Klein, R., and A. Kingstone. 1993. Why do visual offsets reduce saccadic latencies? Behavioral and Brain Sciences 16(3):583–584. Luce, R.D. 1986. Response times: Their role in inferring elementary mental organization. New York: Oxford Univ. Press. Meredith, M.A. 2002. On the neural basis for multisensory convergence: A brief overview. Cognitive Brain Research 14:31–40.
Meredith, M.A., and B.E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology 56:640–662. Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. Journal of Neuroscience 10:3215–3229. Miller, J.O. 1982. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology 14:247–279. Munoz, D.P., and R. H. Wurtz. 1995. Saccade-related activity in monkey superior colliculus. I. Characteristics of burst and buildup cells. Journal of Neurophysiology 73:2313–2333. Navarra, J., A. Vatakis, M. Zampini, S. Soto-Faraco, W. Humphreys, and C. Spence. 2005. Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain Research 25:499–507. Nickerson, R.S. 1973. Intersensory facilitation of reaction time: Energy summation or preparation enhancement. Psychological Review 80:489–509. Raab, D.H. 1962. Statistical facilitation of simple reaction times. Transactions of the New York Academy of Science 24:574–590. Reuter-Lorenz, P.A., H.C. Hughes, and R. Fendrich. 1991. The reduction of saccadic latency by prior offset of the fixation point: An analysis of the gap effect. Perception & Psychophysics 49(2):167–175. Ross, S.M., and L.E. Ross. 1981. Saccade latency and warning signals: Effects of auditory and visual stimulus onset and offset. Perception & Psychophysics 29(5):429–437. Rowland, B.A., and B.E. Stein. 2008. Temporal profiles of response enhancement in multisensory integration. Frontiers in Neuroscience 2:218–224. Schweickert, R., D.L. Fisher, and K. Sung. Discovering Cognitive Architecture by Selectively Influencing Mental Processes. London: World Scientific Publishing (in press). Sinclair, C., and G.R. Hammond. 2009. Excitatory and inhibitory processes in primary motor cortex during the foreperiod of a warned reaction time task are unrelated to response expectancy. Experimental Brain Research 194:103–113. Spence, C., and S. Squire. 2003. Multisensory integration: Maintaining the perception of synchrony. Current Biology 13:R519–R521. Stein, B.E., and Meredith M.A. 1993. The Merging of the Senses. Cambridge, MA: MIT Press. Stein, B.E., W.S. Huneycutt, and M.A. Meredith. 1988. Neurons and behavior: The same rules of multisensory integration apply. Brain Research 448:355–358. Stein, B.E., W. Jiang, and T.R. Stanford. 2004. Multisensory integration in single neurons in the midbrain. In Handbook of multisensory processes, ed. G. Calvert, C. Spence, and B.E. Stein, 243–264. Cambridge, MA: MIT Press. Sternberg, S. 2001. Separate modifiability, mental modules, and the use of pure and composite measures to reveal them. Acta Psychologica 106:147–246. Todd, J.W. 1912. Reaction to multiple stimuli, in Archives of Psychology, No. 25. Columbia contributions to philosophy and psychology, ed. R.S. Woodworth, Vol. XXI, No. 8, New York: The Science Press. Townsend, J.T., and G. Nozawa. 1995. Spatio-temporal properties of elementary perception: An investigation of parallel, serial, and coactive theories. Journal of Mathematical Psychology 39:321–359. Van Opstal, A.J., and D.P. Munoz. 2004. Auditory–visual interactions subserving primate gaze orienting. In Handbook of multisensory processes, ed. G. Calvert, C. Spence, and B.E. Stein, 373–393. Cambridge, MA: MIT Press. Van Wassenhove, V., K.W. Grant, and D. Poeppel. 2007. 
Temporal window of integration in auditory–visual speech perception. Neuropsychologia 45:598–607. Van Zandt, T. 2002. Analysis of response time distributions. In Stevens’ handbook of experimental psychology, vol. 4, 3rd edn, ed. H. Pashler. New York: Wiley & Sons, Inc. Whitchurch, E.A., and T.T. Takahashi. 2006. Combined auditory and visual stimuli facilitate head saccades in the barn owl (Tyto alba). Journal of Neurophysiology 96:730–745.
Section IV Development and Plasticity
15
The Organization and Plasticity of Multisensory Integration in the Midbrain

Thomas J. Perrault Jr., Benjamin A. Rowland, and Barry E. Stein

CONTENTS
15.1 Impact of Multisensory Integration
15.2 Organization of Multisensory Organization in Adult SC
15.3 SC Multisensory Integration Depends on Influences from Cortex
15.4 Ontogeny of SC Multisensory Integration
15.4.1 Impact of Developing in Absence of Visual–Nonvisual Experience
15.4.2 Altering Early Experience with Cross-Modal Cues by Changing Their Spatial Relationships
15.4.3 Role of Cortical Inputs during Maturation
15.4.4 Ontogeny of Multisensory Integration in Cortex
15.4.5 Ontogeny of SC Multisensory Integration in a Primate
Acknowledgments
References

A great deal of attention has been paid to the physiological processes through which the brain integrates information from different senses. This reflects the substantial impact of this process on perception, cognitive decisions, and overt behavior. Yet, less attention has been given to the postnatal development, organization, and plasticity associated with this process. In the present chapter we examine what is known about the normal development of multisensory integration and how early alterations in postnatal experience disrupt, change, and dramatically alter the fundamental properties of multisensory integration. The focus here is on the multisensory layers of the cat superior colliculus (SC), a system that has served as an excellent model for understanding multisensory integration at the level of the single neuron and at the level of overt orientation behavior. Before discussing this structure's normal development and its capacity to change, it is important to examine what has been learned about multisensory integration and the functional role of the SC in this process.
15.1 IMPACT OF MULTISENSORY INTEGRATION

The ability of the brain to integrate information from different sources speeds and enhances its ability to detect, locate, and identify external events as well as the higher-order and behavioral processes necessary to deal with these events (Corneil and Munoz 1996; Frens et al. 1995a; Hughes et al. 1994; Marks 2004; Newell 2004; Sathian et al. 2004; Shams et al. 2004; Stein et al. 1989; Stein and Meredith 1993; Woods et al. 2004). All brains engage in this process of multisensory integration, and do so at multiple sites within the nervous system (Calvert et al. 2004a). The proper identification of an event includes the ability to disambiguate potentially confusing signals, including those associated with speech and animal communication (Bernstein et al. 2004; Busse et al. 2005; Corneil
and Munoz 1996; Frens et al. 1995b; Ghazanfar et al. 2005; Ghazanfar and Schroeder 2006; Grant et al. 2000; Hughes et al. 1994; King and Palmer 1985; Lakatos et al. 2007; Liotti et al. 1998; Marks 2004; Massaro 2004; Newell 2004; Partan 2004; Recanzone 1998; Sathian 2000, 2005; Sathian et al. 2004; Schroeder and Foxe 2004; Senkowski et al. 2007; Shams et al. 2004; Stein et al. 1989; Sugihara et al. 2006; Sumby and Pollack 1954; Talsma et al. 2006, 2007; Wallace et al. 1996; Weisser et al. 2005; Woldorff et al. 2004; Woods and Recanzone 2004a, 2004b; Zangaladze et al. 1999). The facilitation of these capabilities has enormous survival value, so its retention and elaboration in all extant species is no surprise. What is surprising is that despite the frequent discussion of this phenomenon in adults (see Calvert et al. 2004b; Ghazanfar and Schroeder 2006; Spence and Driver 2004; Stein and Meredith 1993), there is much less effort directed to understanding how this process develops, and how it adapts to the environment in which it will be used. The multisensory neuron in the cat SC is an excellent model system to explore the organization and plasticity of multisensory integration. This is because it is not only the primary site of converging inputs from different senses (Fuentes-Santamaria et al. 2008; Stein et al. 1993; Wallace et al. 1993), but because it is involved in well-defined behaviors (orientation and localization), thereby providing an opportunity to relate physiology to behavior. Furthermore, we already know a good deal about the normal development of the unisensory properties of SC neurons (Kao et al. 1994; Stein 1984) and SC neurons have been one of the richest sources of information about the ontogeny and organization of multisensory integration (Barth and Brett-Green 2004; Calvert et al. 2004b; Groh and Sparks 1996a, 1996b; Gutfreund and Knudsen 2004; Jay et al. 1987a, 1987b; King et al. 2004; Lakatos et al. 2007; Peck 1987b; Sathian et al. 2004; Senkowski et al. 2007; Stein 1984; Stein and Arigbede 1972; Stein and Clamann 1981; Stein and Meredith 1993; Stein et al. 1973, 1976, 1993; Wallace 2004; Woods et al. 2004a). Of the most interest in the present context are two experimental observations. The first is that influences from the cortex are critical for the maturation of SC multisensory integration, the second is that experience during early postnatal life guides the nature of that integrative process. These are likely to be interrelated observations given the well-known plasticity of neonatal cortex. One reasonable possibility is that experience is coded in the cortex and in the morphology and functional properties of its connections with the SC.
15.2 ORGANIZATION OF MULTISENSORY ORGANIZATION IN ADULT SC

Traditionally, the seven-layered structure of the SC has been subdivided into two functional sets of laminae: the superficial laminae (I–III) are exclusively visual, and the deeper laminae (IV–VII) contain unisensory (visual, auditory, and somatosensory) and multisensory neurons of all possible combinations (Stein and Meredith 1993). Visual, auditory, and somatosensory representations in the SC are all arranged in a similar map-like fashion so that they are all in register with each other (see Figure 15.1; Meredith and Stein 1990; Meredith et al. 1991; Middlebrooks and Knudsen 1984; Stein and Clamann 1981; Stein et al. 1976, 1993). The frontal regions of sensory space (forward visual and auditory space, and the face), are represented in the anterior aspect of the structure, whereas more temporal space (and the rear of the body) are represented in the posterior SC. Superior sensory space is represented in the medial aspect of the structure, and inferior space in the more lateral aspect of the structure. As a consequence, the neurons in a given region of the SC represent the same region of sensory space. These sensory maps are in register with the premotor map in the SC. This is a convenient way of matching incoming sensory information with the outgoing signals that program an orientation to the initiating event (Grantyn and Grantyn 1982; Groh et al. 1996a, 1996b; Guitton and Munoz 1991; Harris 1980; Jay and Sparks 1984, 1987a, 1987b; Munoz and Wurtz 1993a, 1993b; Peck 1987b; Sparks 1986; Sparks and Nelson 1987; Stein and Clamann 1981; Wurtz and Goldberg 1971; Wurtz and Albano 1980). Each multisensory SC neuron has multiple receptive fields, one for each of the modalities to which it responds. As would be expected from the structure's map-like representations of the senses,
FIGURE 15.1 Correspondence of visual, auditory, and somatosensory representations in SC. Horizontal and vertical meridians of different sensory representations in SC suggest a common coordinate system representing multisensory space. (From Stein, B.E., and Meredith, M.A., The merging of the senses, MIT Press, Cambridge, 1993. With permission.)
these receptive fields are in spatial coincidence with each other (King et al. 1996; Meredith and Stein 1990; Meredith et al. 1991, 1992). Cross-modal stimuli that are in spatial and temporal coincidence with one another and fall within the excitatory receptive fields of a given neuron function synergistically. They elicit more vigorous responses (more impulses) than are evoked by the strongest of them individually. This is called "multisensory enhancement" and is illustrated in Figure 15.2. However, when these same stimuli are disparate in space, such that one falls within its excitatory receptive
FIGURE 15.2 Multisensory enhancement and depression. Middle: visual (dark gray) and auditory (light gray) receptive fields (RF) of this SC neuron are plotted on hemispheres representing visual and auditory space. Each concentric circle represents 10° of space with right caudal aspect of auditory space represented by the half hemisphere. White bar labeled V represents a moving visual stimulus, whereas speakers labeled A0 and Ai represent auditory stimuli. Left: response enhancement occurred when visual and auditory stimuli were placed in spatial congruence (VAi). Note, in plot to the left, multisensory response exceeded sum of visual and auditory responses (horizontal dotted line) and was 94% greater than response to the most effective component stimulus (visual). Right: response depression occurred when visual and auditory stimuli were spatially disparate (VA0) so that multisensory response was 47% less than response to visual stimulus.
field and the other falls within the inhibitory portion of its receptive field, the result is “multisensory depression.” Now the response consists of fewer impulses than that evoked by the most effective individual component stimulus. This ubiquitous phenomenon of enhancement and depression has been described in the SC and cortex for a number of organisms ranging from the rat to the human (Barth and Brett-Green 2004; Calvert et al. 2004b; DeGelder et al. 2004; Fort and Giard 2004; Ghazanfar and Schroeder 2006; King and Palmer 1985; Lakatos et al. 2007; Laurienti et al. 2002; Lovelace et al. 2003; Macaluso and Driver 2004; Meredith and Stein 1983, 1986a, 1986b, 1996; Morgan et al. 2008; Romanski 2007; Sathian et al. 2004; Schroeder et al. 2001; Schroeder and Foxe 2002, 2004; Wallace and Stein 1994; Wallace et al. 1992, 1993, 1998, 2004b). The clearest indicator that a neuron can engage in multisensory integration is its ability to show multisensory enhancement because multisensory depression occurs only in a subset of neurons that show multisensory enhancement (Kadunce et al. 2001). The magnitude of response enhancement will vary dramatically, both among neurons across the population as well as within a particular neuron throughout its dynamic range. This variation is in part due to differences in responses to different cross-modal stimulus combinations. When spatiotemporally aligned cross-modal stimuli are poorly effective, multisensory response enhancement magnitudes are often proportionately greater than those elicited when stimuli are robustly effective. Single neurons have demonstrated that multisensory responses are capable of exceeding predictions based on the simple addition of the two unisensory responses. These superadditive interactions generally occur at the lower end of a given neuron’s dynamic range and as stimulus effectiveness increases, multisensory responses tend to exhibit more additive or subadditive interactions (Alvarado et al. 2007b; Perrault et al. 2003, 2005; Stanford and Stein 2007; Stanford et al. 2005), a series of transitions that are consistent with the concept of “inverse effectiveness” (Meredith and Stein 1986b), in which the product of an enhanced multisensory interaction is proportionately largest when the effectiveness of the cross-modal stimuli are weakest. Consequently, the proportionate benefits that accrue to performance based on this neural process will also be greatest. This makes intuitive sense because highly effective cues are generally easiest to detect, locate, and identify. Using the same logic, the enhanced magnitude of a multisensory response is likely to be proportionately largest at its onset, because it is at this point when the individual component responses would be just beginning, and thus, weakest. Recent data suggests this is indeed the case (Rowland et al. 2007a, 2007b; see Figure 15.3). This is of substantial interest because it means that individual responses often, if not always, involve multiple underlying computations: superadditivity at their onset and additivity (and perhaps subadditivity) as the response evolves. In short, the superadditive multisensory computation may be far more common than previously thought, rendering the initial portion of the response of far greater impact than would otherwise be the case and markedly increasing its likely role in the detection and localization of an event. 
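The percentages reported in studies such as the one shown in Figure 15.2 are computed relative to the most effective single-modality response, and superadditivity is judged against the sum of the unisensory responses. A minimal sketch with made-up impulse counts (not data from the figure) illustrates both computations:

```python
def enhancement_index(multisensory, best_unisensory):
    """Percent multisensory enhancement (positive) or depression (negative),
    relative to the most effective single-modality response."""
    return 100.0 * (multisensory - best_unisensory) / best_unisensory

# Hypothetical mean impulse counts per trial for one neuron:
visual, auditory = 4.0, 1.5
spatially_aligned, spatially_disparate = 7.0, 2.5

best = max(visual, auditory)
print("enhancement:", round(enhancement_index(spatially_aligned, best), 1), "%")
print("depression: ", round(enhancement_index(spatially_disparate, best), 1), "%")
print("superadditive?", spatially_aligned > visual + auditory)  # compare with the unisensory sum
```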
Regarding computational modes, one should be cautious when interpreting multisensory response enhancements from pooled samples of neurons. As noted earlier, the underlying computation varies among neurons as a result of their inherent properties and the specific features of the cross-modal stimuli with which they are evaluated. Many of the studies cited above yielded significant population enhancements that appear “additive,” yet one cannot conclude from these data that this was their default computation (e.g., Alvarado et al. 2007b; Perrault et al. 2005; Stanford et al. 2005). This is because they were examined with a battery of stimuli whose individual efficacies were disproportionately high. Because of inverse effectiveness, combinations of such stimuli would, of course, be expected to produce less robust enhancement and a high incidence of additivity (Stanford and Stein 2007). If those same neurons were tested with minimally effective stimuli exclusively, the incidence of superadditivity would have been much higher. Furthermore, most neurons, regardless of the computation that best describes their averaged response, exhibit superadditive computations at their onset, when activity is weakest (Rowland and Stein 2007). It is important to consider that this initial portion of a multisensory response may have the greatest impact on behavior (Rowland et al. 2007a).
FIGURE 15.3 Temporal profile of multisensory enhancement. Left: impulse rasters illustrating responses of a multisensory SC neuron to visual (V), auditory (A), and combined visual–auditory (VA) stimulation. Right: two different measures of response show the same basic principle of “initial response enhancement.” Multisensory responses are enhanced from their very onset and have shorter latencies than either of individual unisensory responses. Upper right: measure is mean stimulus-driven cumulative impulse count (qsum), reflecting temporal evolution of enhanced response. Bottom right: an instantaneous measure of response efficacy using event estimates. Event estimates use an appropriate kernel function that convolves impulse spike trains into spike density functions that differentiate spontaneous activity from stimulus-driven activity using a mutual information measure. Spontaneous activity was then subtracted from stimulus-driven activity and a temporal profile of multisensory integration was observed. (From Rowland, B.A., and Stein, B.E., Frontiers in Neuroscience, 2, 218–224, 2008. With permission.)
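As a rough illustration of the two measures named in the caption, the sketch below computes a cumulative stimulus-driven impulse count and a kernel-based spike density function from a list of spike times. It is a simplified stand-in, not the procedure of Rowland and Stein (2008): the Gaussian kernel, the baseline-subtraction step, and all numerical values (spike times, baseline rate, kernel width) are assumptions, and the mutual-information step used to separate spontaneous from driven activity is omitted.

```python
import numpy as np

def qsum(spike_times, t_grid, baseline_rate=0.0):
    """Cumulative stimulus-driven impulse count: running spike count up to
    each time point, minus the count expected from spontaneous activity
    alone (baseline_rate, in impulses/s)."""
    counts = np.searchsorted(np.sort(spike_times), t_grid)
    return counts - baseline_rate * t_grid

def spike_density(spike_times, t_grid, sigma=0.010):
    """Spike density function: the spike train convolved with a Gaussian
    kernel of width sigma (in seconds)."""
    sdf = np.zeros_like(t_grid)
    for s in spike_times:
        sdf += np.exp(-0.5 * ((t_grid - s) / sigma) ** 2)
    return sdf / (sigma * np.sqrt(2 * np.pi))

t = np.linspace(0.0, 0.3, 301)                  # 0-300 ms after stimulus onset
va_spikes = [0.045, 0.055, 0.060, 0.080, 0.120]  # hypothetical spike times (s)
print(qsum(va_spikes, t, baseline_rate=2.0)[-1], spike_density(va_spikes, t).max())
```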
This process of integrating information from different senses is computationally distinct from the integration of information within a sense. This is likely because the multiple cues in the former case provide independent estimates of the same initiating event, whereas the multiple cues in the latter case contain substantial noise covariance (Ernst and Banks 2002). By this logic, one would predict that a pair of within-modal stimuli would not yield the same response enhancement obtained with a pair of cross-modal stimuli, even if both stimulus pairs were positioned at the same receptive field locations. On the other hand, one might argue that equivalent results would be likely because, in both cases, the effect reflects the total amount of environmental energy. This latter argument attributes the effect to multiple, redundant stimuli rather than to a unique underlying computation (Gondan et al. 2005; Leo et al. 2008; Lippert et al. 2007; Miller 1982; Sinnett et al. 2008). The experimental results obtained by Alvarado and colleagues (Figure 15.4) argue for the former explanation. The integration of cross-modal cues produced significantly greater response products than did the integration of within-modal cues, and the two integration products reflected very different underlying neural computations, with the latter most frequently reflecting subadditivity, a computation that was rarely observed with cross-modal cues (Alvarado et al. 2007b). Gingras et al. (2009) tested the same assumption and came to the same conclusions using an overt behavioral measure, in which cats performed a detection and localization task in response to cross-modal (visual–auditory) and within-modal (visual–visual or auditory–auditory) stimulus combinations (Figure 15.5).
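The statistical intuition behind this distinction (cf. Ernst and Banks 2002) can be sketched with the variance of an optimally weighted combination of two cue estimates. This is a generic cue-combination calculation, not a model of SC neurons; the variance and covariance values below are arbitrary.

```python
def combined_variance(var1, var2, cov):
    """Variance of the minimum-variance linear combination of two unbiased
    cue estimates with variances var1, var2 and noise covariance cov."""
    w1 = (var2 - cov) / (var1 + var2 - 2 * cov)  # optimal weight on cue 1
    w2 = 1.0 - w1
    return w1 ** 2 * var1 + w2 ** 2 * var2 + 2 * w1 * w2 * cov

print(combined_variance(1.0, 1.0, 0.0))  # independent cues (cross-modal): 0.5
print(combined_variance(1.0, 1.0, 0.8))  # shared noise (within-modal): 0.9
```

With independent noise the combined variance is halved, whereas shared noise, as between two cues carried by the same modality, erodes most of that benefit, mirroring the weaker products of within-modal integration.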
[Figure 15.4 panel data: (a) cross-modal stimulus condition, multisensory response vs. best unisensory response (impulses), R = 0.93, y = 1.29x + 1.11; (b) within-modal stimulus condition, combined unisensory response vs. best unisensory response (impulses), R = 0.94, y = 0.91x + 0.77; insets: multisensory neurons, R = 0.94, y = 0.87x + 1.16; unisensory neurons, R = 0.95, y = 0.96x + 0.26.]
FIGURE 15.4 Physiological comparisons of multisensory and unisensory integration. (a) Magnitude of response evoked by a cross-modal stimulus (y-axis) is plotted against magnitude of largest response evoked by component unisensory stimuli (x-axis). Most of observations show multisensory enhancement (positive deviation from solid line of unity). (b) The same cannot be said for response magnitudes evoked by two within-modal stimuli. Here, typical evoked response is not statistically better than that evoked by largest response to a component stimulus. Within-modal responses are similar in both multisensory and unisensory neurons (insets on right). (From Alvarado, J.C. et al., Journal of Neurophysiology 97, 3193–205, 2007b. With permission.)
Because the SC is a site at which modality-specific inputs from the different senses converge (Meredith and Stein 1986b; Stein and Meredith 1993; Wallace et al. 1993), it is a primary site of their integration; its multisensory responses are not merely a reflection of multisensory integration accomplished elsewhere in the brain. The many unisensory structures from which these inputs are derived have been well described (e.g., see Edwards et al. 1979; Huerta and Harting 1984; Stein and Meredith 1993; Wallace et al. 1993). Most multisensory SC neurons send their axons out of the structure to target motor areas of the brainstem and spinal cord, and it is primarily via this descending route that the multisensory responses of SC neurons effect orientation behaviors (Moschovakis and Karabelas 1985; Peck 1987a; Stein and Meredith 1993; Stein et al. 1993). Thus, it is perhaps no surprise that the principles found to govern multisensory integration at the level of the individual SC neuron also govern SC-mediated overt behavior (Burnett et al. 2004, 2007; Jiang et al. 2002, 2007; Stein et al. 1989; Wilkinson et al. 1996).
FIGURE 15.5 Multisensory integration was distinct from unisensory visual–visual integration. (a) At every spatial location, multisensory integration produced substantial performance enhancements (94–168%; mean, 137%), whereas unisensory visual integration produced comparatively modest enhancements (31–79%; mean, 49%). Asterisks indicate comparisons that were significantly different (χ2 test; P < 0.05). (b) Pie charts to left show performance in response to modality-specific auditory (A1) and visual (V1 and V2 are identical) stimuli. Figures within the bordered region show performance to cross-modal (V1A1) and within-modal (V1V2) stimulus combinations. No-Go errors (NG; gray) and Wrong Localization errors (W; white) were significantly decreased as a result of multisensory integration, but only No-Go errors were significantly reduced as a result of unisensory integration. (c) Differential effect of multisensory and unisensory integration was reasonably constant, regardless of effectiveness of best component stimulus, and both showed an inverse relationship, wherein benefits were greatest when effectiveness of component stimuli was lowest. V, visual; A, auditory; C, correct. (From Gingras, G. et al., Journal of Neuroscience, 29, 4897–902, 2009. With permission.)
FIGURE 15.6 SC multisensory integration depends on influences from association cortex. SC responses to auditory (A), visual (V), and multisensory (AV) stimuli were recorded before (left) and after (right) deactivation of association cortex. Visual stimulus was presented at multiple (five) levels of effectiveness. At the top of the figure are individual stimulus traces, impulse rasters, and peristimulus time histograms for each response. Graphs at bottom summarize these data showing mean response levels (lines) and percentage of multisensory enhancement (bars) observed for each of stimulus pairings. Before cortical deactivation, enhanced responses showed characteristic “inverse effectiveness” profile with larger unisensory responses associated with smaller multisensory enhancements. However, after cortical deactivation (shaded region of inset), multisensory enhancements were eliminated at each of stimulus effectiveness levels tested so that multisensory and unisensory responses were no longer significantly different. (From Jiang, W. et al., Journal of Neurophysiology, 85, 506–22, 2001. With permission.)
15.3 SC MULTISENSORY INTEGRATION DEPENDS ON INFLUENCES FROM CORTEX

Although, as noted above, SC neurons become multisensory as a result of receiving converging inputs from multiple visual, auditory, and somatosensory sources, this convergence does not automatically render them capable of integrating those inputs. Rather, a specific component of the circuit must be operational: the projection from association cortex. As shown in Figure 15.6, deactivating this input renders SC neurons incapable of multisensory integration. Their multisensory responses now approximate those elicited by the most effective modality-specific component stimulus, a result that is paralleled at the level of overt behavior (Alvarado et al. 2007a; Jiang and Stein 2003; Jiang et al. 2001, 2002, 2006; Stein and Meredith 1993; Stein et al. 2002; Wallace and Stein 1994, 1997). In the cat, this association cortex comprises the anterior ectosylvian sulcus (AES) and an adjacent area, the rostral aspect of the lateral suprasylvian sulcus (rLS); the homologue in other species has not yet been determined. These two areas appear to be unique in this context (Burnett et al. 2004; Jiang et al. 2003, 2006, 2007; Wilkinson et al. 1996). When one of them is damaged during early life, the other can take on its role, but when both are damaged, no other cortical areas seem capable of substituting for them. In the normal animal, they generally function together in mediating SC multisensory integration, but the AES is the more important of the two: many more SC neurons depend on AES influences than on rLS influences for this capability (Jiang et al. 2001). The intense experimental scrutiny of the influences of the AES over SC multisensory integration has helped clarify the nature of these descending influences. First, the projections to the SC are derived from unisensory AES neurons; second, they converge from different subregions of the AES (visual, AEV; auditory, FAES; and somatosensory, SIV) onto a given SC neuron in a pattern that matches the convergence pattern from non-AES input sources (Fuentes-Santamaria et al. 2008; Wallace et al. 1992). For example, an individual multisensory SC neuron that receives converging visual input from the retina and auditory input from the inferior colliculus will also likely receive convergent input from AEV and FAES.
FIGURE 15.7 (See color insert.) SC neurons receive converging input from different sensory subregions of anterior ectosylvian (association) cortex. Fluorescent tracers were deposited in auditory (FAES; green) and somatosensory (SIV; red) subregions. Axons of these cortical neurons often had boutons in contact with SC neurons, and sometimes could be seen converging onto the same target neurons. Presumptive contact points are indicated by arrows. (From Fuentes-Santamaria, V. et al., Cerebral Cortex, 18, 1640–52, 2008. With permission.)
Rowland et al. (2007b) used these convergence patterns as the basis for an explanatory model in which AES inputs and non-AES inputs have different convergence patterns on the dendrites of their SC target neurons (see Figure 15.7). The model assumes N-methyl-d-aspartate (NMDA) (and 2-amino-3-(5-methyl-3-oxo-1,2-oxazol-4-yl)propanoic acid (AMPA)) receptors at every dendritic region, which makes it possible to produce nonlinear interactions between inputs that cluster in the same region. The clustering inputs are selectively those from the AES, and they terminate preferentially on proximal dendrites. The currents they introduce affect one another and produce a nonlinear amplification through the NMDA receptors, something that the inputs from non-AES areas cannot do because they are more computationally segregated from one another. All inputs also contact a population of inhibitory interneurons that, in turn, contact SC multisensory neurons, so that the output of an SC neuron depends on the balance between the excitation provided by the directly projecting inputs and the shunting inhibition conveyed via the inhibitory interneurons.
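A deliberately simplified rate-model caricature of this arrangement is sketched below. It is not the published Rowland et al. (2007b) model: the quadratic "NMDA-like" amplification term, the inhibitory weight, and all input values are arbitrary assumptions chosen only to show how clustered AES inputs can yield superadditive multisensory responses that collapse to additivity when the AES drive is removed.

```python
def sc_neuron_response(aes_inputs, non_aes_inputs, k_nmda=0.08, w_inh=0.3):
    """Toy rate model: AES inputs cluster on the same proximal dendritic
    region, so their summed drive is amplified supralinearly (a stand-in for
    NMDA-receptor cooperativity); non-AES inputs occupy separate regions and
    sum linearly.  All inputs also drive inhibitory interneurons whose output
    is subtracted from the neuron's net drive."""
    aes_drive = sum(aes_inputs)
    aes_drive += k_nmda * aes_drive ** 2               # clustered, nonlinear
    non_aes_drive = sum(non_aes_inputs)                # segregated, linear
    inhibition = w_inh * (sum(aes_inputs) + sum(non_aes_inputs))
    return max(0.0, aes_drive + non_aes_drive - inhibition)

# With AES intact: unisensory responses (AEV + retina, FAES + IC) vs. combined.
v = sc_neuron_response([4.0], [2.0])
a = sc_neuron_response([4.0], [2.0])
va = sc_neuron_response([4.0, 4.0], [2.0, 2.0])
print(va > v + a)                                      # True: superadditive

# With AES silenced, the combined response is merely the sum of its parts.
v0 = sc_neuron_response([], [2.0])
va0 = sc_neuron_response([], [2.0, 2.0])
print(abs(va0 - 2 * v0) < 1e-9)                        # True: additive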
15.4 ONTOGENY OF SC MULTISENSORY INTEGRATION

The multisensory properties of SC neurons described above are not characteristic of the neonate. This is evident from studies of the cat SC. The cat is an excellent model for exploring the ontogeny of sensory information processing because it is an altricial species, so that a good deal of its development can be observed after birth. At birth, its eyelids are still fused and its ear canals have not yet opened. Most SC neurons are unresponsive to sensory stimuli at this time, and the few that do respond to external stimulation are activated by tactile stimuli, often on the perioral region. This condition is already evident in late fetal stages (Stein et al. 1973) and has been thought to help prepare the infant for finding the nipple and suckling (Larson and Stein 1984). The first neurons that respond to auditory stimulation are encountered at approximately 5 days postnatal, but neurons responsive to visual stimuli in the multisensory (i.e., deep) layers are not evident until approximately 3 weeks postnatal, long after their overlying superficial-layer counterparts have become active (Kao et al. 1994; Stein et al. 1973, 1984; Wallace and Stein 1997).

Just as the appearance of multisensory neurons is delayed relative to their unisensory counterparts, so is the maturation of their most characteristic property, multisensory integration. This may be because, compared with their unisensory neighbors, they must accommodate a more complex task: determining which signals from different senses should be coupled and which should be segregated. The first multisensory neurons to appear are those responsive to somatosensory and auditory stimuli. They become active at about postnatal day 10, several days after auditory responsiveness appears. Visual–auditory, visual–somatosensory, and trisensory neurons become active at about 3 weeks, as soon as deep-layer visual responsiveness is evident. Yet the capacity to integrate a neuron's multiple sensory inputs does not appear until approximately 5 weeks of age, and at this time very few neurons are capable of this feat (Figure 15.8a). During this period, the characteristic response properties of these neurons change dramatically, exhibiting substantially smaller receptive fields and shorter response latencies (Figure 15.8b and c). Achieving the normal complement of multisensory neurons capable of multisensory integration requires months of development, a period of maturation during which inputs from association cortex also become functional (Stein and Gallagher 1981; Stein et al. 2002; Wallace and Stein 1997, 2000).

The observation that this ontogenetic process is so gradual was taken to suggest that it is a period in which experience plays a substantial role in guiding the maturation of multisensory integration. One possibility considered was that the brain is learning to expect that certain physical properties of cues from different senses, specifically their timing and location, are linked to common events. This would provide the brain with a way of crafting the principles that govern multisensory integration to suit the environment in which they will be used. To examine this possibility, animals were reared without the opportunity to obtain experience with visual and nonvisual cues.
FIGURE 15.8 Developmental chronology of SC multisensory neurons. (a) Percentage of multisensory neurons as a proportion of sensory-responsive neurons in deep SC is shown as a function of postnatal age. Each closed circle represents a single age, and increasing proportion of such neurons is also shown on pie charts. (b) Rapid decrease in size of different receptive fields (as a percentage of mean adult value) of multisensory neurons is shown as a function of postnatal age. (c) Decrease in response latencies of multisensory neurons to each modality-specific stimulus is shown as a function of postnatal age. (From Wallace, M.T., and Stein, B.E., Journal of Neuroscience, 17, 2429–44, 1997. With permission.)
That is, they were reared in darkness, or in situations in which the spatial relationships of cues associated with common events were perturbed. The first experimental condition tests the notion that, in the absence of such experience, multisensory integration would not develop; the second tests the possibility that the specific features of experience guide the formation of the principles governing multisensory integration.
15.4.1 Impact of Developing in Absence of Visual–Nonvisual Experience

In this experimental series, animals were reared in darkness until they were 6 months of age, a time at which most of the physiological properties of SC neurons appear mature or near-mature. These animals developed a near-normal set of visual, auditory, and somatosensory neurons that were highly responsive to natural physiological stimuli (Wallace et al. 2001, 2004a). That these neurons were nevertheless atypical was indicated by their abnormally large receptive fields.
FIGURE 15.9 Early experience influences receptive field and response properties of SC multisensory neurons. Impact of dark rearing (a) and disparity rearing (b) on properties of adult multisensory neurons is shown using two exemplar neurons. Rearing in absence of visual experience was characterized by large visual and auditory receptive fields (a) that were more characteristic of neonates than adults. This neuron was typical of population of neurons from dark-reared animals. It was responsive to visual and auditory stimuli, but its inexperience with visual–auditory stimuli was evident in its inability to integrate those cross-modal stimuli to produce an enhanced response. Responses from neuron depicted in panel (b) were characteristic of those affected by a rearing environment in which visual and auditory stimuli were always spatially disparate. Its visual and auditory receptive fields did not develop normal spatial register, but were completely out of alignment. It was also incapable of "normal" multisensory integration as indicated by absence of enhanced responses to spatiotemporally aligned cross-modal stimuli (B1 and B2). Nevertheless, it did show multisensory enhancement to spatially disparate stimuli (B3), revealing that its multisensory integrative properties had been crafted to adapt them to presumptive environment in which they would be used. (Adapted from Wallace, M.T. et al., Journal of Neuroscience, 24, 9580–4, 2004a; Wallace, M.T. et al., Proceedings of the National Academy of Sciences of the United States of America, 101, 2167–72, 2004b; Wallace, M.T., and Stein, B.E., Journal of Neurophysiology, 97, 921–6, 2007.)
These receptive fields were more characteristic of a neonate than of an adult animal. The neurons were also unable to integrate their multiple sensory inputs, as evidenced by the absence of visual–auditory integration (Figure 15.9a). This too made them appear more like those of neonates, or of adults whose association cortex has been removed, than like those of normal adults (Jiang et al. 2006). These observations are consistent with the idea that experience with cross-modal cues is necessary for developing the capacity to integrate those cues.
15.4.2 Altering Early Experience with Cross-Modal Cues by Changing Their Spatial Relationships

If early experience does indeed craft the principles governing multisensory integration, changes in those experiences should produce corresponding changes in those principles. Under normal circumstances, cross-modal events provide cues that have a high degree of spatial and temporal fidelity. In short, the different sensory cues come from the same event, so they come from about the same place at about the same time. Presumably, with extensive experience, the brain links stimuli from the two senses by their temporal and spatial relationships. In that way, similar concordances among cross-modal stimuli that are later encountered facilitate the detection, localization, and identification of the initiating events.

Given those assumptions, any experimental changes in the physical relationships of the cross-modal stimuli experienced during early life should be reflected in adaptations of the principles governing multisensory integration. In short, the principles should become appropriate for that "atypical" environment and inappropriate for the normal one. To examine this expectation, a group of cats was reared in a darkroom from birth to 6 months of age and periodically presented with visual and auditory cues that were simultaneous but derived from different locations in space (Wallace and Stein 2007). This was accomplished by fixing speakers and light-emitting diodes to different locations on the walls of the cages. When SC neurons were then examined, many had developed visual–auditory responsiveness. Most of them looked similar to those found in animals reared in the dark: they had very large receptive fields and were unable to integrate their visual–auditory inputs. The retention of these neonatal properties was not surprising in light of the fact that the stimuli, presented in an otherwise dark room, required no response and were not associated with any consequence. However, there were a substantial number of SC neurons in these animals that did appear to reflect their visual–auditory experience. Their visual and auditory receptive fields had contracted, as would be expected with sensory experience, but they had also developed poor alignment. A number of them had no overlap at all (see Figure 15.9b), a relationship almost never seen in animals reared in illuminated conditions or in animals reared in the dark, but one that did reflect their unique rearing condition. Most significant in the present context is that these neurons could engage in multisensory integration; however, they could do so only when the cross-modal stimuli were spatially disparate, because only then could the stimuli fall simultaneously within their respective visual and auditory receptive fields. In that case, the magnitude of the response to the cross-modal stimulus was significantly enhanced, just as in normally reared animals presented with spatially aligned visual–auditory stimuli. Conversely, spatially coincident cross-modal stimulus configurations failed to fall within the corresponding receptive fields of these neurons and produced response depression or no integration (see Kadunce et al. 2001; Meredith and Stein 1996). These observations are consistent with the prediction above and reveal that early experience with the simple temporal coincidence of two cross-modal stimuli was sufficient for the brain to link them and initiate multisensory integration.
15.4.3 Role of Cortical Inputs during Maturation

The data from the above experiments did not reveal where in the multisensory SC circuitry these early sensory experiences were exerting their greatest effects. Nevertheless, the fact that the cortex is known to be highly dependent on early experience for its development made it a prime candidate
for this role. To test this idea, Rowland and colleagues (Stein and Rowland 2007) reversibly deactivated both AES and rLS during the period (25–81 days postnatal) in which multisensory integration normally develops (see Wallace and Stein 1997), so that their neurons were unable to participate in these sensory experiences. This was accomplished by implanting a drug-infused polymer over these cortical areas. The polymer would gradually release its store of muscimol, a gamma-aminobutyric acid A (GABAa) receptor agonist that blocked neuronal activity. Once the stores of muscimol were depleted over many weeks, or the polymer was physically removed, these cortical areas would once again become active and responsive to external stimulation. As predicted, SC neurons in these animals were unable to integrate their visual and auditory inputs to enhance their responses. Rather, their responses were no greater to the cross-modal combination of stimuli than they were to the most effective of its component stimuli. Furthermore, comparable deficits were apparent in overt behavior. Animals were no better at localizing a cross-modal stimulus than they were at localizing the most effective of its individual component stimuli. Although these data do not prove the point, they do suggest that the cortical component of the SC multisensory circuit is a critical site for incorporating the early sensory experiences required for the development of SC multisensory integration.
15.4.4 Ontogeny of Multisensory Integration in Cortex

The development of the cortex is believed to lag that of the midbrain, and this principle would be expected to extend to the maturation of sensory response properties. Consequently, the inability of SC neurons in the neonatal cat to exhibit multisensory integration before 4 postnatal weeks suggests that this property develops even later in the cortex. To evaluate this issue, multisensory neurons were studied in the developing AES. Although, as discussed above, the AES neurons that project to the SC are unisensory, there are multisensory neurons scattered along the AES and concentrated at the borders between its three largely modality-specific zones. The visual–auditory neurons in this "SC-independent" multisensory group were the target of this study. Like their counterparts in the SC, they share many fundamental characteristics of an integrated response, such as response enhancement and depression (Wallace et al. 1992) and significant alterations in their temporal response profile (Royal et al. 2009). Neurons in the AES can therefore serve as a good maturational referent for the SC. As predicted, multisensory neurons in the neonatal AES were unable to integrate their visual and auditory inputs. They too developed their capacity for multisensory integration only gradually, and did so within a time window that began and ended later in ontogeny than the corresponding window for SC neurons (Wallace et al. 2006). These data not only support the contention that cortical sensory processes lag those of the midbrain during development, but also raise the possibility that, just as in the SC, experience with visual and auditory stimuli in cross-modal configurations is required for the maturation of multisensory integration. The likelihood of this possibility was strengthened using the same rearing strategy discussed earlier. Animals were raised in the dark to preclude visual–nonvisual experience. As a result, AES neurons failed to develop the capacity to integrate their visual and auditory inputs. Once again, this rearing condition did not impair the development of visually responsive, auditory-responsive, and even visual–auditory neurons; they were common. The rearing condition simply prevented AES multisensory neurons from developing the ability to use these inputs synergistically (Carriere et al. 2007).
15.4.5 Ontogeny of SC Multisensory Integration in a Primate

The multisensory properties of SC neurons discussed above are not unique to the cat. Although their incidence is somewhat lower, multisensory neurons in the rhesus monkey SC have properties very similar to those described above (Wallace et al. 1996). They have multiple, overlapping receptive fields and show multisensory enhancement and multisensory depression, respectively, to
spatially aligned and spatially disparate cross-modal stimuli. Although there may seem to be no a priori reason to assume that their maturation depends on different factors than it does in the cat, the monkey, unlike the cat, is a precocial species. Its SC neurons have comparatively more time to develop in utero than do those of the cat. Of course, they also have to do so in the dark, making one wonder whether the late in utero visual-free experience of the monkey has some similarity to the visual-free environment of the dark-reared cat. Wallace and Stein (2001) examined the multisensory properties of the newborn monkey SC and found that, unlike in the SC of the newborn cat, multisensory neurons were already present (Figure 15.10). However, as in the cat SC, these multisensory neurons were unable to integrate visual–nonvisual inputs. Their responses to combinations of coincident visual and auditory or somatosensory cues were no better than their responses to the most effective of these component stimuli individually. Although there are no data regarding when they develop this capacity, or whether dark rearing would preclude its appearance, it seems highly likely that the monkey shares the same developmental antecedents for the maturation of multisensory integration as the cat.

Recent reports in humans suggest that this may be a general mammalian plan. People who experienced early visual deprivation due to dense congenital cataracts were examined many years after surgery to remove those cataracts. The observations are consistent with predictions from the animal studies. Specifically, their vision appeared to be normal, but their ability to integrate visual–nonvisual information was significantly less well developed than in normal subjects. This ability was compromised in a variety of tasks, including those that involved speech and those that did not (Putzar et al. 2007). Whether neurons in the human SC, like those in the SC of cat and monkey, are initially incapable of multisensory integration is not yet known. However, human infants do poorly on tasks requiring the integration of visual and auditory information to localize events before 8 months of age (Neil et al. 2006).
[Figure 15.10 pie charts: newborn monkey SC, multisensory neurons 14.7%, modality-specific neurons 85.3%; adult monkey SC (inset), multisensory neurons 28%, modality-specific neurons 72%.]
FIGURE 15.10 Modality convergence patterns in SC of newborn and adult (inset) monkey. Pie charts show distributions of all recorded sensory-responsive neurons in multisensory laminas (IV–VII) of SC. (From Wallace, M.T., and Stein, B.E., Journal of Neuroscience, 21, 8886–94, 2001. With permission.)
They also do poorly on tasks requiring the integration of visual and haptic information before 8 years of age (Gori et al. 2008). These data indicate that multisensory capabilities develop over far longer periods in the human brain than in the cat brain, an observation consistent with the long period of postnatal life devoted to human brain maturation. These observations, coupled with those indicating that early sensory deprivation degrades multisensory integration even much later in life, suggest that early experience with cross-modal cues is essential for normal multisensory development in all higher-order species. If so, we can only wonder how well the human brain can adapt its multisensory capabilities to the introduction of visual or auditory input later in life via prosthetic devices. Many people with congenital hearing impairments who later received cochlear implants have shown remarkable accommodation to them. They learn to use their newly acquired auditory capabilities with far greater precision than one might have imagined when the devices were first introduced. Nevertheless, it is not yet known whether they can use these capabilities in concert with other sensory systems. Although the population of people with retinal implants is much smaller, there are very encouraging reports among them as well. However, the same questions apply: Are they able to acquire some forms of multisensory integration through experience with visual–auditory cues later in life and, if so, how much experience, and what kinds of experiences, are necessary for them to develop this capability? These questions remain to be answered.
ACKNOWLEDGMENTS

The research described here was supported in part by NIH grants NS36916 and EY016716.
REFERENCES Alvarado, J.C., T.R. Stanford, J.W. Vaughan, and B.E. Stein. 2007a. Cortex mediates multisensory but not unisensory integration in superior colliculus. Journal of Neuroscience 27:12775–86. Alvarado, J.C., J.W. Vaughan, T.R. Stanford, and B.E. Stein. 2007b. Multisensory versus unisensory integration: Contrasting modes in the superior colliculus. Journal of Neurophysiology 97:3193–205. Barth, D.S., and B. Brett-Green. 2004. Multisensory-Evoked Potentials in Rat Cortex. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 357–70. Cambridge, MA: MIT Press. Bernstein, L.E., J. Edward, T. Auer, and J.K. Moore. 2004. Audiovisual Speech Binding: Convergence or Association. In Handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 203–23. Cambridge, MA: MIT Press. Burnett, L.R., B.E. Stein, D. Chaponis, and M.T. Wallace. 2004. Superior colliculus lesions preferentially disrupt multisensory orientation. Neuroscience 124:535–47. Burnett, L.R., B.E. Stein, T.J. Perrault Jr., and M.T. Wallace. 2007. Excitotoxic lesions of the superior colliculus preferentially impact multisensory neurons and multisensory integration. Experimental Brain Research 179:325–38. Busse, L., K.C. Roberts, R.E. Crist, D.H. Weissman, and M.G. Woldorff. 2005. The spread of attention across modalities and space in a multisensory object. Proceedings of the National Academy of Sciences of the United States of America 102:18751–6. Calvert, G., C. Spence, and B.E. Stein. 2004a. The handbook of multisensory processes. Cambridge, MA: MIT Press. Calvert, G. A., and J. Lewis, W. 2004b. Hemodynamic Studies of Audiovisual Interactions. In The Handbook of Multisensory Processes, ed. G. A. Calvert, C. Spence, and B.E. Stein, 483–502. Cambridge, MA: MIT Press. Carriere, B.N., D.W. Royal, T.J. Perrault et al. 2007. Visual deprivation alters the development of cortical multisensory integration. Journal of Neurophysiology 98:2858–67. Corneil, B.D., and D.P. Munoz. 1996. The influence of auditory and visual distractors on human orienting gaze shifts. Journal of Neuroscience 16:8193–207. DeGelder, B., J. Vroomen, and G. Pourtois. 2004. Multisensory Perception of Emotion, Its Time Course, and Its Neural Basis. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 581–96. Cambridge, MA: MIT Press. Edwards, S.B., C.L. Ginsburgh, C.K. Henkel, and B.E. Stein. 1979. Sources of subcortical projections to the superior colliculus in the cat. Journal of Comparative Neurology 184:309–29.
Ernst, M.O., and M.S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415:429–33. Fort, A., and M.-H. Giard. 2004. Multiple Electrophysiological Mechanisms of Audiovisual Integration in Human Perception. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 503–13. Cambridge, MA: MIT Press. Frens, M.A., and A.J. Van Opstal. 1995a. A quantitative study of auditory-evoked saccadic eye movements in two dimensions. Experimental Brain Research 107:103–17. Frens, M.A., A.J. Van Opstal, and R.F. Van der Willigen. 1995b. Spatial and temporal factors determine auditoryvisual interactions in human saccadic eye movements. Perception & Psychophysics 57:802–16. Fuentes-Santamaria, V., J.C., Alvarado, B.E., Stein, and J.G. McHaffie. 2008. Cortex contacts both output neurons and nitrergic interneurons in the superior colliculus: Direct and indirect routes for multisensory integration. Cerebral Cortex 18:1640–52. Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences 10:278–285. Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12. Gingras, G., B.A. Rowland, and B.E. Stein. 2009. The differing impact of multisensory and unisensory integration on behavior. Journal of Neuroscience 29:4897–902. Gondan, M., B., Niederhaus, F. Rosler, and B. Roder. 2005. Multisensory processing in the redundant-target effect: A behavioral and event-related potential study. Perception & Psychophysics 67:713–26. Gori, M., M. Del Viva, G. Sandini, and D.C. Burr. 2008. Young children do not integrate visual and haptic form information. Current Biology 18:694–8. Grant, A.C., M.C. Thiagarajah, and K. Sathian. 2000. Tactile perception in blind Braille readers: A psychophysical study of acuity and hyperacuity using gratings and dot patterns. Perception & Psychophysics 62:301–12. Grantyn, A., and R. Grantyn. 1982. Axonal patterns and sites of termination of cat superior colliculus neurons projecting in the tecto-bulbo-spinal tract. Experimental Brain Research 46:243–56. Groh, J.M., and D.L. Sparks. 1996a. Saccades to somatosensory targets: II. Motor convergence in primate superior colliculus. Journal of Neurophysiology 75:428–38. Groh, J.M., and D.L. Sparks. 1996b. Saccades to somatosensory targets: III. Eye-position-dependent somatosensory activity in primate superior colliculus. Journal of Neurophysiology 75:439–53. Guitton, D., and D.P. Munoz. 1991. Control of orienting gaze shifts by the tectoreticulospinal system in the head-free cat: I. Identification, localization, and effects of behavior on sensory responses. Journal of Neurophysiology 66:1605–23. Gutfreund, Y., and E.I. Knudsen. 2004. Visual Instruction of the Auditory Space Map in the Midbrain. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence and B.E. Stein, 613–24. Cambridge, MA: MIT Press. Harris, L.R. 1980. The superior colliculus and movements of the head and eyes in cats. Journal of Physiology 300:367–91. Huerta, M.F., and J.K. Harting. 1984. The mammalian superior colliculus: Studies of its morphology and connections. In Comparative neurology of the optic tectum, ed. H. Vanegas, 687–773. New York: Plenum Publishing Corporation. Hughes, H.C., P.A. Reuter-Lorenz, G. Nozawa, and R. Fendrich. 1994. 
Visual–auditory interactions in sensorimotor processing: Saccades versus manual responses. Journal of Experimental Psychology. Human Perception and Performance 20:131–53. Jay, M.F., and D.L. Sparks. 1984. Auditory receptive fields in primate superior colliculus shift with changes in eye position. Nature 309:345–7. Jay, M.F., and D.L. Sparks. 1987a. Sensorimotor integration in the primate superior colliculus: I. Motor convergence. Journal of Neurophysiology 57:22–34. Jay, M.F., and D.L. Sparks. 1987b. Sensorimotor integration in the primate superior colliculus: II. Coordinates of auditory signals. Journal of Neurophysiology 57:35–55. Jiang, W., and B.E. Stein. 2003. Cortex controls multisensory depression in superior colliculus. Journal of Neurophysiology 90:2123–35. Jiang, W., M.T. Wallace, H. Jiang, J.W. Vaughan, and B.E. Stein. 2001. Two cortical areas mediate multisensory integration in superior colliculus neurons. Journal of Neurophysiology 85:506–22. Jiang, W., H. Jiang, and B.E. Stein. 2002. Two corticotectal areas facilitate multisensory orientation behavior. Journal of Cognitive Neuroscience 14:1240–55. Jiang, H., B.E. Stein, and J.G. McHaffie. 2003. Opposing basal ganglia processes shape midbrain visuomotor activity bilaterally. Nature 423:982–6.
Jiang, W., H. Jiang, B.A. Rowland, and B.E. Stein. 2007. Multisensory orientation behavior is disrupted by neonatal cortical ablation. Journal of Neurophysiology 97:557–62. Jiang, W., H. Jiang, and B.E. Stein. 2006. Neonatal cortical ablation disrupts multisensory development in superior colliculus. Journal of Neurophysiology 95:1380–96. Kadunce, D.C., J.W. Vaughan, M.T. Wallace, and B.E. Stein. 2001. The influence of visual and auditory receptive field organization on multisensory integration in the superior colliculus. Experimental Brain Research 139:303–10. Kao, C.Q., B.E. Stein, and D.A. Coulter. 1994. Postnatal development of excitatory synaptic function in deep layers of SC. Society of Neuroscience Abstracts. King, A.J., T.P. Doubell, and I. Skaliora. 2004. Epigenetic factors that align visual and auditory maps in the ferret midbrain. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 599–612. Cambridge, MA: MIT Press. King, A.J., and A.R. Palmer. 1985. Integration of visual and auditory information in bimodal neurones in the guinea-pig superior colliculus. Experimental Brain Research. 60:492–500. King, A.J., J.W. Schnupp, S. Carlile, A.L. Smith, and I.D. Thompson. 1996. The development of topographically-aligned maps of visual and auditory space in the superior colliculus. Progress in Brain Research 112:335–50. Lakatos, P., C.M. Chen, M.N. O’Connell, A. Mills, and C.E. Schroeder. 2007. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53:279–92. Larson, M.A., and B.E. Stein. 1984. The use of tactile and olfactory cues in neonatal orientation and localization of the nipple. Developmental Psychobiology 17:423–36. Laurienti, P.J., J.H. Burdette, M.T. Wallace et al. 2002. Deactivation of sensory-specific cortex by cross-modal stimuli. Journal of Cognitive Neuroscience 14:420–9. Leo, F., N. Bolognini, C. Passamonti, B.E. Stein, and E. Ladavas. 2008. Cross-modal localization in hemianopia: New insights on multisensory integration. Brain 131: 855–65. Liotti, M., K. Ryder, and M.G. Woldorff. 1998. Auditory attention in the congenitally blind: Where, when and what gets reorganized? Neuroreport 9:1007–12. Lippert, M., N.K. Logothetis, and C. Kayser. 2007. Improvement of visual contrast detection by a simultaneous sound. Brain Research 1173:102–9. Lovelace, C.T., B.E. Stein, and M.T. Wallace. 2003. An irrelevant light enhances auditory detection in humans: A psychophysical analysis of multisensory integration in stimulus detection. Cognitive Brain Research 17:447–453. Macaluso, E., and J. Driver. 2004. Functional imaging evidence for multisensory spatial representations and cross-modal attentional interactions in the human brain. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 529–48. Cambridge, MA: MIT Press. Marks, L.E. 2004. Cross-modal interactions in speeded classification. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 85–106. Cambridge, MA: MIT Press. Massaro, D.W. 2004. From multisensory integration to talking heads and language learning. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 153–76. Cambridge, MA: MIT Press. Meredith, M.A., and B.E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus. Science 221:389–91. Meredith, M.A., and B.E. Stein. 1986a. Spatial factors determine the activity of multisensory neurons in cat superior colliculus. 
Brain Research 365:350–4. Meredith, M.A., and B.E. Stein. 1986b. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology 56:640–62. Meredith, M.A., and B.E. Stein. 1990. The visuotopic component of the multisensory map in the deep laminae of the cat superior colliculus. Journal of Neuroscience 10:3727–42. Meredith, M.A., and B.E. Stein. 1996. Spatial determinants of multisensory integration in cat superior colliculus neurons. Journal of Neurophysiology 75:1843–57. Meredith, M.A., H.R. Clemo, and B.E. Stein. 1991. Somatotopic component of the multisensory map in the deep laminae of the cat superior colliculus. Journal of Comparative Neurology 312:353–70. Meredith, M.A., M.T. Wallace, and B.E. Stein. 1992. Visual, auditory and somatosensory convergence in output neurons of the cat superior colliculus: Multisensory properties of the tecto-reticulo-spinal projection. Experimental Brain Research 88:181–6. Middlebrooks, J.C., and E.I. Knudsen. 1984. A neural code for auditory space in the cat’s superior colliculus. Journal of Neuroscience 4:2621–34. Miller, J. 1982. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology 14:247–79.
Morgan, M.L., G.C. Deangelis, and D.E. Angelaki. 2008. Multisensory integration in macaque visual cortex depends on cue reliability. Neuron 59:662–73. Moschovakis, A.K., and A.B. Karabelas. 1985. Observations on the somatodendritic morphology and axonal trajectory of intracellularly HRP-labeled efferent neurons located in the deeper layers of the superior colliculus of the cat. Journal of Comparative Neurology 239:276–308. Munoz, D.P., and R.H. Wurtz. 1993a. Fixation cells in monkey superior colliculus. I. Characteristics of cell discharge. Journal of Neurophysiology 70:559–75. Munoz, D.P., and R.H. Wurtz. 1993b. Fixation cells in monkey superior colliculus: II. Reversible activation and deactivation. Journal of Neurophysiology 70:576–89. Neil, P.A., C. Chee-Ruiter, C. Scheier, D.J. Lewkowicz, and S. Shimojo. 2006. Development of multisensory spatial integration and perception in humans. Developmental Science 9:454–64. Newell, F.N. 2004. Cross-modal object recognition. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 123–39: Cambridge, MA: MIT Press. Partan, S.R. 2004. Multisensory animal communication. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 225–40. Cambridge, MA: MIT Press. Peck, C.K. 1987a. Saccade-related burst neurons in cat superior colliculus. Brain Research 408:329–33. Peck, C.K. 1987b. Visual–auditory interactions in cat superior colliculus: Their role in the control of gaze. Brain Research 420:162–6. Perrault Jr., T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2003. Neuron-specific response characteristics predict the magnitude of multisensory integration. Journal of Neurophysiology 90:4022–6. Perrault Jr., T.J., J.W. Vaughan, B.E. Stein, and M.T. Wallace. 2005. Superior colliculus neurons use distinct operational modes in the integration of multisensory stimuli. Journal of Neurophysiology 93:2575–86. Putzar, L., I. Goerendt, K. Lange, F. Rosler, and B. Roder. 2007. Early visual deprivation impairs multisensory interactions in humans. Nature Neuroscience 10:1243–5. Recanzone, G.H. 1998. Rapidly induced auditory plasticity: The ventriloquism aftereffect. Proceedings of the National Academy of Sciences of the United States of America 95:869–75. Romanski, L.M. 2007. Representation and integration of auditory and visual stimuli in the primate ventral lateral prefrontal cortex. Cerebral Cortex 17(Suppl 1):i61–9. Rowland, B.A., and B.E. Stein. 2007. Multisensory integration produces an initial response enhancement. Frontiers in Integrative Neuroscience 1:4. Rowland, B.A., and B.E. Stein. 2008. Temporal profiles of response enhancement in multisensory integration. Frontiers in Neuroscience 2:218–24. Rowland, B.A., S. Quessy, T.R. Stanford, and B.E. Stein. 2007a. Multisensory integration shortens physiological response latencies. Journal of Neuroscience 27:5879–84. Rowland, B.A., T.R. Stanford, and B.E. Stein. 2007b. A model of the neural mechanisms underlying multisensory integration in the superior colliculus. Perception 36:1431–43. Royal, D.W., B.N. Carriere, and M.T. Wallace. 2009. Spatiotemporal architecture of cortical receptive fields and its impact on multisensory interactions. Experimental Brain Research 198:127–36. Sathian, K. 2000. Practice makes perfect: Sharper tactile perception in the blind. Neurology 54:2203–4. Sathian, K. 2005. Visual cortical activity during tactile perception in the sighted and the visually deprived. Developmental Psychobiology 46:279–86. Sathian, K., S.C. 
Prather, and M. Zhang. 2004. Visual cortical involvement in normal tactile perception. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 703–9. Cambridge, MA: MIT Press. Schroeder, C.E., and J.J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Brain Research. Cognitive Brain Research 14:187–98. Schroeder, C. E., and J.J. Foxe. 2004. Multisensory convergence in early cortical processing. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 295–309. Cambridge, MA: MIT Press. Schroeder, C.E., R.W. Lindsley, C. Specht et al. 2001. Somatosensory input to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85:1322–7. Senkowski, D., D. Talsma, M. Grigutsch, C.S. Herrmann, and M.G. Woldorff. 2007. Good times for multisensory integration: Effects of the precision of temporal synchrony as revealed by gamma-band oscillations. Neuropsychologia 45:561–71. Shams, L., Y. Kamitani, and S. Shimojo. 2004. Modulations of visual perception by sound. In The handbook of multisensoty processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 27–33. Cambridge, MA: MIT Press. Sinnett, S., S. Soto-Faraco, and C. Spence. 2008. The co-occurrence of multisensory competition and facilitation. Acta Psychologica 128:153–61.
16 Effects of Prolonged Exposure to Audiovisual Stimuli with Fixed Stimulus Onset Asynchrony on Interaction Dynamics between Primary Auditory and Primary Visual Cortex

Antje Fillbrandt and Frank W. Ohl
CONTENTS
16.1 Introduction
    16.1.1 Speed of Signal Transmission Is Modality-Specific
    16.1.2 Simultaneity Constancy
    16.1.3 Temporal Recalibration
    16.1.4 Mechanisms of Temporal Recalibration
        16.1.4.1 Are There Any Indications for Recalibration at Early Levels of Stimulus Processing?
        16.1.4.2 To What Extent Does Temporal Recalibration Need Attentional Resources?
        16.1.4.3 Is Recalibration Stimulus-Specific?
        16.1.4.4 Is Recalibration Modality-Specific?
        16.1.4.5 Does Recalibration Occur at Decision Level?
    16.1.5 Outlook on Experiments
16.2 Methods
    16.2.1 Animals
    16.2.2 Electrodes
    16.2.3 Animal Preparation and Recording
    16.2.4 Stimuli
    16.2.5 Experimental Protocol
    16.2.6 Data Preprocessing
    16.2.7 DTF: Mathematical Definition
    16.2.8 Estimation of Autoregressive Models
    16.2.9 Normalization of DTF
    16.2.10 Statistical Testing
16.3 Results
    16.3.1 Stimulus-Induced Changes in Single-Trial nDTF, Averaged across All Trials from All Sessions
        16.3.1.1 Animals Receiving Light Followed by Tone Stimulus (VA-Animals)
        16.3.1.2 Animals Receiving Tone Followed by Light Stimulus (AV-Animals)
    16.3.2 Development of Amplitude of nDTFA→V and nDTFV→A within Sessions
        16.3.2.1 VA-Animals
        16.3.2.2 AV-Animals
    16.3.3 Development of the Amplitude of nDTFA→V and nDTFV→A across Sessions
16.4 Discussion
    16.4.1 Interpretation of DTF-Amplitudes
    16.4.2 Development of nDTF-Amplitude within Sessions
    16.4.3 Audiovisual Stimulus Association as a Potential Cause of Observed Changes in nDTF-Amplitudes
    16.4.4 Changes in Lag Detection as a Potential Cause of Observed Changes in DTF-Amplitudes
    16.4.5 Mechanisms of Recalibration: Some Preliminary Restrictions
        16.4.5.1 Expectation and Lag Detection
        16.4.5.2 Processes after the Second Stimulus
        16.4.5.3 Speed of Processing
16.5 Conclusions
References

Temporal congruity between auditory and visual stimuli has frequently been shown to be an important factor in audiovisual integration, but information about temporal congruity is blurred by the different speeds of transmission in the two sensory modalities. Compensating for the differences in transmission times is challenging for the brain because at each step of transmission, from the production of the signal to its arrival at higher cortical areas, the speed of transmission can be affected in various ways. One way to deal with this complexity could be that the compensation mechanisms remain plastic throughout life so that they can flexibly adapt to the typical transmission delays of new types of stimuli. Temporal recalibration to new values of stimulus asynchronies has been demonstrated in several behavioral studies. This study seeks to explore the potential mechanisms underlying such recalibration at the cortical level. Toward this aim, tone and light stimuli were presented repeatedly to awake, passively listening Mongolian gerbils at the same constant lag.
During stimulation, the local field potential was recorded from electrodes implanted into the auditory and visual cortices. The interaction dynamics between the auditory and visual cortices were examined using the directed transfer function (DTF; Kaminski and Blinowska 1991). With an increasing number of stimulus repetitions, the amplitude of the DTF showed characteristic changes at specific time points between and after the stimuli. Our findings support the view that repeated presentation of audiovisual stimuli at a constant delay alters the interactions between the auditory and visual cortices.
16.1 INTRODUCTION
Listening to a concert is also enjoyable while watching the musicians play. Under normal circumstances, we are not confused by seeing the drumstick movement or the lip movement of the singer after hearing the beat and the vocals. When, in our conscious experience of the world, the senses appear united, this seems to imply that stimulus processing in the different modalities has reached consciousness at about the same time.
Apparently, the task of judging which stimuli have appeared simultaneously is quite challenging for the brain: during the past decade, an increasing number of studies have been published indicating that temporal perception remains plastic throughout the lifetime. These studies demonstrated that when stimuli from different sensory modalities are presented repeatedly at a small, constant temporal onset asynchrony, their perceived temporal disparity diminishes after a while. This chapter describes the electrophysiological results of interaction processes between the auditory and visual cortices during constant asynchronous presentation of audiovisual stimuli in a rodent preparation designed to mimic relevant aspects of classic human experiments on the recalibration of temporal order judgment.
16.1.1 Speed of Signal Transmission Is Modality-Specific
From the point in time a single event causes an auditory and a visual signal to the point in time a certain brain area is activated by these signals, information about temporal congruity is blurred in various ways by the different speeds of transmission of the two signals. The first temporal disparities in signal propagation arise outside the brain from the different velocities of sound and light. At the receptor level, sound transduction in the ear is faster than phototransduction in the retina (see Fain 2003, for a detailed review). The minimum response latency for a bright flash, approximately 7 ms, is nearly the same in rods and cones (Cobbs and Pugh 1987; Hestrin and Korenbrot 1990; Robson et al. 2003), but at low light intensities the rod-driven response might take as long as 300 ms (Baylor et al. 1984, 1987). In contrast, transduction by the hair cells of the inner ear is effectively instantaneous via direct mechanical linkage (~10 µs; Corey and Hudspeth 1979, 1983; Crawford and Fettiplace 1985; Crawford et al. 1991). Next, the duration of the transmission of auditory and visual signals depends on the length of the nerves used for their transmission (Von Békésy 1963; Harrar and Harris 2005). The relationship of transmission delays between sensory modalities is further complicated by the fact that, in each modality, processing speed seems to be modulated by detailed physical stimulus characteristics, such as stimulus intensity (Wilson and Anstis 1969) and visual eccentricity (Nickalls 1996; Kopinska and Harris 2004), as well as by subjective factors, such as attention (e.g., Posner et al. 1980).
16.1.2 Simultaneity Constancy
The ability to perceive stimuli as simultaneous despite their different transmission delays has been termed simultaneity constancy (Kopinska and Harris 2004). Several studies demonstrated that human beings are able to compensate for temporal lags caused by variances in spatial distance (Engel and Dougherty 1971; Sugita and Suzuki 2003; Kopinska and Harris 2004; Alais and Carlile 2005). Interestingly, the compensation also worked when distance cues were presented only to a single modality. In the study by Sugita and Suzuki (2003), only visual distance cues were used. Alais and Carlile (2005) varied only cues for auditory distance perception. The question of which cues are essential to induce a lag compensation is still a matter of ongoing debate as there are also several studies that failed to find evidence for a similar perceptual compensation (Stone 2001; Lewald and Guski 2004; Arnold et al. 2005; Heron et al. 2007).
16.1.3 Temporal Recalibration
Because the transmission delays of auditory and visual signals depend on many factors, they cannot be described by simple rules. One way to deal with this complexity could be that the compensation mechanisms remain plastic throughout life so that they can flexibly adapt to new sets of stimuli and their typical transmission delays.
The existence of temporal recalibration to new stimuli has been demonstrated in several studies (Fujisaki et al. 2004; Vroomen et al. 2004; Navarra et al. 2005; Heron et al. 2007; Keetels and Vroomen 2007). In these studies, experimental paradigms typically start with an adaptation phase in which auditory and visual stimuli are presented repeatedly over several minutes, consistently at a slight onset asynchrony of about 0 to 250 ms. In a subsequent behavioral testing phase, auditory and visual stimuli are presented at various temporal delays and their perceived temporal distance is usually assessed by a simultaneity judgment task (subjects have to indicate whether the stimuli are simultaneous or not) or a temporal order judgment task (subjects have to indicate which of the stimuli they perceived first). Using these procedures, temporal recalibration could be demonstrated repeatedly: the average time one stimulus had to lead the other for the two to be judged as occurring simultaneously, the point of subjective simultaneity (PSS), was shifted in the direction of the lag used in the adaptation phase (Fujisaki et al. 2004; Vroomen et al. 2004). For example, if sound was presented before light in the adaptation phase, then in the testing phase the sound stimulus had to be presented earlier in time than before the adaptation to be regarded as having occurred simultaneously with the light stimulus. In addition, several studies observed an increase in the just noticeable difference (JND), the smallest temporal interval between two stimuli that participants in a temporal order task need in order to judge correctly, in 75% of the trials, which of the stimuli was presented first (Fujisaki et al. 2004; Navarra et al. 2005).
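For concreteness, the two measures can be expressed in terms of a psychometric function fitted to the temporal order judgments. The cumulative-Gaussian parameterization below is an illustrative convention of ours, not necessarily the fitting procedure used in the studies cited above:

\[
P(\text{``audio first''}\mid \Delta t) = \Phi\!\left(\frac{\Delta t-\mu}{\sigma}\right),
\qquad
\mathrm{PSS} = \mu,
\qquad
\mathrm{JND} = \Phi^{-1}(0.75)\,\sigma \approx 0.674\,\sigma,
\]

where \(\Delta t\) is the audiovisual onset asynchrony, \(\Phi\) the standard normal distribution function, \(\mu\) the asynchrony at which both response alternatives are equally likely, and \(\sigma\) the spread of the function. In this notation, lag adaptation shifts \(\mu\) toward the adapted asynchrony, and an increase in the JND corresponds to a shallower psychometric function (larger \(\sigma\)).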
16.1.4 Mechanisms of Temporal Recalibration
The neural mechanisms underlying temporal recalibration have not yet been investigated in detail. In the following, we review current psychophysical data with respect to the cognitive processes hypothetically involved in recalibration, in order to develop first ideas about the neural levels at which recalibration might operate. The idea that temporal recalibration works at an early level of processing is quite attractive: more accurate temporal information is available at the early stages because the different processing delays of later stages have not yet been added. However, there are also reasons to believe that recalibration works at later levels: recalibration effects are usually observed in conscious perception, and it is plausible to assume that the conscious percept is also shaped by the results of later processing stages. For recalibration to operate correctly, it should therefore also compensate for delays of later processing stages.

16.1.4.1 Are There Any Indications for Recalibration at Early Levels of Stimulus Processing?
There are indications that recalibration does not occur at the very periphery. Fujisaki et al. (2004) presented sound stimuli during the testing phase to a different ear than during the adaptation phase and found clear evidence for recalibration. They concluded that recalibration occurs at least at stages of processing where information from both ears has already been combined. To investigate the possible neuronal mechanism of temporal recalibration, the neuronal sites at which temporal onset asynchronies are represented might be of interest. There are indications that neurons are tuned to different onset asynchronies of multimodal stimuli at the level of the superior colliculus (Meredith et al. 1987), but there are also first findings of neural correlates of onset asynchrony detection at the cortical level (Bushara et al. 2001; Senkowski et al. 2007).

16.1.4.2 To What Extent Does Temporal Recalibration Need Attentional Resources?
An increasing number of results indicate that processes of synchrony detection require attentional resources (Fujisaki and Nishida 2005, 2008; Fujisaki et al. 2006). Recalibration is often measured by a change in the perception of synchrony, but preliminary results suggest that the mechanisms
of recalibration and attention might be independent: Fujisaki and colleagues found no interaction between the shift in the PSS caused by attention and the shift in the PSS caused by adaptation in a recalibration experiment (Fujisaki et al. 2004).

16.1.4.3 Is Recalibration Stimulus-Specific?
Several studies demonstrated that lag adaptation can easily generalize to stimuli not presented during the adaptation phase, suggesting that temporal recalibration occurs at a level of processing that abstracts from the details of the specific stimuli (Fujisaki et al. 2004; Navarra et al. 2005; Vatakis et al. 2007, 2008).

16.1.4.4 Is Recalibration Modality-Specific?
Also fundamental for understanding the basics of recalibration is the question of whether it is a supramodal process. As the information from all senses usually appears temporally aligned in conscious experience, a hypothetical compensatory process should take into account the various temporal delays of all modalities. If there were separate compensatory mechanisms for all combinations of modality pairs, this might cause conflicts between the different compensatory mechanisms. Results from recalibration experiments invoking modality pairs other than the audiovisual one have yielded variable results (Miyazaki et al. 2006; Navarra et al. 2006; Hanson et al. 2008; Harrar and Harris 2008). If there were a single compensatory mechanism, we should be able to observe a transfer of recalibration across modality pairings. In the study of Harrar and Harris (2008), exposure to visuotactile asynchronous stimuli in the adaptation phase shifted the PSS when participants had to perform an audiovisual temporal order judgment task, and adaptation to audiotactile asynchronous stimuli caused an increase in the JND in an audiovisual temporal order judgment task. However, the effects do not seem to be simple because, in this study, no recalibration effects were found when audiotactile and visuotactile pairings were used in the testing phase.

16.1.4.5 Does Recalibration Occur at Decision Level?
Fujisaki et al. (2004) advanced the hypothesis that recalibration might occur as late as at the decision level. According to this hypothesis, the effect of recalibration could be explained by a change in the response bias in the temporal order task. Fujisaki et al. tested this hypothesis by probing the perception of simultaneity of their participants indirectly with an auditory-induced visual illusion. As the perception of this illusion changed after the lag adaptation phase, they concluded that recalibration does not occur at the response level.
16.1.5 Outlook on Experiments
This short review of studies addressing the mechanisms of recalibration makes it clear that it is still too early to deduce a precise hypothesis about the neural level at which recalibration might operate. In the current explorative study, we therefore began by searching for neural mechanisms of recalibration at the level of the primary sensory cortices. In the past decade, the primary sensory cortices have repeatedly been demonstrated to be involved in multisensory interactions (e.g., Cahill et al. 1996; Brosch et al. 2005; Bizley et al. 2007; Kayser et al. 2008; Musacchia and Schroeder 2009). The experimental paradigm for rodents resembled the previously described human studies on temporal recalibration: auditory and visual stimuli were presented repeatedly at a constant intermodal temporal onset asynchrony of 200 ms. We implanted one electrode into the primary auditory cortex and one electrode into the visual cortex of Mongolian gerbils, and during stimulation, local field potentials were recorded in the awake animal. Our main question of interest was whether the interaction patterns between auditory
and visual cortices change during the course of continuous asynchronous presentation of auditory and visual stimuli. There is accumulating evidence that the synchronization dynamics between brain areas might reflect their mode of interaction (Bressler 1995, 1996). We examined directional influences between auditory and visual cortices by analyzing the local field potential data using the DTF (Kaminski and Blinowska 1991).
16.2 METHODS
16.2.1 Animals
Data were obtained from eight adult male Mongolian gerbils (Meriones unguiculatus). All animal experiments were surveyed and approved by the animal care committee of the Land Sachsen-Anhalt.
16.2.2 Electrodes
Electrodes were made of stainless steel wire (diameter, 185 µm) and were deinsulated only at the tip. The tip of the reference electrodes was bent into a small loop (diameter, 0.6 mm). The impedance of the recording electrodes was 1.5 MΩ (at 1 kHz).
16.2.3 Animal Preparation and Recording
Electrodes were chronically implanted under deep ketamine-xylazine anesthesia (xylazine, 2 mg/100 g body weight, i.p.; ketamine, 20 mg/100 g body weight, i.p.). One recording electrode was inserted into the right primary auditory cortex and one into the right visual cortex, at depths of 300 µm, using a microstepper. Two reference electrodes were positioned onto the dura mater over the region of the parietal and the frontal cortex, electrically connected, and served as a common frontoparietal reference. After the operation, animals were allowed to recover for 1 week before the recording sessions began. During the measurements, the animal was allowed to move freely in the recording box (20 × 30 cm). The measured local field potentials from the auditory and visual cortices were digitized at a rate of 1000 Hz.
16.2.4 Stimuli
Auditory and visual stimuli were presented at a constant intermodal stimulus onset asynchrony of 200 ms. The duration of both the auditory and the visual stimuli was 50 ms, and the intertrial interval varied randomly between 1 and 2 s with a rectangular distribution of intervals in that range. Acoustic stimuli were tones presented from a loudspeaker located 30 cm above the animal. The tone frequency was chosen for each individual animal to match the frequency that evoked, in preparatory experiments, the strongest amplitude of the local field potential at the recording site within the tonotopic map of the primary auditory cortex (Ohl et al. 2000, 2001). The frequencies used ranged from 250 Hz to 4 kHz, with the peak level of the tone stimuli varying between 60 dB (low frequencies) and 48 dB (high frequencies), measured with a Bruel und Kjaer sound level meter. Visual stimuli were flashes presented from an LED lamp (9.6 cd/m2) located at the height of the eyes of the animal.
16.2.5 Experimental Protocol
To be able to examine both short-term and long-term adaptation effects, animals were presented with the asynchronous stimuli for 10 sessions with 750 stimulus presentations in each session. For four animals, the auditory stimuli were presented first; for the remaining four animals, the visual stimuli were presented first.
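As a minimal illustration of the stimulus timing described above, the sketch below generates the onsets of one session; the function name, the uniform-jitter implementation, and the array layout are our own and are not taken from the authors' stimulation software.

```python
import numpy as np

def make_session(n_trials=750, soa_s=0.200, iti_range_s=(1.0, 2.0),
                 first_modality="V", seed=0):
    """Generate onset times for one session of paired audiovisual stimuli.

    soa_s:          fixed onset asynchrony between the two stimuli of a pair (200 ms).
    iti_range_s:    rectangular distribution of the interval between successive pairs (1-2 s).
    first_modality: 'V' for light-first (VA) animals, 'A' for tone-first (AV) animals.
    """
    rng = np.random.default_rng(seed)
    itis = rng.uniform(*iti_range_s, size=n_trials)   # intertrial intervals, uniform in 1-2 s
    onset_first = np.cumsum(itis)                     # onsets of the leading stimulus
    onset_second = onset_first + soa_s                # trailing stimulus, fixed 200 ms later
    second_modality = "A" if first_modality == "V" else "V"
    return onset_first, onset_second, (first_modality, second_modality)

# Example: one light-first (VA) session of 750 trials
t_first, t_second, order = make_session(first_modality="V")
```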
16.2.6 Data Preprocessing
The local field potential of each trial was analyzed from 1 s before to 1 s after the first stimulus. The data of this time period were detrended separately for each trial and each channel. In addition, the temporal mean and the temporal standard deviation of the time period were determined for each trial and each channel and used for z-standardization. Amplifier clippings resulting from movements of the animals were identified by visual inspection. Only artifact-free trials were included in the analysis (~70–90% of the trials).
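A minimal sketch of this per-trial preprocessing, assuming the recordings have already been cut into an array of epochs of shape (n_trials, n_channels, n_samples); the array layout and function name are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import detrend

def preprocess_epochs(epochs):
    """Detrend and z-standardize each trial and channel separately.

    epochs: array of shape (n_trials, n_channels, n_samples), cut from
            1 s before to 1 s after the first stimulus (sampled at 1000 Hz).
    """
    x = detrend(epochs, axis=-1, type="linear")   # remove a linear trend per trial and channel
    mean = x.mean(axis=-1, keepdims=True)         # temporal mean of the analysis period
    std = x.std(axis=-1, keepdims=True)           # temporal standard deviation
    return (x - mean) / std                       # z-standardized local field potentials
```

Artifact rejection (amplifier clipping caused by movements) was done by visual inspection and is not part of this sketch.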
16.2.7 DTF: Mathematical Definition
Directional influences between the auditory and the visual cortex were analyzed in single trials by estimating the DTF (Kaminski and Blinowska 1991; Kaminski et al. 2001; for a comparison of the performance of the DTF with other spectral estimators, see Kus et al. 2004; Astolfi et al. 2007). The DTF is based on the concept of Granger causality. According to this concept, one time series can be called causal to a second one if its values can be used for the prediction of values of the second time series measured at later time points. This basic principle is typically represented mathematically in the formalism of autoregressive (AR) models. Let X1(t) be the time series data from a selectable channel 1, and X2(t) the data from a selectable channel 2:

X_1(t) = \sum_{j=1}^{p} A_{1\to 1}(j)\,X_1(t-j) + \sum_{j=1}^{p} A_{2\to 1}(j)\,X_2(t-j) + E_1(t)    (16.1)

X_2(t) = \sum_{j=1}^{p} A_{1\to 2}(j)\,X_1(t-j) + \sum_{j=1}^{p} A_{2\to 2}(j)\,X_2(t-j) + E_2(t)    (16.2)
Here, the A(j) are the autoregressive coefficients at time lag j, p is the order of the autoregressive model, and E the prediction error. According to the concept of Granger causality, in Equation 16.1, the channel X2 is said to have a causal influence on channel X1 if the prediction error E can be reduced by including past measurements of channel X2 (for the influence of the channel X1 on the channel X2, see Equation 16.2). To investigate the spectral characteristics of interchannel interaction, the autoregressive coefficients in Equations 16.1 and 16.2 were Fourier-transformed; the transfer matrix was then obtained by matrix inversion:

\begin{pmatrix} H_{1\to 1}(f) & H_{1\to 2}(f) \\ H_{2\to 1}(f) & H_{2\to 2}(f) \end{pmatrix}
=
\begin{pmatrix} A_{1\to 1}(f) & A_{1\to 2}(f) \\ A_{2\to 1}(f) & A_{2\to 2}(f) \end{pmatrix}^{-1}    (16.3)

where the components of the A(f) matrix are

A_{l\to m}(f) = 1 - \sum_{j=1}^{p} A_{l\to m}(j)\,e^{-i 2\pi f j} \quad \text{when } l = m    (16.4)

A_{l\to m}(f) = 0 - \sum_{j=1}^{p} A_{l\to m}(j)\,e^{-i 2\pi f j} \quad \text{otherwise}    (16.5)

with l being the number of the transmitting channel and m the number of the receiving channel.
The DTF for the influence from a selectable channel 1 to a selectable channel 2, DTF1→2, is defined as

\mathrm{DTF}_{1\to 2}(f) = \left| H_{1\to 2}(f) \right|^{2}    (16.6)
In the case of only two channels, the DTF measures the predictability of the frequency response of a first channel from a second channel measured earlier in time. When, for example, X1 describes the local field potential from the auditory cortex, X2 the local field potential from the visual cortex, and the amplitude of the nDTF1→2 has high values in the beta band, this means that we are able to predict the beta response of the visual cortex from the beta response of the auditory cortex measured earlier in time. There are several possible situations of cross-cortical interaction that might underlie the modulation of DTF amplitudes (see, e.g., Kaminski et al. 2001; Cassidy and Brown 2003; Eichler 2006). See Section 16.4 for more details.
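The following sketch evaluates Equations 16.3 through 16.6 for the two-channel case, starting from fitted AR coefficients. The coefficient layout (a 2 × 2 matrix per lag with entry [l, m] holding A_{l→m}(j)) and the normalization of frequency by the sampling rate are our assumptions, not a prescription from the original analysis code.

```python
import numpy as np

def dtf_two_channel(ar_coeffs, freqs, fs):
    """Nonnormalized DTF of a bivariate AR model (Equations 16.3-16.6).

    ar_coeffs: array of shape (p, 2, 2); ar_coeffs[j-1, l, m] is A_{l->m}(j),
               the lag-j influence of channel l on channel m (Equations 16.1 and 16.2).
    freqs:     frequencies (Hz) at which to evaluate the DTF.
    fs:        sampling rate (Hz).
    Returns an array of shape (len(freqs), 2, 2) with entry [k, l, m] = |H_{l->m}(f_k)|^2.
    """
    p = ar_coeffs.shape[0]
    lags = np.arange(1, p + 1)
    dtf = np.empty((len(freqs), 2, 2))
    for k, f in enumerate(freqs):
        # A(f): identity minus the Fourier-transformed coefficients (Equations 16.4 and 16.5)
        phase = np.exp(-2j * np.pi * f * lags / fs)
        A_f = np.eye(2, dtype=complex) - np.tensordot(phase, ar_coeffs, axes=(0, 0))
        H_f = np.linalg.inv(A_f)          # transfer matrix (Equation 16.3)
        dtf[k] = np.abs(H_f) ** 2         # DTF_{l->m}(f) = |H_{l->m}(f)|^2 (Equation 16.6)
    return dtf
```

With the auditory LFP as channel 1 and the visual LFP as channel 2, the entry [:, 0, 1] then corresponds to DTF_{A→V} and [:, 1, 0] to DTF_{V→A}.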
16.2.8 Estimation of Autoregressive Models
We fitted bivariate autoregressive models to the local field potential time series from the auditory and visual cortices using the Burg method, as this algorithm has been shown to provide accurate results (Marple 1987; Kay 1988; Schlögl 2003). We partitioned the time series data of single trials into 100-ms time windows that were stepped at intervals of 5 ms through each trial, from 1 s before the first stimulus to 1 s after the first stimulus. Models were estimated separately for each time window of the single trials. Occasionally, the covariance matrix used for estimation of the AR coefficients turned out to be singular or close to singular; in these rare cases, the whole trial was excluded from further analysis. In the present study, we used a model order of 8, and the sampling rate of 1000 Hz was used for model estimation. The model order was determined by the Akaike Information Criterion (Akaike 1974). After model estimation, the adequacy of the model was tested by analyzing the residuals (Lütkepohl 1993). Using this model order, the auto- and crosscovariance of the residuals was found to have values between 0.001% and 0.005% of the auto- and crosscovariance of the original data (data averaged from two animals here). In other words, the model was able to capture most of the covariance structure contained in the data. When DTFs were computed from the residuals, the single-trial spectra were almost flat, indicating that the noise contained in the residuals was close to white noise. The estimation of AR models requires the normality of the process. To analyze the extent to which the normality assumption was fulfilled in our data, the residuals were inspected by plotting them as histograms and, in addition, a Lilliefors test was computed separately for the residuals of the single data windows. In about 80% of the data windows, the Lilliefors test confirmed the normality assumption. A second requirement for the estimation of the autoregressive models is the stationarity of the time series data. Generally, this assumption is better fulfilled with small data windows (Ding et al. 2000), although it is impossible to tell in advance at which data window a complex system like the brain will move to another state (Freeman 2000). A further reason why the use of small data windows is recommendable is that changes in the local field potential are captured at a higher temporal resolution. The spectral resolution of low frequencies does not seem to be a problem for small data windows when the spectral estimates are based on AR models (for a mathematical treatment of this issue, see, e.g., Marple 1987, p. 199f).
Using a high sampling rate ensures that the number of data points contained in the small time windows is sufficient for model estimation. For example, when we used a sampling rate of 500 Hz instead of 1000 Hz to estimate models from our time windows of 100 ms, the covariance of the residuals increased, signaling that the estimation had become worse (the autocovariance of the residuals of the auditory and visual channels at 1000 Hz was about 10% of the auto- and crosscovariance of the auditory and visual channels at 500 Hz). Importantly, when inspecting the spectra visually, they seemed to be quite alike, indicating that the AR models were robust, to an extent, to a change in sampling rate. When using a data window of 200 ms with the same sampling rate of 500 Hz, the model estimation improved (the covariance of the residuals was 20–40% of the covariance of a model with a window of 100 ms), but at the expense of temporal resolution.
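A sketch of the sliding-window model estimation under the parameters stated above (100-ms windows stepped by 5 ms, model order 8, 1000 Hz sampling). The authors fitted the models with the Burg algorithm; because a multivariate Burg routine is not part of the common Python libraries, the sketch falls back on an ordinary least-squares VAR fit from statsmodels as a stand-in, which is an explicit substitution on our part.

```python
import numpy as np
from statsmodels.tsa.api import VAR

def sliding_window_ar(trial, fs=1000, win_ms=100, step_ms=5, order=8):
    """Fit a bivariate AR model in overlapping windows of a single trial.

    trial: array of shape (n_samples, 2) holding the auditory and visual LFP of one trial.
    Returns a list of (window_start_sample, coeffs), where coeffs has shape (order, 2, 2)
    and coeffs[j-1, l, m] approximates A_{l->m}(j) of Equations 16.1 and 16.2.
    """
    win = int(win_ms * fs / 1000)
    step = int(step_ms * fs / 1000)
    models = []
    for start in range(0, trial.shape[0] - win + 1, step):
        segment = trial[start:start + win]
        res = VAR(segment).fit(maxlags=order, trend="c")   # OLS fit; intercept is ~0 for z-scored data
        # statsmodels stores the lag-j matrix as res.coefs[j-1][equation, variable],
        # i.e., entry [m, l]; transpose to the [l, m] = A_{l->m} convention used here.
        coeffs = np.transpose(res.coefs, (0, 2, 1))
        models.append((start, coeffs))
    return models
```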
16.2.9 Normalization of DTF Kaminski and Blinowska (1991) suggested normalization of the DTF relative to the structure that sends the signal, i.e., for the case of the directed transfer from the auditory channel to the visual channel: nDTFA→V ( f ) =
H A→ V ( f ) 2 k
∑H
M→V
(f)
(16.7)
2
M =1
In the two-channel case, the DTFA→V is divided by the sum of itself and the spectral autocovariance of the visual channel. Thus, when using this normalization, the amplitude of the nDTFA→V depends on the influence of the auditory channel on itself and, reciprocally, the amplitude of the nDTFV→A is dependent on the influence of the visual channel on itself. This is problematic in two ways: first, we cannot tell whether differences between the amplitude of the nDTFA→V and the amplitude of the nDTFV→A are due to differences in normalization or to differences in the strengths of cross-cortical influences. Second, analysis of our data has shown that the auditory and the visual stimuli influenced both the amplitude of the local field potential and the spectral autocovariance of both the auditory and the visual channel. Thus, it is not clear whether changes in the amplitude of the nDTF after stimulation signal changes in the cross-cortical interaction or changes in the spectral autocovariance of the single channels. As the nonnormalized DTF is difficult to handle because of large differences in the amplitudes at different frequencies, we normalized the DTF in the following way:

\mathrm{nDTF}_{A\to V}(f) = \frac{\mathrm{DTF}_{A\to V}(f)}{\displaystyle \sum^{n\_session} \sum^{n\_trials} \sum^{n\_windows} \mathrm{DTF}_{A\to V}(f) \,/\, \bigl(n\_windows \cdot n\_trials \cdot n\_session\bigr)}    (16.8)
with n_windows being the number of time windows of the prestimulus interval per trial, n_trials the number of trials per session, and n_session the number of sessions. Hence, the amplitude of the DTF estimated for each single time window of the single trials was divided by the average of the DTF of all time windows taken from the 1 s prestimulus interval of the single trials of all sessions.
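In code, the normalization of Equation 16.8 divides every single-window DTF value by the grand average over the prestimulus windows of all trials and sessions; the array layout below (all sessions pooled along the first axis) is an assumption made for illustration.

```python
import numpy as np

def normalize_dtf(dtf, prestim_mask):
    """Baseline-normalize single-trial DTF amplitudes as in Equation 16.8.

    dtf:          array of shape (n_trials_total, n_windows, n_freqs) for one direction
                  (e.g., A->V), with trials from all sessions pooled.
    prestim_mask: boolean array of length n_windows marking the windows of the
                  1-s prestimulus interval.
    """
    # grand average of the prestimulus windows, separately for each frequency
    baseline = dtf[:, prestim_mask, :].mean(axis=(0, 1), keepdims=True)
    return dtf / baseline
```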
16.2.10 Statistical Testing
We assessed the statistical significance of differences in the amplitude of the nDTF using the bootstrap technique (e.g., Efron and Tibshirani 1993) to avoid being bound to assumptions about the
empirical statistical error distribution of the nDTF (but see Eichler 2006, for an investigation of the statistical properties of the DTF). The general procedure was as follows: first, bootstrap samples were drawn from the real data under the assumption that the null hypothesis was true. Then, for each bootstrap sample, a chosen test statistic was computed. The values of the test statistic from all bootstrap samples formed a distribution of the test statistic under the null hypothesis. Next, we determined from this bootstrap distribution the probability of finding values equal to or larger than the empirically observed one by chance. If this probability was less than the preselected significance level, the null hypothesis was rejected. More specifically, in our first bootstrap test, we wanted to test whether the nDTF has higher amplitude values in the poststimulus interval than in the prestimulus interval. Under the null hypothesis, the nDTF amplitude values of the prestimulus and the poststimulus interval should not differ from each other. Thus, pairs of bootstrap samples were generated by drawing single-trial nDTF amplitude values at random, but with replacement, from the prestimulus and from the poststimulus interval. For each of the sample pairs, the amplitudes were averaged across trials, and the difference between the averages was computed separately for each pair. This procedure of drawing samples was repeated 1000 times, yielding a distribution of differences between the average amplitudes. The resulting bootstrap distribution was then used to determine the probability of the real amplitude difference between the prestimulus and the poststimulus averages under the null hypothesis. In a second bootstrap test, we assessed the significance of the slope of a line fitted to the data by linear regression analysis. We used the null hypothesis that the predictor variable (here, the number of stimulus presentations) and the response variable (here, the nDTF amplitude) are independent of each other. We generated bootstrap samples by randomly pairing the values of the predictor and response variables. For each of these samples, a line was fitted by linear regression analysis and the slope was computed, yielding a distribution of slope values under the null hypothesis.
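A sketch of the first bootstrap test under the stated null hypothesis; pooling the pre- and poststimulus values before resampling is our reading of the procedure, and the function name and defaults are illustrative.

```python
import numpy as np

def bootstrap_diff_test(pre, post, n_boot=1000, seed=0):
    """One-sided bootstrap test for higher post- than prestimulus nDTF amplitude.

    pre, post: 1-D arrays of single-trial nDTF amplitudes for a given time window
               and frequency.
    Returns the probability, under the null hypothesis that pre- and poststimulus
    values are interchangeable, of a mean difference at least as large as observed.
    """
    rng = np.random.default_rng(seed)
    observed = post.mean() - pre.mean()
    pooled = np.concatenate([pre, post])              # null hypothesis: same distribution
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        fake_pre = rng.choice(pooled, size=pre.size, replace=True)
        fake_post = rng.choice(pooled, size=post.size, replace=True)
        diffs[b] = fake_post.mean() - fake_pre.mean()
    return float(np.mean(diffs >= observed))          # bootstrap p value
```

The second test, on regression slopes, proceeds analogously: the pairing of trial-window index and nDTF amplitude is randomized before each refit, and the observed slope is compared against the resulting null distribution of slopes.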
16.3 RESULTS
16.3.1 Stimulus-Induced Changes in Single-Trial nDTF, Averaged across All Trials from All Sessions
For a first inspection of the effect the audiovisual stimulation had on the nDTF, from the auditory to the visual cortex (nDTFA→V) and from the visual to the auditory cortex (nDTFV→A), we averaged nDTF amplitudes across all single trials of all sessions, separately for each time window from 1 s before to 1 s after the first stimulus. Figure 16.1 shows time-frequency plots of the nDTFA→V (left), which describes the predictability of the frequency response of the visual cortex based on the frequency response of the auditory cortex, and the nDTFV→A (right), which describes the predictability of the frequency response of the auditory cortex based on the frequency response of the visual cortex. Results from animals receiving the light stimulus first are presented in the upper two graphs and results from animals receiving the tone stimulus first are shown in the lower two graphs. Data from 200 ms before the first stimulus to 1 s after the first stimulus are shown here. Note that the abscissa indicates the start of a time window (window duration: 100 ms), so the data from time windows at 100 ms before the first stimulus are already influenced by effects occurring after the presentation of the first stimulus. The significance of the observed changes in the nDTF amplitude was assessed separately for each animal using Student's t-test based on the bootstrap technique (see Methods). More precisely, we tested whether the amplitudes of the nDTF averaged across trials at different time points after the presentation of the first stimulus were significantly different from the nDTF amplitude of the prestimulus interval, averaged across trials and time from 1000 to 100 ms before the first stimulus. To compare the relative amplitudes of the nDTFA→V and the nDTFV→A, we tested whether the difference of the amplitudes of nDTFA→V and nDTFV→A averaged across trials at different time points
after the presentation of the first stimulus was significantly different from the difference of the amplitudes of nDTFA→V and nDTFV→A of the prestimulus interval. In the following, we describe only those peaks of the nDTF amplitudes that deviated significantly (P < 0.01) from the average amplitude of the prestimulus interval.

FIGURE 16.1 (a and b) nDTFA→V (left) and nDTFV→A (right), averaged across all trials from all sessions, separately for time windows from –0.2 to 0.9 s after start of first stimulus. (a) Animal receiving light first. (b) Animal receiving tone first. (c) Difference between averages (nDTFA→V – nDTFV→A). Animal receiving light first (left). Animal receiving tone first (right).

16.3.1.1 Animals Receiving Light Followed by Tone Stimulus (VA-Animals)
At first sight, the response of the nDTFA→V closely resembled the response of the nDTFV→A. In animals receiving first the light stimulus and then the tone stimulus, we observed two prominent positive peaks in both the nDTFA→V (Figure 16.1a, left) and the nDTFV→A (Figure 16.1a, right), the first one after the light stimulus, starting at about –20 ms, and the second one after the tone stimulus, beginning at about 151 ms. After the second peak, the amplitude of the nDTFA→V and the nDTFV→A dropped to slightly less than the prestimulus baseline values and returned very slowly to the prestimulus values within the next second.
Even though the temporal development and the frequency spectra were roughly similar in the nDTFA→V and the nDTFV→A, there were small but important differences. First, there were stimulus-evoked differences in the amplitudes of the nDTFA→V and the nDTFV→A (Figure 16.1c, left, and the line plots in Figure 16.2, top). After the visual stimulus, the nDTF amplitude was significantly higher in the nDTFV→A than in the nDTFA→V, whereas after the auditory stimulus, the nDTFA→V reached higher values, but only at frequencies exceeding 30 Hz. Second, even though the peaks could be found at all frequency bands in the nDTFV→A, the first peak was strongest at a frequency of 1 Hz and at about 32 Hz, and the second peak at frequencies of 1 Hz and at about 40 Hz. In the nDTFA→V, the highest amplitude values after the first peak could be observed at 1 Hz and at about 35 Hz, and after the second peak at 1 Hz and at about 45 Hz.
16.3.1.2 Animals Receiving Tone Followed by Light Stimulus (AV-Animals)
In animals receiving first the tone stimulus and then the light stimulus, three positive peaks developed after stimulation. As in the VA-animals, the nDTFA→V and nDTFV→A were similar to each other (Figure 16.1b and the line plots in Figure 16.2, bottom). The first peak could be found between
the tone and the light stimulus, at about –40 ms. The second and the third peaks occurred after the light stimulus, at about 170 ms and 330 ms, respectively. As in the VA-animals, after the auditory stimulus (here the first stimulus), the amplitude of the nDTFA→V significantly exceeded the amplitude of the nDTFV→A for frequencies above 20 Hz in the AV-animals, whereas after the visual stimulus, amplitudes were significantly higher in the nDTFV→A (Figure 16.1c, right). Thus, the sign of the difference between the nDTFA→V and the nDTFV→A depended on the type of the stimulus (auditory or visual) and not on the order of stimulus presentation. The peaks ran through all frequencies from 0 to 100 Hz. The first peak of the nDTFA→V was most pronounced at 1 Hz and at about 42 Hz, the second peak at 1 Hz, at about 32 Hz, and at 100 Hz. The first peak of the nDTFV→A reached its highest values at 1 Hz and at 35 Hz, the second peak had its highest amplitude at 1 Hz and at 28 Hz. For the third peak, the amplitude was most prominent at 1 Hz.

FIGURE 16.2 Top: representative nDTFV→A (dashed) and nDTFA→V (solid), averaged across all trials from all sessions, separately for all time windows from –200 to 900 ms after start of first stimulus, from an animal receiving light first, followed by tone stimulus. Bottom: data from an animal receiving tone first, followed by light stimulus.
16.3.2 Development of Amplitude of nDTFA→V and nDTFV→A within Sessions
To investigate the development of the effects within the sessions, we divided the 750 trials of each session into windows of 125 trials from the start to the end of each session. Averaging was done across the trials of each trial window, but separately for the time windows within the course of each trial. Trials from all sessions were included in the average. As the nDTF amplitude increased or decreased fairly smoothly within the sessions for the majority of the animals, we decided to characterize the effects by linear regression analysis. The slope of the regression line fitted to the observed data points was subjected to statistical testing using the bootstrap technique (for details, see Methods).

16.3.2.1 VA-Animals
In Figure 16.3a and b, the development of the nDTF amplitude of the first and the second peaks within the sessions is depicted, averaged across all four animals that received the light stimulus first. Most of the effects could roughly be observed over the whole range of frequencies tested (in Figure 16.3, we selected nDTF peaks at a frequency of 40 Hz for illustration). Nevertheless, effects did not always reach significance at all frequencies tested (see Tables 16.1 and 16.2 for more detailed information on the development of peaks at other frequencies). After the first (visual) stimulus, the amplitude of the first peak increased in the nDTFA→V and decreased in the nDTFV→A (Figure 16.3a, left). At the beginning of the session, the amplitude was higher in the nDTFV→A than in the nDTFA→V; thus, the amplitude difference between the nDTFA→V and the nDTFV→A decreased significantly over the session (Figure 16.3a, right). After the second (auditory) stimulus, the amplitude of the second peak increased both in the nDTFA→V and the nDTFV→A (Figure 16.3b, left). Importantly, the increase in the nDTFA→V exceeded the increase in the nDTFV→A, gradually increasing the difference between the nDTFA→V and the nDTFV→A (Figure 16.3b, right).

16.3.2.2 AV-Animals
Similar to the nDTF development in the VA-animals after the second (auditory) stimulus, in the AV-animals the amplitude increased both in the nDTFA→V and the nDTFV→A after the first (auditory) stimulus (Figure 16.3c, left). The increase was more pronounced in the nDTFA→V, further increasing the difference between the nDTFA→V and the nDTFV→A (Figure 16.3c, right). Interestingly, after the second (visual) stimulus, the behavior of the nDTF in the AV-animals did not resemble the behavior of the nDTF after the first (visual) stimulus in the VA-animals. In the AV-animals, the amplitude of the nDTFV→A increased after the visual stimulus, whereas the amplitude of the nDTFA→V decreased slightly in some animals and increased in others (Figure 16.3d, left; Table 16.1). After the visual stimulus, the amplitude of the nDTFV→A was already higher than the amplitude of the nDTFA→V at the beginning of the sessions, and
1.4
A V–DTF V A–DTF
nDTF
1.3
0.1
1.2 1.1
(b)
0
A V–DTF V A–DTF
8
Difference (A V–DTF – V A–DTF)
0.1
–0.1
8
0
A V–DTF V A–DTF
2 4 6 Number of trial interval
8
Difference (A V–DTF – V A–DTF)
0.3 0.2 nDTF
nDTF
2 4 6 Number of trial interval AV animals: peak 1
0.9 0.8 0.7
0.1 0
0
2 4 6 Number of trial interval
–0.1
8
AV animals: peak 2
0.2
A V–DTF V A–DTF
0.15
0
2 4 6 Number of trial interval
8
Difference (A V–DTF – V A–DTF)
0 –0.05
nDTF
nDTF
2 4 6 Number of trial interval
0
1
0.1
–0.1
–0.15
0.05 0
0
0.2
1.2
0
(d)
–0.1
0.3
1.3
(c)
0
–0.2
8
VA animals: peak 2
1.6 1.5
nDTF
2 4 6 Number of trial interval
nDTF
1
Difference (A V–DTF – V A–DTF)
0.2 nDTF
(a)
0
2 4 6 Number of trial interval
8
–0.2
0
2 4 6 Number of trial interval
8
the difference between the nDTFA→V and the nDTFV→A further increased during the course of the sessions (Figure 16.3d, right).

FIGURE 16.3 Development of nDTF peaks at 40 Hz within sessions, averaged across nonoverlapping windows of 125 trials stepped through all sessions. (a and b) Animals receiving light first. (c and d) Animals receiving tone first. Left: development of average amplitude peak after first stimulus in nDTFA→V and nDTFV→A (a and c); development of average amplitude peak after second stimulus in nDTFA→V and nDTFV→A (b and d). Right: amplitude of nDTFV→A peak subtracted from amplitude of nDTFA→V peak shown in left. Error bars denote standard error of mean, averaged across animals.
16.3.3 Development of the Amplitude of nDTFA→V and nDTFV→A across Sessions
To examine the effects of long-term adaptation, the nDTF amplitude of the first 100 trials was averaged separately for each session.
TABLE 16.1 P Values of Slope of a Regression Line Fitted to Peak Amplitudes of nDTF Averaged across Nonoverlapping Windows of 125 Trials Stepped through All Sessions

            A→V nDTF peak 1                                     V→A nDTF peak 1
Animals     1 Hz     20 Hz    40 Hz    60 Hz    80 Hz      1 Hz     20 Hz    40 Hz    60 Hz    80 Hz
AV090       <0.001a  0.003a   0.00a    0.006a   0.003a     <0.001b  <0.001b  <0.001b  >0.05c   >0.001a
AV091       0.003a   0.001a   0.001a   0.004a   0.002a     0.001a   0.001a   0.01a    >0.05c   >0.05c
AV106       0.02a    0.001a   <0.001a  <0.001a  <0.001a    0.01a    <0.001a  0.002a   0.05a    >0.05c
AV125       0.0a     <0.001a  0.04a    0.01a    0.001a     0.02a    0.02a    0.03a    >0.05c   >0.05c
VA099       <0.001a  0.001b   0.001b   >0.05c   0.03a      <0.001b  <0.001b  <0.001b  <0.001b  0.001b
VA100       0.02a    0.01a    0.04a    0.001a   0.001a     0.02b    <0.001b  0.001b   0.002b   0.01b
VA107       0.004a   0.001b   0.01b    0.01a    0.001a     >0.05c   0.004b   >0.05c   >0.05c   0.01a
VA124       0.03a    <0.001a  <0.001a  0.01a    >0.05c     0.01a    <0.001b  <0.001b  0.01b    0.01b

            A→V nDTF peak 2                                     V→A nDTF peak 2
Animals     1 Hz     20 Hz    40 Hz    60 Hz    80 Hz      1 Hz     20 Hz    40 Hz    60 Hz    80 Hz
AV090       <0.001a  0.05     >0.05c   >0.05c   >0.05c     0.01a    0.03a    >0.05c   <0.001a  <0.001a
AV091       0.001a   0.001a   0.002a   0.001a   <0.001a    <0.001a  0.001a   0.002a   0.006a   0.01b
AV106       <0.001a  >0.05c   0.002a   <0.05c   0.001a     <0.001a  <0.001a  0.002a   <0.001a  0.004a
AV125       <0.001a  <0.001a  0.001a   0.003a   <0.05c     0.03a    <0.001a  <0.001a  >0.05c   >0.05c
VA099       0.001a   <0.001a  <0.001a  <0.001a  <0.001a    0.02a    0.03a    0.001a   >0.05c   >0.05c
VA100       0.001a   0.001a   0.001a   0.001a   0.001a     >0.05c   0.002a   >0.05c   >0.05c   0.001a
VA107       0.001a   0.001a   0.001a   0.001a   0.001a     >0.05c   0.001a   >0.05c   >0.05c   >0.05c
VA124       >0.05c   0.01b    0.001b   0.01a    >0.05c     0.01a    0.02a    0.001a   0.001a   >0.05c

Note: Upper table, results for the nDTF peak after the first stimulus; lower table, results for the nDTF peak after the second stimulus. Animal notation: AV, animals receiving the tone first; VA, animals receiving the light first. a, slope is positive; b, slope is negative; c, nonsignificant result.
The development of the amplitude averages across sessions was examined by linear regression analysis, and the significance of the slope was tested using the bootstrap technique. In the following, effects are reported for a chosen significance level of 0.05. Even though some significant trends could be observed, the results were not consistent among animals. In the VA-group, one animal showed a decrease in the amplitude of the nDTFA→V at the beginning of the first stimulus, but an increase could be found only 20 ms after the beginning of the first stimulus. In a second animal, there was an increase in the amplitude of the nDTFA→V after the second stimulus. In the amplitude of the nDTFV→A of two VA-animals, decreases could be observed after the first and second stimulus, whereas in a third animal, an increase was found after the second stimulus. All these results could be observed for the majority of examined frequencies. In the nDTFA→V of the AV-animals, no clear developmental trend could be observed at many frequencies, but at frequencies less than 10 Hz there was an increase in amplitude after both the first and the second stimulus in two animals, whereas in one animal a decrease could be found after both stimuli. In the amplitude of the nDTFV→A, increases could be observed at various frequencies and time points after stimulation.
TABLE 16.2 P Values of Slope of a Regression Line Fitted to Difference of Peak Amplitudes of nDTFV→A and nDTFA→V Averaged in Nonoverlapping Windows of 125 Trials Stepped through All Sessions

            Difference (A→V minus V→A): peak 1                  Difference (A→V minus V→A): peak 2
Animals     1 Hz     20 Hz    40 Hz    60 Hz    80 Hz      1 Hz     20 Hz    40 Hz    60 Hz    80 Hz
AV090       0.03a    0.002a   0.004a   0.006a   0.009a     0.01b    0.02b    0.02b    >0.05c   >0.05c
AV091       0.01a    0.006a   0.007a   0.004a   0.009a     0.01b    0.04b    0.02b    >0.05c   0.01a
AV106       0.008a   0.03a    0.04a    0.03a    0.02a      0.02b    >0.05c   >0.05c   >0.05c   >0.05c
AV125       >0.05c   >0.05c   0.04a    0.005a   0.06a      0.02b    0.01b    0.02b    0.03b    0.01b
VA099       0.002a   0.005a   0.002a   <0.001a  0.002a     0.002a   0.001a   0.002a   0.002a   0.002a
VA100       0.04a    0.009a   0.01a    0.008a   0.04a      0.03a    0.004a   0.001a   0.001a   0.001a
VA107       0.01a    >0.05c   0.04a    0.02a    0.04a      0.01a    0.06c    >0.05c   >0.05c   >0.05c

Note: Left, first nDTF peak; right, second nDTF peak. Animal notation: AV, animals receiving the tone first; VA, animals receiving the light first. a, slope is positive; b, slope is negative; c, nonsignificant result.
16.4 DISCUSSION
The repeated presentation of pairs of auditory and visual stimuli, with random intervals between stimulus pairs but constant audiovisual stimulus onset asynchrony within each pair, led to robust changes in the interaction dynamics between the primary auditory and the primary visual cortex. Independent of the stimulus order, when an auditory stimulus was presented, the amplitude of the nDTFA→V exceeded the amplitude of the nDTFV→A, whereas after the visual stimulus, the amplitude of the nDTFV→A reached higher values. Moreover, within adaptation sessions, some of the observed changes in nDTF amplitudes showed clear dynamic trends, whereas across adaptation sessions, no coherent development could be observed. In the following, we discuss which processes might be evoked by the repeated asynchronous presentation of audiovisual stimuli and whether they might offer suitable explanations for the amplitude changes in the nDTF we observed. As paired-stimulus adaptation protocols similar to the one used in the present study have been shown to induce recalibration of temporal order judgment in humans (e.g., Fujisaki et al. 2004; Vroomen et al. 2004), we also discuss whether some of the described effects on the directed information transfer could underlie such recalibration functions. To prepare the discussion, some general considerations on the interpretation of nDTF amplitudes seem appropriate.
16.4.1 Interpretation of DTF-Amplitudes
Long-range interaction processes have frequently been associated with coherent oscillatory activity between cortical areas (Bressler 1995; Bressler et al. 1993; Roelfsema et al. 1997; Rodriguez et al. 1999; Varela et al. 2001). Moreover, it has been shown that the oscillatory activity in one cortical area can be predicted from earlier measurements of another cortical area using the DTF (Kaminski et al. 1997, 2001; Korzeniewska et al. 1997, 2003; Franaszczuk and Bergey 1998; Medvedev and Willoughby 1999; Liang et al. 2000), indicating that the oscillatory activity might signal directional influences between the cortices.
However, as Cassidy and Brown (2003) have demonstrated in a series of simulation studies, there is no straightforward way to infer the underlying cross-cortical interactions from the information provided by the DTF. Specifically, from DTF amplitudes alone, we cannot tell whether the information flow is unidirectional, bidirectional, or even multidirectional, involving additional brain areas. Let us consider the situation after the presentation of the auditory stimulus, when the amplitude of the nDTFA→V attains higher values than the amplitude of the nDTFV→A. First, this result might indicate that there is a unidirectional influence from the auditory to the visual cortex, with the size of the amplitude difference positively correlating with the delay in the information transfer. Second, this finding could also reflect a reciprocal influence between the auditory and visual cortices, but with the influence from the auditory cortex either larger in amplitude or lagged relative to the influence from the visual cortex. Third, additional unobserved structures might be involved, sending input slightly earlier to the auditory cortex than to the visual cortex.
16.4.2 Development of nDTF-Amplitude within Sessions
The development of the nDTF after the auditory stimulus did not seem to depend strongly on the order of stimulus presentation. Independent of whether an auditory or a visual stimulus was presented first, after the auditory stimulus the peak amplitude of both the nDTFA→V and the nDTFV→A increased. Notably, the increase was more pronounced in the nDTFA→V than in the nDTFV→A, further increasing the difference between the amplitudes of the nDTFA→V and the nDTFV→A. Using the interpretation scheme introduced above, under the assumption of unidirectional interaction, the influence from the auditory to the visual cortex not only increased in strength, but the lag with which the input is sent also became larger with increasing number of stimulus repetitions. In the case of bidirectional interaction, influences from both sides increased, but the influence from the auditory cortex became stronger relative to the influence from the visual cortex. Finally, in the case of multidirectional interaction, the influence of a third structure on both the auditory and the visual cortex might have become more pronounced, but at the same time, the temporal delay of the input sent to the visual cortex relative to the delay of the input sent to the auditory cortex increased even further. All three interpretations have in common that not only the strength of the interaction but also the mode of the interaction changed.
In contrast to the development of the nDTF after the auditory stimulus, the development of the nDTF after the visual stimulus clearly depended on the order of stimulus presentation. When the visual stimulus was presented first, contrary to expectations, the amplitude of the nDTFV→A decreased with increasing number of stimulus repetitions, whereas the amplitude of the nDTFA→V increased in the majority of the animals. Thus, assuming that a unidirectional influence underlies our data, this finding might reflect that the visual cortex sends influences to the auditory cortex at increasingly shorter delays. In the case of bidirectional interaction, the input from the visual cortex decreases whereas the input from the auditory cortex increases. Finally, under the assumption of multidirectional interaction, a hypothetical third structure might still send its input earlier to the visual cortex, but the delay becomes diminished with increasing number of stimulus repetitions. When the visual stimulus was presented as the second stimulus, the nDTF behaved similarly as after the auditory stimulus. More precisely, both the peak amplitude of the nDTFA→V and the nDTFV→A increased within the sessions, but importantly, the increase was now stronger in the nDTFV→A. To summarize, the characteristic developmental trend after the second stimulus was an increase in both the nDTFA→V and the nDTFV→A, with the increase being stronger in the nDTF sending information from the structure the stimulus had been presented to, namely in the nDTFV→A after the visual stimulus and in the nDTFA→V after the auditory stimulus. After the first stimulus, no typical development of the nDTF can be outlined: the behavior of the nDTF clearly depended on the stimulus modality, as the difference in nDTFA→V and nDTFV→A amplitudes increased for an auditory stimulus but decreased for a visual stimulus.
16.4.3 Audiovisual Stimulus Association as a Potential Cause of Observed Changes in nDTF-Amplitudes

The cross-cortical interaction between auditory and visual cortices reflected in the peaks of the nDTF could simply be an indication that information is spread among the sensory cortices during the course of stimulus processing. However, we also have to take into account that the nDTF amplitudes increased within the sessions, signaling that the interaction between the auditory and the visual cortex intensified. In addition, after the visual stimulus, the behavior of the nDTF differed strongly with the order of stimulus presentation. Each of these observations might be a sign that the auditory and the visual information became associated. This hypothesis is in accordance with the unity assumption (e.g., Bedford 2001; Welch 1999; Welch and Warren 1980), which states that two stimuli from different sensory modalities are more likely to be regarded as deriving from the same event when they are presented, for example, in close temporal congruence. The increase in the nDTF after the second stimulus might indicate that the stimuli are integrated once the second stimulus has been presented. The increase in the nDTF before the second stimulus might indicate the expectation of the second stimulus. Several other studies have demonstrated increases in coherent activity associated with anticipatory processing (e.g., Roelfsema et al. 1998; Von Stein et al. 2000; Fries et al. 2001; Liang et al. 2002). On the other hand, our results on the development of the nDTF after the first stimulus varied strongly with the stimulus order, and it would be odd if the expectation of an auditory stimulus affected the nDTF quite differently from the expectation of a visual stimulus. To clarify whether the observed changes might have something to do with stimulus association or expectation processes, repeating this experiment in anesthetized animals might be helpful. To explore whether the nDTF amplitude is influenced by anticipatory processing, it might also be interesting to vary the likelihood with which a stimulus of a first modality is followed by a stimulus of a second modality (see Sutton et al. 1965, for an experiment examining the effect of stimulus uncertainty on evoked potentials).
16.4.4 Changes in Lag Detection as a Potential Cause of Observed Changes in DTF-Amplitudes

As we presented our stimuli at a constant lag, it does not seem far-fetched to assume that our stimulation engaged hypothetical lag detectors. There are already some studies on the neural correlates of synchronous and asynchronous stimulus presentation (Meredith et al. 1987; Bushara et al. 2001; Senkowski et al. 2007). Senkowski et al. (2007) examined the oscillatory gamma-band responses in the human EEG for different stimulus onset asynchronies of auditory and visual stimuli. They found clear evidence for multisensory interactions in the gamma-band response when stimuli were presented in very close temporal synchrony. In addition, they found a very specific interaction effect over occipital areas when auditory input led visual input by 100 ± 25 ms, indicating that cortical responses can be specific to certain asynchronies.
16.4.5 Mechanisms of Recalibration: Some Preliminary Restrictions

16.4.5.1 Expectation and Lag Detection
Experiments on the recalibration of temporal order judgment typically demonstrate a shift of the entire psychometric function (i.e., at many stimulus onset asynchronies), despite the fact that only a single intermodal lag value has been used in the adaptation phase. In other words, a specific stimulus order does not seem to be necessary to be able to observe the change in temporal order perception, indicating that expectation processes are unlikely to play a major role in evoking recalibration
effects. In a similar way, a specific stimulus onset asynchrony between the stimuli does not seem to be required, speaking against a dominant role for lag-specific detection processes underlying the recalibration effect.

16.4.5.2 Processes after the Second Stimulus
Even though the presentation of stimuli at a specific lag or in a specific order does not seem to be necessary to make the recalibration of temporal perception observable in behavior, it is still possible that the presentation of both an auditory and a visual stimulus is required. Under this hypothesis, the mechanisms of recalibration should come into play only after stimuli of both modalities have been presented. After the second stimulus, we could observe an increase in the difference between the amplitudes of the nDTFs in both AV and VA animals. We hypothesized that this increase might reflect an ongoing stimulus association. Vatakis and Spence (2007) demonstrated that subjects showed decreased temporal sensitivity, as measured by the just noticeable difference (JND), when an auditory and a visual speech stimulus belonged to the same speech event. Also, in some experiments on recalibration, an increase in the JND was observed (Navarra et al. 2005, 2006; Fujisaki et al. 2004). However, it is premature to conclude that stimulus association plays a role in recalibration experiments. First, an increase in JND after stimulus association could not be observed under different experimental conditions (Vatakis and Spence 2008). Second, as already discussed in the Introduction, recalibration does not seem to be stimulus-specific (Fujisaki et al. 2004; Navarra et al. 2005).

16.4.5.3 Speed of Processing
The observation that neither a specific lag nor a specific stimulus order seemed to be required to observe a recalibration effect supports a further possibility: to observe a change in temporal perception, the presentation of a second stimulus might not be necessary at all. On this account, temporal perception in the different modalities would not be recalibrated relative to each other; instead, perception would simply be speeded up or slowed down in one modality. In our data, we did not find any indication of an increase in the speed of stimulus processing. The latencies of the nDTF peaks did not change with an increasing number of stimulus presentations, although one has to keep in mind that there might not be a direct relationship between the speed of processing and the speed of perception measured in recalibration experiments. Fujisaki et al. (2004) investigated the role of the speed of sensory processing in recalibration. Specifically, they advanced the hypothesis that processing in one modality might be speeded up by drawing attention to that modality, but based on the results of their experiments, they concluded that attention and recalibration were independent. If there were a general increase in the speed of perception, a transfer of recalibration effects to modality pairs not presented in the adaptation phase should be easy to detect. Preliminary results indicate that the effects and mechanisms of recalibration are not that simple. In the study by Harrar and Harris (2008), after adaptation with visuotactile stimulus pairs, the visual stimulus was perceived to occur later relative to an auditory stimulus, but surprisingly, there were no changes in the perception of temporal disparities when the visual stimulus was presented with a tactile stimulus during the testing phase.
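To make the two measures invoked here explicit, one common formalization of a temporal order judgment task models the proportion of "visual first" responses as a cumulative Gaussian of the stimulus onset asynchrony; the studies cited above differ in their exact parameterizations and sign conventions, so the following is illustrative only.

```latex
P(\text{``visual first''} \mid \mathrm{SOA})
  = \Phi\!\left( \frac{\mathrm{SOA} - \mathrm{PSS}}{\sigma} \right),
\qquad
\mathrm{JND} = \sigma \, \Phi^{-1}(0.75) \approx 0.675\,\sigma .
```

A recalibration effect of the kind discussed in this chapter appears as a lateral shift of the whole function, that is, a change in the point of subjective simultaneity (PSS), whereas a change in temporal sensitivity appears as a change in \(\sigma\) and hence in the JND.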
16.5 CONCLUSIONS

The repeated presentation of paired auditory and visual stimuli with constant intermodal onset asynchrony is known to recalibrate audiovisual temporal order judgment in humans. The aim of this study was to identify potential neural mechanisms that could underlie this recalibration in an animal model amenable to detailed electrophysiological analysis of neural mass activity. Using Mongolian gerbils, we found that prolonged presentation of paired auditory and visual stimuli
caused characteristic changes in the neuronal interaction dynamics between the primary auditory cortex and the primary visual cortex, as evidenced by changes in the amplitude of the nDTF estimated from local field potentials recorded in both cortices. Specifically, changes in both the nDTF from auditory to visual cortex (nDTFA→V) and the nDTF from visual to auditory cortex (nDTFV→A) developed dynamically over the course of the adaptation trials. We discussed three types of processes that might have been induced by the repeated stimulation: stimulus association processes, lag detection processes, and changes in the speed of stimulus processing. Although all three processes could potentially have contributed to the observed changes in nDTF amplitudes, their relative roles in mediating the psychophysical recalibration of temporal order judgment must remain speculative. Further clarification of this issue would require a behavioral test of the recalibration of temporal order judgment in combination with the electrophysiological analysis.
REFERENCES

Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19:716–723.
Alais, D., and S. Carlile. 2005. Synchronizing to real events: Subjective audiovisual alignment scales with perceived auditory depth and speed of sound. Proceedings of the National Academy of Sciences of the United States of America 102(6):2244–2247.
Arnold, D.H., A. Johnston, and S. Nishida. 2005. Timing sight and sound. Vision Research 45:1275–1284.
Astolfi, L., F. Cincotti, D. Mattia, M.G. Marciani, L.A. Baccala, F. de Vico Fallani, S. Salinari, M. Ursino, M. Zavaglia, L. Ding, J.C. Edgar, G.A. Miller, B. He, and F. Babiloni. 2007. Comparison of different cortical connectivity estimators for high-resolution EEG recordings. Human Brain Mapping 28:143–157.
Baylor, D.A., B.J. Nunn, and J.L. Schnapf. 1984. The photocurrent, noise and spectral sensitivity of rods of the monkey Macaca fascicularis. Journal of Physiology 357:575–607.
Baylor, D.A., B.J. Nunn, and J.L. Schnapf. 1987. Spectral sensitivity of cones of the monkey Macaca fascicularis. Journal of Physiology 390:124–160.
Bedford, F.L. 2001. Toward a general law of numerical/object identity. Current Psychology of Cognition 20(3–4):113–175.
Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2007. Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–2198.
Bressler, S.L. 1995. Large-scale cortical networks and cognition. Brain Research Reviews 20:288–304.
Bressler, S.L. 1996. Interareal synchronization in the visual cortex. Behavioural Brain Research 76:37–49.
Bressler, S.L., R. Coppola, and R. Nakamura. 1993. Episodic multiregional cortical coherence at multiple frequencies during visual task performance. Nature 366:153–156.
Brosch, M., E. Selezneva, and H. Scheich. 2005. Nonauditory events of a behavioral procedure activate auditory cortex of highly trained monkeys. Journal of Neuroscience 25(29):6796–6806.
Bushara, K.O., J. Grafman, and M. Hallett. 2001. Neural correlates of audio-visual stimulus onset asynchrony detection. The Journal of Neuroscience 21(1):300–304.
Cahill, L., F.W. Ohl, and H. Scheich. 1996. Alteration of auditory cortex activity with a visual stimulus through conditioning: A 2-deoxyglucose analysis. Neurobiology of Learning and Memory 65(3):213–222.
Cassidy, M., and P. Brown. 2003. Spectral phase estimates in the setting of multidirectional coupling. Journal of Neuroscience Methods 127:95–103.
Cobbs, E.H., and E.N. Pugh Jr. 1987. Kinetics and components of the flash photocurrent of isolated retinal rods of the larval salamander, Ambystoma tigrinum. Journal of Physiology 394:529–572.
Corey, D.P., and A.J. Hudspeth. 1979. Response latency of vertebrate hair cells. Biophysical Journal 26:499–506.
Corey, D.P., and A.J. Hudspeth. 1983. Analysis of the microphonic potential of the bullfrog's sacculus. Journal of Neuroscience 3:942–961.
Crawford, A.C., and R. Fettiplace. 1985. The mechanical properties of ciliary bundles of turtle cochlear hair cells. Journal of Physiology 364:359–379.
Crawford, A.C., M.G. Evans, and R. Fettiplace. 1991. The actions of calcium on the mechanoelectrical transducer current of turtle hair cells. Journal of Physiology 491:405–434.
Ding, M., S.L. Bressler, W. Yang, and H. Liang. 2000. Short-window spectral analysis of cortical event-related potentials by adaptive autoregressive modelling: Data preprocessing, model validation, variability assessment. Biological Cybernetics 83:35–45.
Efron, B., and R.J. Tibshirani. 1993. An Introduction to the Bootstrap. Boca Raton, FL: Chapman and Hall/CRC.
Eichler, M. 2006. On the evaluation of information flow in multivariate systems by the directed transfer function. Biological Cybernetics 94:469–482.
Engel, G.R., and W.G. Dougherty. 1971. Visual-auditory distance constancy. Nature 234:308.
Fain, G.L. 2003. Sensory Transduction. Sunderland, MA: Sinauer Associates.
Franaszczuk, P.J., and G.K. Bergey. 1998. Application of the directed transfer function method to mesial and lateral onset temporal lobe seizures. Brain Topography 11:13–21.
Freeman, W.J. 2000. Neurodynamics: An Exploration in Mesoscopic Brain Dynamics. London: Springer-Verlag.
Fries, P., J.H. Reynolds, A.E. Rorie, and R. Desimone. 2001. Modulation of oscillatory neuronal synchronization by selective visual attention. Science 291:1560–1563.
Fujisaki, W., and S. Nishida. 2005. Temporal frequency characteristics of synchrony-asynchrony discrimination of audio-visual signals. Experimental Brain Research 166:455–464.
Fujisaki, W., and S. Nishida. 2008. Top-down feature-based selection of matching features for audio-visual synchrony discrimination. Neuroscience Letters 433:225–230.
Fujisaki, W., S. Shimojo, M. Kashino, and S. Nishida. 2004. Recalibration of audiovisual simultaneity. Nature Neuroscience 7(7):773.
Fujisaki, W., A. Koene, D. Arnold, A. Johnston, and S. Nishida. 2006. Visual search for a target changing in synchrony with an auditory signal. Proceedings of the Royal Society of London. Series B. Biological Sciences 273:865–874.
Hanson, J.V.M., J. Heron, and D. Whitaker. 2008. Recalibration of perceived time across sensory modalities. Experimental Brain Research 185:347–352.
Harrar, V., and L.R. Harris. 2005. Simultaneity constancy: Detecting events with touch and vision. Experimental Brain Research 166:465–473.
Harrar, V., and L.R. Harris. 2008. The effects of exposure to asynchronous audio, visual, and tactile stimulus combinations on the perception of simultaneity. Experimental Brain Research 186:517–524.
Heron, J., D. Whitaker, P. McGraw, and K.V. Horoshenkov. 2007. Adaptation minimizes distance-related audiovisual delays. Journal of Vision 7(13):1–8.
Hestrin, S., and J.I. Korenbrot. 1990. Activation kinetics of retinal cones and rods: Response to intense flashes of light. Journal of Neuroscience 10:1967–1973.
Kaminski, M., and K.J. Blinowska. 1991. A new method for the description of the information flow in the brain structures. Biological Cybernetics 65:203–210.
Kaminski, M., K.J. Blinowska, and W. Szelenberger. 1997. Topographic analysis of coherence and propagation of EEG activity during sleep and wakefulness. Electroencephalography and Clinical Neurophysiology 102:216–227.
Kaminski, M., M. Ding, W.A. Truccolo, and S.L. Bressler. 2001. Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance. Biological Cybernetics 85:145–157.
Kay, S.M. 1987. Modern Spectral Estimation. Englewood Cliffs, NJ: Prentice Hall.
Kayser, C., C. Petkov, and N.K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18:1560–1574.
Keetels, M., and J. Vroomen. 2007. No effect of auditory–visual spatial disparity on temporal recalibration. Experimental Brain Research 182:559–565.
Kopinska, A., and L.R. Harris. 2004. Simultaneity constancy. Perception 33:1049–1060.
Korzeniewska, A., S. Kasicki, M. Kaminski, and K.J. Blinowska. 1997. Information flow between hippocampus and related structures during various types of rat's behavior. Journal of Neuroscience Methods 73:49–60.
Korzeniewska, A., M. Manczak, M. Kaminski, K.J. Blinowska, and S. Kasicki. 2003. Determination of information flow direction among brain structures by a modified directed transfer function (dDTF) method. Journal of Neuroscience Methods 125:195–207.
Kus, R., M. Kaminski, and K.J. Blinowska. 2004. Determination of EEG activity propagation: Pairwise versus multichannel estimate. IEEE Transactions on Biomedical Engineering 51:1501–1510.
Lewald, J., and R. Guski. 2004. Auditory-visual temporal integration as a function of distance: No compensation for sound-transmission time in human perception. Neuroscience Letters 357:119–122.
Liang, H., M. Ding, R. Nakamura, and S.L. Bressler. 2000. Causal influences in primate cerebral cortex. Neuroreport 11(13):2875–2880.
Liang, H., S.L. Bressler, M. Ding, W.A. Truccolo, and R. Nakamura. 2002. Synchronized activity in prefrontal cortex during anticipation of visuomotor processing. Neuroreport 13(16):2011–2015.
Lütkepohl, H. 1993. Introduction to Multiple Time Series Analysis, 2nd ed. Berlin: Springer.
Marple, S.L. 1987. Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice Hall.
Medvedev, A., and J.O. Willoughby. 1999. Autoregressive modeling of the EEG in systemic kainic acid-induced epileptogenesis. International Journal of Neuroscience 97:149–167.
Meredith, M.A., J.W. Nemitz, and B.E. Stein. 1987. Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. The Journal of Neuroscience 7(10):3212–3229.
Miyazaki, M., S. Yamamoto, S. Uchida, and S. Kitazawa. 2006. Bayesian calibration of simultaneity in tactile temporal order judgment. Nature Neuroscience 9:875–877.
Musacchia, G., and C.E. Schroeder. 2009. Neural mechanisms, response dynamics and perceptual functions of multisensory interactions in auditory cortex. Hearing Research 285:72–79.
Navarra, J., A. Vatakis, M. Zampini, S. Soto-Faraco, W. Humphreys, and C. Spence. 2005. Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain Research 25:499–507.
Navarra, J., S. Soto-Faraco, and C. Spence. 2006. Adaptation to audiovisual asynchrony. Neuroscience Letters 431:72–76.
Nickalls, R.W.D. 1996. The influences of target angular velocity on visual latency difference determined using the rotating Pulfrich effect. Vision Research 36:2865–2872.
Ohl, F.W., H. Scheich, and W.J. Freeman. 2000. Topographic analysis of epidural pure-tone-evoked potentials in gerbil auditory cortex. Journal of Neurophysiology 83:3123–3132.
Ohl, F.W., H. Scheich, and W.J. Freeman. 2001. Change in pattern of ongoing cortical activity with auditory learning. Nature 412:733–736.
Posner, M.I., C.R.R. Snyder, and B.J. Davidson. 1980. Attention and the detection of signals. Journal of Experimental Psychology: General 109(2):160–174.
Robson, J.G., S.M. Saszik, J. Ahmed, and L.J. Frishman. 2003. Rod and cone contributions to the a-wave of the electroretinogram of the macaque. Journal of Physiology 547:509–530.
Rodriguez, E., N. George, J.P. Lachaux, J. Martinerie, B. Renault, and F.J. Varela. 1999. Perception's shadow: Long-distance synchronization of human brain activity. Nature 397:430–433.
Roelfsema, P.R., A.K. Engel, P. König, and W. Singer. 1997. Visuomotor integration is associated with zero time-lag synchronization among cortical areas. Nature 385:157–161.
Roelfsema, P.R., V.A.F. Lamme, and H. Spekreijse. 1998. Object-based attention in the primary visual cortex of the macaque monkey. Nature 395:377–381.
Schlögl, A. 2006. A comparison of multivariate autoregressive estimators. Signal Processing 86:2426–2429.
Senkowski, D., D. Talsma, M. Grigutsch, C.S. Herrmann, and M.G. Woldorff. 2007. Good times for multisensory integration: Effects of the precision of temporal synchrony as revealed by gamma-band oscillations. Neuropsychologia 45:561–571.
Stone, J.V. 2001. Where is now? Perception of simultaneity. Proceedings of the Royal Society of London. Series B. Biological Sciences 268:31–38.
Sugita, Y., and Y. Suzuki. 2003. Audiovisual perception: Implicit evaluation of sound arrival time. Nature 421:911.
Sutton, S., M. Braren, J. Zubin, and E.R. John. 1965. Evoked-potential correlates of stimulus uncertainty. Science 150:1187–1188.
Varela, F., J. Lachaux, E. Rodriguez, and J. Martinerie. 2001. The brainweb: Phase synchronization and large-scale integration. Nature Reviews Neuroscience 2:229–239.
Vatakis, A., and C. Spence. 2007. Crossmodal binding: Evaluating the influence of the 'unity assumption' using audiovisual speech stimuli. Perception & Psychophysics 69(5):744–756.
Vatakis, A., and C. Spence. 2008. Evaluating the influence of the 'unity assumption' on the temporal perception of realistic audiovisual stimuli. Acta Psychologica 127:12–23.
Vatakis, A., J. Navarra, S. Soto-Faraco, and C. Spence. 2007. Temporal recalibration during asynchronous audiovisual speech perception. Experimental Brain Research 181:173–181.
Vatakis, A., J. Navarra, S. Soto-Faraco, and C. Spence. 2008. Audiovisual temporal adaptation of speech: Temporal order versus simultaneity judgments. Experimental Brain Research 185:521–529.
Von Békésy, G. 1963. Interaction of paired sensory stimuli and conduction of peripheral nerves. Journal of Applied Physiology 18:1276–1284.
Von Stein, A., C. Chiang, and P. König. 2000. Top-down processing mediated by interareal synchronization. Proceedings of the National Academy of Sciences of the United States of America 97:14748–14753.
Vroomen, J., M. Keetels, B. de Gelder, and P. Bertelson. 2004. Recalibration of temporal order perception by exposure to audio-visual asynchrony. Cognitive Brain Research 22:32–35.
Welch, R.B. 1999. Meaning, attention, and the unity assumption in the intersensory bias of spatial and temporal perceptions. In Cognitive Contributions to the Perception of Spatial and Temporal Events, ed. G. Aschersleben, T. Bachmann, and J. Müsseler, 371–387. Amsterdam: Elsevier.
Welch, R.B., and D.H. Warren. 1980. Immediate perceptual response to intersensory discrepancy. Psychological Bulletin 88:638–667.
Wilson, J.A., and S.M. Anstis. 1996. Visual delay as a function of luminance. American Journal of Psychology 82:350–358.
17 Development of Multisensory Temporal Perception
David J. Lewkowicz
CONTENTS
17.1 Introduction 325
17.2 Perception of Multisensory Temporal Information and Its Coherence 326
17.3 Developmental Emergence of Multisensory Perception: General Patterns and Effects of Experience 327
17.4 Perception of Temporal Information in Infancy 330
17.5 Perception of A–V Temporal Synchrony 331
17.5.1 A–V Temporal Synchrony Threshold 331
17.5.2 Perception of A–V Speech Synchrony and Effects of Experience 332
17.5.3 Binding of Nonnative Faces and Vocalizations 334
17.6 Perception of Multisensory Temporal Sequences in Infancy 336
17.7 Speculations on Neural Mechanisms Underlying the Development of Multisensory Perception 338
References 339
17.1 INTRODUCTION

The objects and events in our external environment provide us with a constant flow of multisensory information. Such an unrelenting flow of information might be potentially confusing if no mechanisms were available for its integration. Fortunately, however, sophisticated multisensory integration* mechanisms have evolved across the animal kingdom to solve this problem (Calvert et al. 2004; Ghazanfar and Schroeder 2006; Maier and Schneirla 1964; Marks 1978; Partan and Marler 1999; Rowe 1999; Stein and Meredith 1993; Stein and Stanford 2008; Welch and Warren 1980). These mechanisms enable mature organisms to integrate multisensory inputs and, in the process, make it possible for them to perceive the coherent nature of their multisensory world. The other chapters in this volume discuss the structural and functional characteristics of multisensory processing and integration mechanisms in adults. Here, I address the developmental question by asking (1) when do multisensory response mechanisms begin to emerge in development, and (2) what specific processes underlie their emergence? To answer these questions, I discuss our work on the development of multisensory processing of temporal information and focus primarily on human infants.

* Historically, the term "integration," when used in the context of work on multisensory processing, has been used to refer to different processes by different researchers (Stein et al. 2010). For some, this term is reserved for cases in which sensory input in one modality changes the qualitative experience that one has in response to stimulation in another modality, as is the case in the McGurk effect (McGurk and MacDonald 1976). For others, it has come to be associated with neural and behavioral responsiveness to near-threshold stimulation in one modality either being enhanced or suppressed by stimulation in another modality (Stein and Stanford 2008). Finally, for some investigators, integration has simply meant the process that enables perceivers to detect and respond to the relational nature of multisensory stimulation and no assumptions were made about underlying perceptual or neural mechanisms. It is this last meaning that is used here.
I show that processing of multisensory temporal information, as well as the processing of other types of multisensory information, emerges gradually during the first year of life, argue that the rudimentary multisensory processing abilities found at the beginning of life reflect neural/behavioral immaturities and the relative lack of perceptual and sensorimotor experience, and provide evidence that the gradual improvement in multisensory processing ability reflects the interaction between behavioral and (implied) neural maturation and perceptual experience.
17.2 PERCEPTION OF MULTISENSORY TEMPORAL INFORMATION AND ITS COHERENCE

The temporal dimension of our everyday experience is an inescapable and fundamental part of our perceptual and cognitive existence (Fraisse 1982; Greenfield 1991; Lashley 1951; Martin 1972; Nelson 1986, 2007). The temporal flow of stimulation provides a host of perceptual cues that observers can use to detect the coherence, global structure, and even the hidden meanings inherent in multisensory events. For example, when people speak, they produce a series of mouth movements and vocalizations. At a basic level, the onsets and offsets of mouth movements and the accompanying vocalizations are precisely synchronized. This allows observers to determine that the movements and vocalizations are part of a coherent multisensory event. Of course, detecting that multisensory inputs correspond in terms of their onsets and offsets is not terribly informative because it does not provide any information about the other key and overlapping characteristics of multisensory events. For example, in the case of audiovisual speech, synchrony does not provide information about the invariant durations of the audible and visible utterances nor about the correlated dynamic temporal and spectral patterns across audible and visible articulations that are normally available and used by adults (Munhall and Vatikiotis-Bateson 2004; Yehia et al. 1998). These latter perceptual cues inform the observer about the amodal invariance of the event and, thus, serve as another important basis for the perception of multisensory coherence.* Finally, at an even more global level, the temporal patterning (i.e., rhythm) of the audible and visible attributes of a continuous utterance (i.e., a string of words) not only can provide another important basis for the perception of multisensory coherence but can also provide cues to "hidden" meanings. The hidden meanings derive from the particular ordering of the different constituents (e.g., syllables, words, and phrases) and when those constituents are specified by multisensory attributes, their extraction can be facilitated by multisensory redundancy effects (Bahrick et al. 2004; Lewkowicz and Kraebel 2004). As might be expected, adults are highly sensitive to multisensory temporal information. This is evident from the results of studies showing that adults can perceive temporally based multisensory coherence (Gebhard and Mowbray 1959; Handel and Buffardi 1969; Myers et al. 1981; Shipley 1964; Welch et al. 1986). It is also evident from the results of other studies showing that adults' responsiveness to the temporal aspects of stimulation in one sensory modality can influence their responsiveness to the temporal aspects of stimulation in another modality. For example, when adults hear a fluttering sound, their perception of a flickering light changes as a function of the frequency of the flutter; the flutter "drives" the flicker (Myers et al. 1981). Particularly interesting are findings showing that some forms of temporal intersensory interaction can produce illusions (Sekuler et al. 1997; Shams et al. 2000) or can influence the strength of illusions (Slutsky and Recanzone 2001).

* There is a functionally important distinction between intersensory cues such as duration, tempo, and rhythm, on the one hand, and intersensory temporal synchrony cues, on the other.
The former are all amodal stimulus attributes because they can be specified independently in different modalities and, as a result, can be perceived even in the absence of temporal synchrony cues (e.g., even if the auditory and visual attributes of a speech utterance are not presented together, their equal duration can be perceived). In contrast, temporal synchrony is not an amodal perceptual cue because it cannot be specified independently in a single sensory modality; an observer must have access to the concurrent information in the different modalities to perceive it. Moreover, and especially important for developmental studies, infants might be able to perceive intersensory synchrony relations without being able to perceive the amodal cues that characterize the multisensory attributes (e.g., an infant might be able to perceive that a talking face and the vocalizations that it produces belong together but may not be able to detect the equal duration of the visible and audible articulations).
For example, when adults see a single flash and hear two tones, they report two flashes even though they know that there is only a single flash (Shams et al. 2000). Similarly, when two identical objects are seen moving toward and then through each other and a brief sound is presented at the point of their coincidence, adults report that the objects bounce against each other rather than pass through one another (Sekuler et al. 1997). This "bounce" illusion emerges in infancy: starting at 6 months of age, infants begin to exhibit evidence that they experience it as well (Scheier et al. 2003). Even though the various amodal and invariant temporal attributes are natural candidates for the perception of multisensory coherence, there are good a priori theoretical reasons to expect that intersensory temporal synchrony might play a particularly important role during the earliest stages of development (Gibson 1969; Lewkowicz 2000a; Thelen and Smith 1994) and that young infants may not perceive the kinds of higher-level amodal invariants mentioned earlier. One reason for this may be the fact that, unlike in the case of the detection of higher-level amodal invariants, it is relatively easy to detect multisensory temporal synchrony relations. All that is required is the detection of the concurrent onsets and offsets of stimulus energy across modalities. In contrast, the detection of amodal cues requires the ability to perceive the equivalence of some of the higher-level types of correlated patterns of information discussed earlier. Moreover, observers are sometimes required to detect such patterns even when they are not available concurrently, and they can do so (Kamachi et al. 2003). Infants also exhibit this ability but, thus far, evidence indicates that they can do so only starting at 6 months of age (Pons et al. 2009) and no studies have shown that they can perform this kind of task earlier. Although young infants' presumed inability to perceive amodal cues might seem like a serious limitation, it has been argued by some that developmental limitations actually serve an important function (Oppenheim 1981). With specific regard to multisensory functions, Turkewitz has argued that sensory limitations help infants organize their perceptual world in an orderly fashion while at the same time not overwhelming their system (Turkewitz 1994; Turkewitz and Kenny 1982). From this perspective, the ability to detect temporal synchrony cues very early in life makes it possible for young, immature, and inexperienced infants to first discover that multisensory inputs cohere, albeit at a very low level. This, in turn, gives them an entrée into a multisensory world composed not only of the various higher-level amodal invariants mentioned earlier but also of other higher-level nontemporal multisensory attributes such as gender, affect, and identity. Most theorists agree that the general processes that mediate this gradual improvement in multisensory processing ability are perceptual learning and differentiation in concert with infants' everyday experience and sensorimotor interactions with their multisensory world.
Likewise, starting at birth and thereafter, infants can detect the synchronous relationship between the audible and visible attributes of vocalizing faces (Lewkowicz 2000b, 2010; Lewkowicz and Ghazanfar 2006; Lewkowicz et al. 2010). Interestingly, however, when the multisensory temporal task is too complex (i.e., when it requires infants to detect which of two objects that are moving at different tempos corresponds to a synchronous sound) synchrony cues are not sufficient for the perception of multisensory coherence (Lewkowicz 1992a, 1994). Similarly, when the relationship between two moving objects and a sound that occurs during their coincidence is ambiguous (as is the case in the bounce illusion), 6- and 8-month-old infants perceive this relationship but 4-month-olds do not.
17.3 DEVELOPMENTAL EMERGENCE OF MULTISENSORY PERCEPTION: GENERAL PATTERNS AND EFFECTS OF EXPERIENCE

As indicated above, data from studies of infant response to multisensory temporal information suggest that multisensory processing abilities improve during the first year of life. If that is the case,
do these findings reflect a general developmental pattern? The answer is that the same pattern holds for infant perception of other types of multisensory perceptual cues. To make theoretical sense of the overall body of findings on the developmental emergence of multisensory perceptual abilities in infancy, it is helpful to first ask what the key theoretical questions are in this area. If, as indicated earlier, infants' initial immaturity and relative lack of experience impose serious limitations on their ability to integrate the myriad inputs that constantly bombard their perceptual systems, how do they go about integrating those inputs and how does this process get bootstrapped at the start of postnatal life? As already suggested, one possible mechanism is a synchrony detection mechanism that simply detects synchronous stimulus onsets and offsets across different modalities. This, in turn, presumably provides developing infants with the opportunity to gradually discover increasingly complex multisensory coherence cues. Although the detection of multisensory synchrony is one possible specific mechanism that can mediate developmental change, other more general processes probably contribute to developmental change as well. Historically, these more general processes have been proposed in what appear to be two diametrically opposed theoretical views concerning the development of multisensory functions. One of these views holds that developmental differentiation is the process underlying developmental change, whereas the other holds that developmental integration is the key process. More specifically, the first, known as the developmental differentiation view, holds that infants come into the world prepared to detect certain amodal invariants and that this ability improves and broadens in scope as they grow (Gibson 1969; Thelen and Smith 1994; Werner 1973). According to the principal proponent of this theoretical view (Gibson 1969), the improvement and broadening are mediated by perceptual differentiation, learning, and the emergence of increasingly better stimulus detection abilities. The second, known as the developmental integration view, holds that infants come into the world with their different sensory systems essentially disconnected and that the senses gradually become functionally connected as a result of children's active interaction with their world (Birch and Lefford 1963, 1967; Piaget 1952). One of the most interesting and important features of each of these theoretical views is that both assign central importance to developmental experience. A great deal of empirical evidence has been amassed since the time that the two principal theoretical views on the development of multisensory functions were proposed. It turns out that some of this evidence can be interpreted as consistent with the developmental differentiation view, whereas some of it can be interpreted as consistent with the developmental integration view. Overall, then, it seems that both processes play a role in the developmental emergence of multisensory functions. The evidence that is consistent with the developmental differentiation view comes from studies showing that, despite the fact that the infant nervous system is highly immature and that infants are perceptually inexperienced, infants exhibit some multisensory perceptual abilities from birth onward (Gardner et al. 1986; Lewkowicz et al. 2010; Lewkowicz and Turkewitz 1980, 1981; Slater et al. 1997, 1999).
Importantly, however, and as indicated earlier, these abilities are relatively rudimentary. For example, newborns can detect multisensory synchrony cues and do so by detecting nothing more than stimulus energy onsets and offsets (Lewkowicz et al. 2010). In addition, newborns are able to detect audiovisual (A–V) intensity equivalence (Lewkowicz and Turkewitz 1980) and can associate arbitrary auditory and visual object attributes on the basis of their synchronous occurrence (Slater et al. 1997, 1999). Although impressive, these kinds of findings are not surprising given that there are ample opportunities for intersensory interactions—especially those involving the co-occurrence of sensations in different modalities—during fetal life and that these interactions are likely to provide the foundation for the kinds of rudimentary multisensory perceptual abilities found at birth (Turkewitz 1994). Other evidence from the body of empirical work amassed to date is consistent with the developmental integration view by indicating that multisensory perceptual abilities improve as infants grow and acquire perceptual experience (Bremner et al. 2008; Lewkowicz 1994, 2000a, 2002; Lickliter and Bahrick 2000; Walker-Andrews 1997). This evidence shows that older infants possess more sophisticated multisensory processing abilities than do younger infants. For example, young infants
can perceive multisensory synchrony cues (Bahrick 1983; Bahrick and Lickliter 2000; Lewkowicz 1992a,b, 1996, 2000b, 2003, 2010), amodal intensity (Lewkowicz and Turkewitz 1980), amodal duration (Lewkowicz 1986), and the multisensory invariance of isolated audible and visible phonemes (Brookes et al. 2001; Kuhl and Meltzoff 1982, 1984; Patterson and Werker 2003). In contrast, older infants (roughly older than 6 months of age), but not younger ones, also exhibit the ability to perceive amodal affect produced by strangers (Walker-Andrews 1986) and amodal gender (Patterson and Werker 2002; Walker-Andrews et al. 1991), bind arbitrary modality-specific cues (Bahrick 1994; Reardon and Bushnell 1988), integrate auditory and visual spatial cues in an adult-like manner (Neil et al. 2006), and integrate multisensory spatial bodily and external cues (Bremner et al. 2008). Considered together, this latter body of findings clearly shows that multisensory perceptual abilities improve over the first year of life. Thus, when all the extant empirical evidence is considered together, it is clear that developmental differentiation and developmental integration processes operate side by side in early human development and that both contribute to the emergence of multisensory perceptual abilities in infancy and probably beyond. If developmental differentiation and integration both contribute to the development of multisensory perception, what role might experience play in this process? As might be expected (Gibson 1969), evidence from studies of human infants indicates that experience plays a critical role in the development of multisensory functions. Until now, however, very little direct evidence for the effects of early experience has been available at the human level, except for two studies that together demonstrated that infant response to amodal affect information depends on the familiarity of the information. Thus, in the first study, Walker-Andrews (1986) found that 7-month-olds but not 5-month-olds detected amodal affect when the affect was produced by a stranger. In the second study, Kahana-Kalman and Walker-Andrews (2001) found that when the affect was produced by the infant's own mother, infants as young as 3.5 months of age detected it. More recently, my colleagues and I have discovered a particularly intriguing and seemingly paradoxical effect of experience on the development of multisensory responsiveness. We have discovered that some multisensory perceptual functions are initially present early in life and then decline as infants age. This multisensory perceptual narrowing phenomenon was not predicted by either the developmental differentiation or the developmental integration view. In these recent studies, we have found that infants between birth and 6 months of age can match monkey faces and the vocalizations that they produce but that older infants no longer do so (Lewkowicz and Ghazanfar 2006; Lewkowicz et al. 2008, 2010). In addition, we have found that 6-month-old infants can match visible and audible phonemes regardless of whether these phonemes are functionally relevant in their own language or in other languages (Pons et al. 2009). Specifically, we found that 6-month-old Spanish-learning infants can match a visible /ba/ to an audible /ba/ and a visible /va/ to an audible /va/, whereas 11-month-old Spanish-learning infants no longer do so. In contrast, English-learning infants can make such matches at both ages.
The failure of the older Spanish-learning infants to make the matches is correlated with the fact that the /ba/ – /va/ phonetic distinction is not phonemically functional in Spanish. This means that when older Spanish-learning infants have to choose between a face mouthing a /ba/ and a face mouthing a /va/ after having listened to one of these phonemes, they cannot choose the matching face because the phonemes are no longer distinct for them. Together, our findings on multisensory perceptual narrowing indicate that as infants grow and gain experience with vocalizing human faces and with native language audiovisual phonology, their ability to perceive cross-species and cross-language multisensory coherence declines because nonnative multisensory information is not relevant for everyday functioning. We have also explored the possible evolutionary origins of multisensory perceptual narrowing and, thus far, have found that it seems to be restricted to the human species. We tested young vervet monkeys, at ages when they are old enough to be past the point of narrowing, with the same vocalizing rhesus monkey faces that we presented in our initial infant studies and found that vervets do not exhibit multisensory perceptual narrowing (Zangenehpour et al. 2009). That is, the vervets matched rhesus monkey visible and audible vocalizations even though they were past the point when
narrowing should have occurred. We interpreted this finding as reflecting the fact that monkey brains mature four times as fast as human brains do and that, as a result, young vervets are less open to the effects of early experience than are human infants. This interpretation suggests that experience interacts with the speed of neural growth and differentiation and that slower brain growth and differentiation is highly advantageous because it provides for greater developmental plasticity. The vervet monkey study demonstrates that the rate of neural growth plays an important role in the development of behavioral functions and provides yet another example illustrating this key developmental principle (Turkewitz and Kenny 1982). What about neural and experiential immaturity, especially at the beginning of postnatal and/or posthatching life? Do other organisms, besides humans, manifest relatively poor and immature multisensory processing functions? The answer is that they do. A number of studies have found that the kinds of immaturities and developmental changes observed in human infants are also found in the young of other species. Together, these studies have found that rats, cats, and monkeys exhibit relatively poor multisensory responsiveness early in life, that its emergence follows a pattern of gradual improvement, and that early experience plays a critical role in this process. For example, Wallace and Stein (1997, 2001) have found that multisensory cells in the superior colliculus of cats and rhesus monkeys, which normally integrate auditory and visual spatial cues in the adult, do not integrate in newborn cats and monkeys, and that integration only emerges gradually over the first weeks of life. Moreover, Wallace et al. (2006) have found that the appropriate alignment of the auditory and visual maps in the superior colliculus of the rat depends on their normal spatial coregistration. The same kinds of effects have been found in barn owls and ferrets, in which calibration of the precise spatial tuning of the neural map of auditory space depends critically on concurrent visual input (King et al. 1988; Knudsen and Brainard 1991). Finally, in bobwhite quail hatchlings, the ability to respond to the audible and visible attributes of the maternal hen after hatching depends on prehatching and posthatching experience with the auditory, visual, and tactile stimulation arising from the embryo’s own vocalizations, the maternal hen, and broodmates (Lickliter and Bahrick 1994; Lickliter et al. 1996). Taken together, the human and animal data indicate that the general developmental pattern consists of an initial emergence of low-level multisensory abilities, a subsequent experience-dependent improvement of emerging abilities, and finally, the emergence of higher-level multisensory abilities. This developmental pattern, especially in humans, appears to be due to the operation of developmental differentiation and developmental integration processes. Moreover, and most intriguing, our recent discovery of multisensory perceptual narrowing indicates that even though young infants possess relatively crude and low-level types of multisensory perceptual abilities (i.e., sensitivity to A–V synchrony relations), these abilities imbue them with much broader multisensory perceptual tuning than is the case in older infants. 
As indicated earlier, the distinct advantage of this kind of tuning is that it provides young infants with a way of bootstrapping their multisensory perceptual abilities at a time when they are too immature and inexperienced to extract higher-level amodal attributes. In the remainder of this chapter, I review results from our studies on infant response to multisensory temporal information as an example of the gradual emergence of multisensory functions. Moreover, I review additional evidence of the role of developmental differentiation and integration processes as well as of early experience in the emergence of multisensory responsiveness. Finally, I speculate on the neural mechanisms that might underlie the developmental emergence of multisensory perception and highlight the importance of studying the interaction between neural and behavioral growth and experience.
17.4 PERCEPTION OF TEMPORAL INFORMATION IN INFANCY

As indicated earlier, the temporal dimension of stimulation is the multisensory attribute par excellence because it provides observers with various types of overlapping patterns of multisensory information. For infants, this means that they have a ready-made and powerful basis for coherent
and cognitively meaningful multisensory experiences. This, of course, assumes that they are sensitive to the temporal flow of information in each modality. Indeed, evidence indicates that infants are sensitive to temporal information at both the unisensory and multisensory levels. For example, it has been found that infants as young as 3 months of age can predict the occurrence of a visual stimulus at a particular location based on their prior experience with a temporally predictable pattern of spatiotemporally alternating visual stimuli (Canfield and Haith 1991; Canfield et al. 1997). Similarly, it has been found that 4-month-old infants can quickly learn to detect a "missing" visual stimulus after adaptation to a regular and predictable visual stimulus regimen (Colombo and Richman 2002). In the auditory modality, studies have shown that newborn infants (1) exhibit evidence of temporal anticipation when they hear a tone that is not followed by glucose—after the tone (CS) and the glucose (UCS) were paired during an initial conditioning phase (Clifton 1974) and (2) can distinguish between different classes of linguistic input on the basis of the rhythmic attributes of the auditory input (Nazzi and Ramus 2003). Finally, in the audiovisual domain, it has been found that 7-month-old infants can anticipate the impending presentation of an audiovisual event when they first hear a white noise stimulus that has previously reliably predicted the occurrence of the audiovisual event (Donohue and Berg 1991), and that infants' duration discrimination improves between 6 and 10 months of age (Brannon et al. 2007). Together, these findings indicate that infants are generally sensitive to temporal information in the auditory and visual modalities.
17.5 PERCEPTION OF A–V TEMPORAL SYNCHRONY

Earlier, it was indicated that the multisensory world consists of patterns of temporally coincident and amodally invariant information (Gibson 1966) and that infants are likely to respond to A–V temporal synchrony relations from an early age. There are two a priori reasons why this is the case. The perceptual basis for this has already been mentioned, namely, that the detection of temporal A–V synchrony is relatively easy because it only requires perception of synchronous energy onsets and offsets in different modalities. In addition, the neural mechanisms underlying the detection of intersensory temporal synchrony cues in adults are relatively widespread in the brain and are largely subcortical (Bushara et al. 2001). Given that at least some of these mechanisms are subcortical, this makes it likely that such mechanisms are also present and operational in the immature brain. Consistent with these expectations, results from behavioral studies have shown that, starting early in life, infants respond to A–V temporal synchrony and that this cue is primary for them. These results have revealed (1) that 6- and 8-month-old infants can match pulsing auditory and flashing static visual stimuli on the basis of their duration but only if the matching pair is also synchronous (Lewkowicz 1986); (2) that 4- and 8-month-old infants can match an impact sound to one of two bouncing visual stimuli on the basis of synchrony but not on the basis of tempo, regardless of whether the matching tempos are synchronous or not (Lewkowicz 1992a, 1994); (3) that 4- to 8-month-old infants can perceive A–V synchrony relations inherent in simple audiovisual events consisting of bouncing/sounding objects (Lewkowicz 1992b) as well as those inherent in vocalizing faces (Lewkowicz 2000b, 2003); and (4) that newborns (Lewkowicz et al. 2010) and 4- to 6-month-old infants (Lewkowicz and Ghazanfar 2006) can rely on A–V synchrony to match other species' facial and vocal expressions.
17.5.1 A–V Temporal Synchrony Threshold

Given the apparently primary importance of A–V temporal synchrony, Lewkowicz (1996) conducted a series of studies to investigate the threshold for the detection of A–V temporal asynchrony in 2-, 4-, 6-, and 8-month-old infants and compared it to that in adults tested in a similar manner. Infants were first habituated to a two-dimensional object that could be seen bouncing up and down on a computer monitor and an impact sound that occurred each time the object changed direction at the bottom of the monitor. They were then given a set of separate test trials during which the
impact sound was presented 150, 250, and 350 ms before the object's visible bounce (sound-first group) or 250, 350, or 450 ms after the visible bounce (sound-second group). Infants in the sound-first group detected the 350 ms asynchrony, whereas infants in the sound-second group detected the 450 ms asynchrony (no age effects were found). Adults, who were tested in a similar task and with the same stimuli, detected an asynchrony of 80 ms in the sound-first condition and 112 ms in the sound-second condition. Conceptualized in terms of an intersensory temporal contiguity window (ITCW), these results indicate that the ITCW is wider in infants than it is in adults and that it decreases in size during development.
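Purely as an illustration of the concept, the sketch below treats the asynchrony-detection thresholds just described as the boundaries of an asymmetric intersensory temporal contiguity window. This is a deliberate simplification under stated assumptions: real synchrony perception is graded rather than all-or-none, the reported values are group discrimination thresholds rather than hard limits, and the function and constant names are mine.

```python
def within_itcw(audio_onset_ms, visual_onset_ms, sound_lead_limit_ms, sound_lag_limit_ms):
    """Return True if an A-V onset pair falls inside the (asymmetric) ITCW.

    Negative asynchrony means the sound leads the visual event; positive means it lags.
    """
    asynchrony = audio_onset_ms - visual_onset_ms
    if asynchrony < 0:                          # sound-first pairing
        return -asynchrony < sound_lead_limit_ms
    return asynchrony < sound_lag_limit_ms      # synchronous or sound-second pairing

# Thresholds reported above (ms): asynchronies at or beyond these values were detected.
INFANT_WINDOW = dict(sound_lead_limit_ms=350, sound_lag_limit_ms=450)
ADULT_WINDOW = dict(sound_lead_limit_ms=80, sound_lag_limit_ms=112)

for label, window in (("infant", INFANT_WINDOW), ("adult", ADULT_WINDOW)):
    for soa in (-350, -250, 0, 250, 450):
        fused = within_itcw(audio_onset_ms=soa, visual_onset_ms=0, **window)
        print(f"{label:6s} sound offset {soa:+4d} ms -> treated as synchronous: {fused}")
```

With these numbers, a sound leading by 250 ms falls inside the infant window but well outside the adult one, which captures the developmental narrowing of the ITCW described above.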
17.5.2 Perception of A–V Speech Synchrony and Effects of Experience

In subsequent studies, we found that the ITCW is substantially larger for multisensory speech than for abstract nonspeech events. In the first of these studies (Lewkowicz 2000b), we habituated 4-, 6-, and 8-month-old infants to audiovisually synchronous syllables (/ba/ or /sha/) and then tested their response to audiovisually asynchronous versions of these syllables (sound-first condition only) and found that, regardless of age, infants only detected an asynchrony of 666 ms (in pilot work, we tested infants with much lower asynchronies but did not obtain discrimination). We then replicated the finding of such a high discrimination threshold in a subsequent study (Lewkowicz 2003) in which we found that 4- to 8-month-old infants detected an asynchrony of 633 ms. It should be noted that, other than our pilot work, these two studies only tested infants with one degree of A–V asynchrony. In other words, we did not formally investigate the size of the ITCW for audiovisual speech events until our most recent studies (Lewkowicz 2010). In these studies, we also examined the effects of short-term experience on the detection of A–V temporal synchrony relations and the possible mechanism underlying their detection. To determine the size of the ITCW, in Experiment 1, we habituated 4- to 10-month-old infants to an audiovisually synchronous syllable and then tested for their ability to detect three increasing levels of asynchrony (i.e., 366, 500, and 666 ms). Infants exhibited response recovery to the 666 ms asynchrony but not to the other two asynchronies, indicating that the threshold was located between 500 and 666 ms (see Figure 17.1). Prior studies have shown that when adults are first tested with audiovisually asynchronous events, they perceive them as asynchronous. If, however, they are first given short-term exposure to an asynchronous event and are tested again for detection of asynchrony, they now respond to such events as if they are synchronous (Fujisaki et al. 2004; Navarra et al. 2005; Vroomen et al. 2004). In other words, short-term adaptation to audiovisually asynchronous events appears to widen the ITCW in adults. One possible explanation for this adaptation effect is that it is partly due to an experience-dependent synchrony bias that develops during adults' lifetime of experience with exclusively synchronous audiovisual events. This bias presumably leads to the formation of an audiovisual "unity assumption" (Welch and Warren 1980). If that is the case, then infants might not exhibit an adaptation effect because of their relatively limited experience with synchronous multisensory events and the absence of a unity assumption. More specifically, infants may not exhibit a widening of their ITCW after habituation to an asynchronous audiovisual event. If so, rather than fail to discriminate between the asynchronous event and those that are physically less asynchronous, infants may actually exhibit a decrease in the size of the ITCW and thus even better discrimination. To test this possibility, in Experiment 2, we habituated a new group of 4- to 10-month-old infants to an asynchronous syllable (A–V asynchrony was 666 ms) and then tested them for the detection of decreasing levels of asynchrony (i.e., 500, 366, and 0 ms).
As predicted, this time, infants not only discriminated between the 666 ms asynchrony and temporal synchrony (0 ms), but they also discriminated between the 666 ms asynchrony and an asynchrony of 366 ms (see Figure 17.2). That is, short-term adaptation with a discriminable A–V asynchrony produced a decrease, rather than an increase, in the size of the ITCW. These results show that in the absence
of a unity assumption, short-term exposure to an asynchronous multisensory event does not cause infants to treat it as synchronous but rather focuses their attention on the event's temporal attributes and, in the process, sharpens their perception of A–V temporal relations. Finally, to investigate the mechanisms underlying A–V asynchrony detection, in Experiment 3, we habituated infants to a synchronous audiovisual syllable and then tested them again for the detection of asynchrony with audiovisual asynchronies of 366, 500, and 666 ms. This time, however, the test stimuli consisted of a visible syllable and a 400 Hz tone rather than the audible syllable.

FIGURE 17.1 Mean duration of looking during test trials in response to each of three different A–V temporal asynchronies after habituation to a synchronous audiovisual syllable. Error bars indicate standard error of mean and asterisk indicates that response recovery in that particular test trial was significantly higher than response obtained in the familiar test trial (Fam-0 ms.).
FIGURE 17.2 Mean duration of looking during test trials in response to each of three different A–V temporal asynchronies after habituation to an asynchronous audiovisual syllable. Error bars indicate standard error of mean and asterisks indicate that response recovery in those particular test trials was significantly higher than response obtained in the familiar test trial (Fam-666 ms.).
FIGURE 17.3 Mean duration of looking during test trials in response to each of three different A–V temporal asynchronies after habituation to an audiovisual stimulus consisting of a visible syllable and a synchronous tone. Error bars indicate standard error of mean and asterisk indicates that response recovery in that particular test trial was significantly higher than response obtained in the familiar test trial (Fam-0 ms.).
We substituted the tone for the acoustic part of the syllable to determine whether the dynamic variations in the spectral energy inherent in the acoustic part of the audiovisual speech signal, and/or their correlation with the dynamic variations in gestural information, contribute to infant detection of A–V speech synchrony relations. Once again, infants detected the 666 ms asynchrony but not the two smaller ones (see Figure 17.3). The fact that these findings replicated those from Experiment 1 indicates that infants rely neither on acoustic spectral energy nor on its correlation with the dynamic variations in gestural information to detect A–V speech synchrony relations. Rather, infants appear to attend primarily to energy onsets and offsets when processing A–V speech synchrony relations, suggesting that detection of such relations is unlikely to require the operation of higher-level neural mechanisms.
17.5.3 Binding of Nonnative Faces and Vocalizations
Given that energy onsets and offsets provide infants with sufficient information regarding the temporal alignment of auditory and visual inputs, the higher-level perceptual features of the stimulation in each modality are probably irrelevant to them. This is especially likely early in life, when the nervous system and the sensory systems are highly immature and inexperienced. As a result, it is possible that young infants might perceive the faces and vocalizations of other species as belonging together as long as they are synchronous.

We tested this idea by showing groups of 4-, 6-, 8-, and 10-month-old infants side-by-side videos of the same monkey's face producing a different visible call on each side (Lewkowicz and Ghazanfar 2006). During the two initial preference trials, infants saw the faces in silence, whereas during the next two trials, infants saw the same faces and heard the audible call that matched one of the two visible calls. The different calls (a coo and a grunt) differed in their durations and, as a result, the matching visible and audible calls corresponded in terms of their onsets and offsets as well as their durations. In contrast, the nonmatching ones corresponded only in terms of their onsets. We expected that infants would look longer at the visible call that matched the audible call if they perceived the temporal synchrony that bound them. Indeed, we found that the two younger groups of infants matched the corresponding faces and vocalizations but that the two older groups did not. These results indicate that young infants can rely on A–V
synchrony relations to perceive even nonnative facial gestures and accompanying vocalizations as coherent entities. The older infants no longer do so for two related reasons. First, they gradually shift their attention to higher-level perceptual features as a function of increasing neural growth, maturation of their perceptual systems, and increasing perceptual experience all acting together to make it possible for them to extract such features. Second, their exclusive and massive experience with human faces and vocalizations narrows their perceptual expertise to ecologically relevant signals. In other words, as infants grow and as they acquire experience with vocalizing faces, they learn to extract more complex features (e.g., gender, affect, and identity), rendering low-level synchrony relations much less relevant. In addition, as infants grow, they acquire exclusive experience with human faces and vocalizations and, as a result, become increasingly more specialized. As they specialize, they stop responding to the faces and vocalizations of other species. Because the matching faces and vocalizations corresponded not only in terms of onset and offset synchrony but in terms of duration as well, the obvious question is whether amodal duration might have contributed to multisensory matching. To investigate this question, we repeated the Lewkowicz and Ghazanfar (2006) procedures in a subsequent study (Lewkowicz et al. 2008), except that this time, we presented the monkey audible calls out of synchrony with respect to both visible calls. This meant that the corresponding visible and audible calls were now only related in terms of their duration. Results yielded no matching in either the 4- to 6-month-old or the 8- to 10-month-old infants, indicating that A–V temporal synchrony mediated successful matching in the younger infants. The fact that the younger infants did not match in this study, despite the fact that the corresponding faces and vocalizations corresponded in their durations, shows that duration did not mediate matching in the original study. This is consistent with previous findings that infants do not match equal-duration auditory and visual inputs unless they are also synchronous (Lewkowicz 1986). If A–V temporal synchrony mediates intersensory matching in young infants, and if responsiveness to this multisensory cue depends on a basic and relatively low-level process, then it is possible that cross-species multisensory matching emerges very early in development. To determine if that is the case, we asked whether newborns also might be able to match monkey faces and vocalizations (Lewkowicz et al. 2010). In Experiment 1 of this study, we used the identical stimulus materials and testing procedures used by Lewkowicz and Ghazanfar (2006), and found that newborns also matched visible and audible monkey calls. We then investigated whether successful matching reflected matching of the synchronous onsets and offsets of the audible and visible calls. If so, then newborns should be able to make the matches even when some of the identity information is removed. Thus, we repeated Experiment 1, except that rather than present the natural call, we presented a complex tone in Experiment 2. To preserve the critical temporal features of the audible call, we ensured that the tone had the same duration as the natural call and that its onsets and offsets were synchronous with the matching visible call. 
Despite the absence of acoustic identity information and the absence of a correlation between the dynamic variations in facial gesture information and the amplitude and formant structure inherent in the natural audible call, newborns still performed successful intersensory matching. This indicates that newborns’ ability to make cross-species matches in Experiment 1 was based on their sensitivity to the temporally synchronous onsets and offsets of the matching faces and vocalizations and that it was not based on identity information nor on the dynamic correlation between the visible and audible call features. Together, the positive findings of cross-species intersensory matching in newborns and 4- to 6-month-old infants demonstrate that young infants are sensitive to a basic feature of their perceptual world, namely, stimulus energy onsets and offsets. This basic perceptual sensitivity bootstraps newborns’ entry into the world of multisensory objects and events and enables them to perceive them as coherent entities, regardless of their specific identity. This sensitivity is especially potent when the visual information is dynamic. When it is not, infants do not begin to bind the auditory and visual attributes of multisensory objects, such as color/shape and pitch, or color and taste until the second half of the first year of life. The pervasive and fundamental role that A–V temporal synchrony plays in infant perceptual response to multisensory attributes suggests that sensitivity to this
intersensory perceptual cue reflects the operation of a fundamental early perceptual mechanism. That is, as indicated earlier, even though sensitivity to A–V temporal synchrony is mediated by relatively basic and low-level processing mechanisms, it provides infants with a powerful initial perceptual tool for gradually discovering that multisensory objects are characterized by many other forms of intersensory invariance. For example, once infants start to bind the audible and visible attributes of talking faces, they are in a position to discover that faces and the vocalizations that accompany them could also be specified by common duration, tempo, and rhythm, as well as by higher-level amodal and invariant attributes such as affect, gender, and identity.
17.6 PERCEPTION OF MULTISENSORY TEMPORAL SEQUENCES IN INFANCY
Multisensory objects often participate in complex actions that are sequentially organized over time. For example, when people speak, they simultaneously produce sequences of vocal sounds and correlated facial gestures. The syntactically prescribed order of the syllables and words imbues utterances with specific meanings. Unless infants master the ability to extract the sequential structure of such events, they will not be able to acquire language. Because this ability is so fundamental to adaptive perceptual and cognitive functioning, we have investigated its developmental emergence.

When we began these studies, there was little, if any, empirical evidence on infant perception of multisensory temporal sequences to guide our initial exploration of this issue. Prior theoretical views claimed that sequence learning is an innate ability (Greenfield 1991; Nelson 1986), but neither of these views specified what was meant by sequence learning abilities nor what infants should be capable of doing in this regard. Indeed, recent empirical research on infant pattern and sequence perception has contradicted the claim that this ability is innate and, if anything, has shown that sequence perception and learning is a complex skill that consists of several component skills and takes several years to reach adult levels of proficiency (Gulya and Colombo 2004; Thomas and Nelson 2001). Although no studies have investigated sequence perception and learning at birth, studies have shown that different sequence perception abilities, including the ability to perceive and learn adjacent and distant statistical relations, simple sequential rules, and ordinal position information, emerge at different points in infancy. Thus, beginning as early as 2 months of age, infants can learn the adjacent statistical relations that link a series of looming visual shapes (Kirkham et al. 2002; Marcovitch and Lewkowicz 2009); by 8 months, they can learn the statistical relations that link adjacent static object features (Fiser and Aslin 2002) as well as adjacent nonsense words in a stream of sounds (Saffran et al. 1996); and by 15 months, they begin to exhibit the ability to learn distant statistical relations (Gómez and Maye 2005). Moreover, although infants as young as 5 months of age can learn simple abstract temporal rules, such as one specifying the order (e.g., AAB vs. ABB) of distinct elements consisting of abstract objects and accompanying speech sounds (Frank et al. 2009), infants can learn such rules only by 7.5 months of age when they are instantiated by nonsense syllables (Marcus et al. 1999, 2007) and only by 11 months when they are instantiated by looming objects (Johnson et al. 2009). Finally, it is not until 9 months of age that infants can track the ordinal position of a particular syllable in a string of syllables (Gerken 2006).

It is important to note that most studies of infant sequence learning have presented unisensory stimuli even though most of our daily perceptual experiences are multisensory in nature. As a result, we investigated whether the developmental pattern found thus far for sequence perception and learning differs for multisensory sequences.
To do so, in some studies we provided infants with an opportunity to learn a single audiovisual sequence consisting of distinct moving objects and their impact sounds, whereas in others we allowed infants to learn a set of different sequences, each composed of different objects and impact sounds. Regardless of whether infants had to learn a single sequence or multiple ones, during the habituation phase they saw the objects appear one after another at the top of a computer monitor and then move down toward a ramp at the bottom of the display. When the objects reached the ramp, they
made an impact sound, turned to the right, moved off to the side, and disappeared. This cycle was repeated for the duration of each habituation trial. After habituation, infants were given test trials during which the order of the sequence elements was changed in some way, and the question was whether they detected the change.

In an initial study (Lewkowicz 2004), we asked whether infants can learn a sequence composed of three moving/impacting objects and, if so, what aspects of that sequence they encoded. Results indicated that 4-month-old infants detected serial order changes only when the changes were specified concurrently by audible and visible attributes during the learning as well as the test phase, and only when the impact part of the event—a local event feature that was not informative about sequential order—was blocked from view. In contrast, 8-month-old infants detected order changes regardless of whether they were specified by unisensory or bisensory attributes and whether they could see the impact or not. In sum, younger infants required multisensory redundancy to detect the serial order changes, whereas older infants did not. A follow-up study (Lewkowicz 2008) replicated the earlier findings, ruled out primacy effects, and extended the earlier findings by showing that even 3-month-old infants can perceive and discriminate three-element dynamic audiovisual sequences and that they also rely on multisensory redundancy for successful learning and discrimination. In addition, this study showed that object motion plays an important role, in that infants exhibited less robust responsiveness to audiovisual sequences consisting of looming rather than explicitly moving objects.

Because the changes in our two initial studies involved changes in the order of a particular object/impact sound as well as its statistical relations vis-à-vis the other sequence elements, we investigated the separate role of each of these sequential attributes in our most recent work (Lewkowicz and Berent 2009). Here, we investigated directly whether 4-month-old infants could track the statistical relations among specific sequence elements (e.g., AB, BC) and/or whether they could also encode abstract ordinal position information (e.g., that B is the second element in a sequence such as ABCD). Thus, across three experiments, we habituated infants to sequences of four moving/sounding objects in which three of the objects and their sounds varied in their ordinal position but the position of one target object/sound remained invariant (e.g., ABCD, CBDA). Figure 17.4 shows an example of one of these sequences and how they moved. We then tested whether the infants detected a change in the target's position. We found that infants detected an ordinal position change only when it disrupted the statistical relations between adjacent elements, but not when the statistical relations were controlled. Together, these findings indicate that 4-month-old infants learn the order of sequence elements by tracking their statistical relations but not their invariant ordinal position. When these findings are combined with the previously reviewed findings on sequence
learning in infancy, they show that different and increasingly more complex temporal sequence learning abilities emerge during infancy. For example, they suggest that the ability to perceive and learn the invariant ordinal position of a sequence element emerges sometime after 4 months of age. When it emerges and what mediates its emergence is currently an open question, as are the questions about the emergence of the other more complex sequence perception and learning skills.

FIGURE 17.4 One of three different sequences presented during the habituation phase of the sequence learning experiment (actual objects presented are shown). Each object made a distinct impact sound when it came in contact with the black ramp. Across three different sequences, the triangle was the target stimulus and, thus, for one group of infants, the target remained in second ordinal position during habituation phase and then changed to third ordinal position in the test trials.
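The dissociation this design tests, between adjacent statistical relations and invariant ordinal position, can be made concrete with a small sketch. The sequences and element labels below are hypothetical illustrations rather than the actual stimuli; the point is only that a target element can change its ordinal position either with or without introducing adjacent pairs that were never seen during habituation.

```python
from collections import Counter

# Hypothetical habituation sequences: the target "B" is always the 2nd element,
# while the other elements (and hence B's neighbors) vary across sequences.
habituation = ["ABCD", "CBDA", "DBAC"]

def adjacent_pairs(seq):
    """Return the adjacent (first-order) element pairs of a sequence."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]

# Statistics available during habituation.
seen_pairs = Counter(p for s in habituation for p in adjacent_pairs(s))
print("Pairs seen during habituation:", sorted(seen_pairs))
print("Ordinal position of B during habituation:",
      sorted({s.index("B") + 1 for s in habituation}))  # always 2

# Two hypothetical test sequences that both move B to the 3rd ordinal position.
for test in ["ADBC", "ACBD"]:
    novel = [p for p in adjacent_pairs(test) if p not in seen_pairs]
    print(f"Test {test}: B now in position {test.index('B') + 1}; "
          f"novel adjacent pairs: {novel if novel else 'none'}")
```

In this toy example both test sequences violate the target's habitual ordinal position, but only the first introduces an adjacent pair that never occurred during habituation; the finding that 4-month-olds responded only to changes of that kind is what points to adjacent pair statistics, rather than ordinal position, as the cue they had learned.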
17.7 SPECULATIONS ON NEURAL MECHANISMS UNDERLYING THE DEVELOPMENT OF MULTISENSORY PERCEPTION
It is now abundantly clear that some basic multisensory processing abilities are present early in human development, and that as infants grow and acquire perceptual experience, these abilities improve. As indicated earlier, this general developmental pattern is consistent with the two classic theoretical views because the core prediction that both views make is that multisensory functions improve with development. Unfortunately, both views were silent about the possible neural mechanisms underlying the developmental emergence of multisensory processing. For example, although Gibson (1969) proposed that infants are sensitive to perceptual structure and the amodal invariants that are inherent in the structured stimulus array from birth onward, her insistence that the information is already integrated in the external perceptual array can be interpreted to mean that the nervous system does not play a significant role in integration. Of course, this assumption does not square with the results from modern neurobiological studies, which clearly show that the brain plays a crucial role in this process. Consequently, a more complete theoretical framework for conceptualizing the development of multisensory processing is one that not only acknowledges that the external stimulus array is highly structured but also admits that the perception of that structure is intimately dependent on neural mechanisms that have evolved to detect that structure (Ghazanfar and Schroeder 2006; Stein and Stanford 2008). In other words, perception of multisensory coherence at any point in development is the joint product of the infant's ability to detect increasingly greater stimulus structure—because of the cumulative effects of sensory/perceptual experience and learning—and of the increasing elaboration of neural structures and their functional properties. The latter may not only permit the integration of multisensory inputs but sometimes may actually induce integral perception even when stimulation in the external sensory array is only unisensory (Romei et al. 2009). Like Gibson's ecological view of multisensory perceptual development, the developmental integration view also failed to specify the underlying neural mechanisms that mediate the long-term effects of experience with the multisensory world and, thus, is subject to similar limitations.

What possible neural mechanisms might mediate multisensory processing in early development? Traditionally, it has been assumed that the neural mechanisms that mediate multisensory processing are hierarchically organized, with initial analysis being sensory-specific and only later analysis being multisensory (presumably once the information arrives in the classic cortical association areas). This hierarchical processing model has recently been challenged by findings showing that multisensory interactions in the primary cortical areas begin to occur as early as 40 to 50 ms after stimulation (Giard and Peronnet 1999; Molholm et al. 2002). Moreover, it has been suggested that multisensory interactions are not only mediated by feedback connections from higher-level cortical areas onto lower-level areas but also by feedforward and lateral connections from lower-level primary cortical areas (Foxe and Schroeder 2005).
As a result, there is a growing consensus that multisensory interactions occur all along the neuraxis, that multisensory integration mechanisms are widespread in the primate neocortex, and that this is what makes the perception of multisensory coherence possible (Ghazanfar and Schroeder 2006). This conclusion is supported by findings showing that traditionally unisensory areas actually contain neurons that respond to stimulation in other modalities. For example, responsiveness in the auditory cortex has been shown to be modulated by visual input in humans (Calvert et al. 1999), monkeys (Ghazanfar et al. 2005), ferrets (Bizley et al. 2007), and rats (Wallace et al. 2004).
If multisensory interactions begin to occur right after the sensory input stage and before sensory elaboration has occurred, and if such interactions continue to occur as the information ascends the neural pathways to the traditional association areas of the cortex, then this resolves a critical problem. From the standpoint of the adult brain, it solves the problem of having to wait until the higher-order cortical areas can extract the various types of relations inherent in multisensory input. This way, the observer can begin to perform a veridical scene analysis and arrive at a coherent multisensory experience shortly after input arrives at the sensory organs (Foxe and Schroeder 2005). From the standpoint of the immature infant brain, the adult findings raise some interesting possibilities. For example, because these early neural interactions are of a relatively low level, they are likely to occur very early in human development and can interact with any other low-level subcortical integration mechanisms. Whether this scenario is correct is currently unknown and awaits further investigation. As shown here, behavioral findings from human infants support these conjectures in that, starting at birth, human infants are capable of multisensory perception. Thus, the question is no longer whether such mechanisms operate but rather what their nature is and where in the brain they operate. Another interesting question is whether the heterochronous emergence of heterogeneous multisensory perceptual skills that has been found in behavioral infant studies (Lewkowicz 2002) is reflected in the operation of distinct neural mechanisms emerging at different times and in different regions of the brain.

There is little doubt that the neural mechanisms underlying multisensory processing are likely to be quite rudimentary in early human development. The central nervous system as well as the different sensory systems are immature, and young infants are perceptually and cognitively inexperienced. This is the case despite the fact that the tactile, vestibular, chemical, and auditory modalities begin to function before birth (Gottlieb 1971) and despite the fact that this provides fetuses with some sensory experience and some opportunity for intersensory interaction (Turkewitz 1994). Consequently, newborn infants are relatively unprepared for the onslaught of new multisensory input that also, for the first time, includes visual information. In addition, newborns are greatly limited by the immature nature of their different sensory systems (Kellman and Arterberry 1998). That is, their visual limitations include poor spatial and temporal resolution and poor sensitivity to contrast, orientation, motion, depth, and color. Their auditory limitations include much higher thresholds than adults', including higher absolute, frequency resolution, and temporal resolution thresholds. Obviously, these basic sensory functions improve rapidly over the first months of life, but there is little doubt that they initially impose limitations on infant perception and probably account for some of the developmental changes found in the development of multisensory responsiveness. The question for future studies is: How do infants overcome these limitations? The work reviewed here suggests that the answer lies in the complex interactions between neural and behavioral levels of organization and in the daily experiences that infants have in their normal ecological setting.
Because developmental change is driven by such interactions (Gottlieb et al. 2006), the challenge for future studies is to explicate these interactions.
REFERENCES
Bahrick, L.E. 1983. Infants' perception of substance and temporal synchrony in multimodal events. Infant Behavior & Development 6:429–51. Bahrick, L.E. 1994. The development of infants' sensitivity to arbitrary intermodal relations. Ecological Psychology 6:111–23. Bahrick, L.E., and R. Lickliter. 2000. Intersensory redundancy guides attentional selectivity and perceptual learning in infancy. Developmental Psychology 36:190–201. Bahrick, L.E., R. Lickliter, and R. Flom. 2004. Intersensory redundancy guides the development of selective attention, perception, and cognition in infancy. Current Directions in Psychological Science 13:99–102. Birch, H.G., and A. Lefford. 1963. Intersensory development in children. Monographs of the Society for Research in Child Development 25. Birch, H.G., and A. Lefford. 1967. Visual differentiation, intersensory integration, and voluntary motor control. Monographs of the Society for Research in Child Development 32:1–87.
Bizley, J.K., F.R. Nodal, V.M. Bajo, I. Nelken, and A.J. King. 2007. Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cerebral Cortex 17:2172–89. Brannon, E.M., S. Suanda, and K. Libertus. 2007. Temporal discrimination increases in precision over development and parallels the development of numerosity discrimination. Developmental Science 10:770–7. Bremner, A.J., N.P. Holmes, and C. Spence. 2008. Infants lost in (peripersonal) space? Trends in Cognitive Sciences 12:298–305. Brookes, H., A. Slater, P.C. Quinn et al. 2001. Three-month-old infants learn arbitrary auditory-visual pairings between voices and faces. Infant & Child Development 10:75–82. Bushara, K.O., J. Grafman, and M. Hallett. 2001. Neural correlates of auditory-visual stimulus onset asynchrony detection. Journal of Neuroscience 21:300–4. Calvert, G.A., M.J. Brammer, E.T. Bullmore et al. 1999. Response amplification in sensory-specific cortices during crossmodal binding. Neuroreport: For Rapid Communication of Neuroscience Research 10:2619–23. Calvert, G.A., C. Spence, and B. Stein (eds.). 2004. The Handbook of Multisensory Processes. Cambridge, MA: MIT Press. Canfield, R.L., and M.M. Haith. 1991. Young infants’ visual expectations for symmetric and asymmetric stimulus sequences. Developmental Psychology 27:198–208. Canfield, R.L., E.G. Smith, M.P. Brezsnyak, and K.L. Snow. 1997. Information processing through the first year of life: A longitudinal study using the visual expectation paradigm. Monographs of the Society for Research in Child Development 62:v–vi, 1–145. Clifton, R.K. 1974. Heart rate conditioning in the newborn infant. Journal of Experimental Child Psychology 18:9–21. Colombo, J., and W.A. Richman. 2002. Infant timekeeping: Attention and temporal estimation in 4-month-olds. Psychological Science 13:475–9. Donohue, R.L., and W.K. Berg. 1991. Infant heart-rate responses to temporally predictable and unpredictable events. Developmental Psychology 27:59–66. Fiser, J., and R.N. Aslin. 2002. Statistical learning of new visual feature combinations by infants. Proceedings of the National Academy of Sciences of the United States of America 99:15822–6. Foxe, J.J., and C.E. Schroeder. 2005. The case for feedforward multisensory convergence during early cortical processing. Neuroreport 16:419. Fraisse, P. 1982. The adaptation of the child to time. In W.J. Friedman (ed.), The developmental psychology of time, 113–40. New York: Academic Press. Frank, M.C., J.A. Slemmer, G.F. Marcus, and S.P. Johnson. 2009. Information from multiple modalities helps 5-month-olds learn abstract rules. Developmental Science 12:504–9. Fujisaki, W., S. Shimojo, M. Kashino, and S.Y. Nishida. 2004. Recalibration of audiovisual simultaneity. Nature Neuroscience 7:773–8. Gardner, J.M., D.J. Lewkowicz, S.A. Rose, and B.Z. Karmel. 1986. Effects of visual and auditory stimulation on subsequent visual preferences in neonates. International Journal of Behavioral Development 9:251–63. Gebhard, J.W., and G.H. Mowbray. 1959. On discriminating the rate of visual flicker and auditory flutter. American Journal of Psychology 72:521–9. Gerken, L. 2006. Decisions, decisions: Infant language learning when multiple generalizations are possible. Cognition 98:B67–74. Ghazanfar, A.A., and C.E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences 10:278–85. Epub 2006 May 18. Ghazanfar, A.A., J.X. Maier, K.L. Hoffman, and N.K. Logothetis. 2005. 
Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25:5004–12. Giard, M.H., and F. Peronnet. 1999. Auditory–visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience 11:473–90. Gibson, J.J. 1966. The senses considered as perceptual systems. Boston: Houghton-Mifflin. Gibson, E.J. 1969. Principles of perceptual learning and development. New York: Appleton. Gómez, R.L., and J. Maye. 2005. The developmental trajectory of nonadjacent dependency learning. Infancy 7:183–206. Gottlieb, G. 1971. Ontogenesis of sensory function in birds and mammals. In The biopsychology of development, ed. E. Tobach, L.R. Aronson, and E. Shaw, 67–128. New York: Academic Press. Gottlieb, G., D. Wahlsten, and R. Lickliter. 2006. The significance of biology for human development: A developmental psychobiological systems view. In Handbook of child psychology, ed. R. Lerner, 210–57. New York: Wiley.
Greenfield, P.M. 1991. Language, tools and brain: The ontogeny and phylogeny of hierarchically organized sequential behavior. Behavioral and Brain Sciences 14:531–95. Gulya, M., and M. Colombo. 2004. The ontogeny of serial-order behavior in humans (Homo sapiens): Representation of a list. Journal of Comparative Psychology 118:71–81. Handel, S., and L. Buffardi. 1969. Using several modalities to perceive one temporal pattern. Quarterly Journal of Experimental Psychology 21:256–66. Johnson, S.P., K.J. Fernandes, M.C. Frank et al. 2009. Abstract rule learning for visual sequences in 8- and 11-month-olds. Infancy 14:2–18. Kahana-Kalman, R., and A.S. Walker-Andrews. 2001. The role of person familiarity in young infants’ perception of emotional expressions. Child Development 72:352–69. Kamachi, M., H. Hill, K. Lander, and E. Vatikiotis-Bateson. 2003. Putting the face to the voice: Matching identity across modality. Current Biology 13:1709–14. Kellman, P.J., and M.E. Arterberry. 1998. The cradle of knowledge: Development of perception in infancy. Cambridge, MA: MIT Press. King, A.J., M.E. Hutchings, D.R. Moore, and C. Blakemore. 1988. Developmental plasticity in the visual and auditory representations in the mammalian superior colliculus. Nature 332:73–6. Kirkham, N.Z., J.A. Slemmer, and S.P. Johnson. 2002. Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition 83:B35–42. Knudsen, E.I., and M.S. Brainard. 1991. Visual instruction of the neural map of auditory space in the developing optic tectum. Science 253:85–7. Kuhl, P.K., and A.N. Meltzoff. 1982. The bimodal perception of speech in infancy. Science 218:1138–41. Kuhl, P.K., and A.N. Meltzoff. 1984. The intermodal representation of speech in infants. Infant Behavior & Development 7:361–81. Lashley, K.S. 1951. The problem of serial order in behavior. In Cerebral mechanisms in behavior: The Hixon symposium, ed. L.A. Jeffress, 123–47. New York: Wiley. Lewkowicz, D.J. 1986. Developmental changes in infants’ bisensory response to synchronous durations. Infant Behavior & Development 9:335–53. Lewkowicz, D.J. 1992a. Infants’ response to temporally based intersensory equivalence: The effect of synchronous sounds on visual preferences for moving stimuli. Infant Behavior & Development 15:297–324. Lewkowicz, D.J. 1992b. Infants’ responsiveness to the auditory and visual attributes of a sounding/moving stimulus. Perception & Psychophysics 52:519–28. Lewkowicz, D.J. 1994. Limitations on infants’ response to rate-based auditory-visual relations. Developmental Psychology 30:880–92. Lewkowicz, D.J. 1996. Perception of auditory-visual temporal synchrony in human infants. Journal of Experimental Psychology: Human Perception & Performance 22:1094–106. Lewkowicz, D.J. 2000a. The development of intersensory temporal perception: An epigenetic systems/limitations view. Psychological Bulletin 126:281–308. Lewkowicz, D.J. 2000b. Infants’ perception of the audible, visible and bimodal attributes of multimodal syllables. Child Development 71:1241–57. Lewkowicz, D.J. 2002. Heterogeneity and heterochrony in the development of intersensory perception. Cognitive Brain Research 14:41–63. Lewkowicz, D.J. 2003. Learning and discrimination of audiovisual events in human infants: The hierarchical relation between intersensory temporal synchrony and rhythmic pattern cues. Developmental Psychology 39:795–804. Lewkowicz, D.J. 2004. Perception of serial order in infants. Developmental Science 7:175–84. Lewkowicz, D.J. 2008. 
Perception of dynamic and static audiovisual sequences in 3- and 4-month-old infants. Child Development 79:1538–54. Lewkowicz, D.J. 2010. Infant perception of audio-visual speech synchrony. Developmental Psychology 46:66–77. Lewkowicz, D.J., and I. Berent. 2009. Sequence learning in 4-month-old infants: Do infants represent ordinal information? Child Development 80:1811–23. Lewkowicz, D., and K. Kraebel. 2004. The value of multisensory redundancy in the development of intersensory perception. The Handbook of Multisensory Processes: 655–78. Cambridge, MA: MIT Press. Lewkowicz, D.J., and A.A. Ghazanfar. 2006. The decline of cross-species intersensory perception in human infants. Proceedings of the National Academy of Sciences of the United States of America 103:6771–4. Lewkowicz, D.J., and G. Turkewitz. 1980. Cross-modal equivalence in early infancy: Auditory–visual intensity matching. Developmental Psychology 16:597–607. Lewkowicz, D.J., and G. Turkewitz. 1981. Intersensory interaction in newborns: Modification of visual preferences following exposure to sound. Child Development 52:827–32.
Lewkowicz, D.J., R. Sowinski, and S. Place. 2008. The decline of cross-species intersensory perception in human infants: Underlying mechanisms and its developmental persistence. Brain Research 1242:291–302. Lewkowicz, D.J., I. Leo, and F. Simion. 2010. Intersensory perception at birth: Newborns match non-human primate faces and voices. Infancy 15:46–60. Lickliter, R., and L.E. Bahrick. 2000. The development of infant intersensory perception: Advantages of a comparative convergent-operations approach. Psychological Bulletin 126:260–80. Lickliter, R., and H. Banker. 1994. Prenatal components of intersensory development in precocial birds. In Development of intersensory perception: Comparative perspectives, ed. D.J. Lewkowicz and R. Lickliter, 59–80. Norwood, NJ: Lawrence Erlbaum Associates, Inc. Lickliter, R., D.J. Lewkowicz, and R.F. Columbus. 1996. Intersensory experience and early perceptual development: The role of spatial contiguity in bobwhite quail chicks’ responsiveness to multimodal maternal cues. Developmental Psychobiology 29:403–16. Maier, N.R.F., and T.C. Schneirla. 1964. Principles of animal psychology. New York: Dover Publications. Marcovitch, S., and D.J. Lewkowicz. 2009. Sequence learning in infancy: The independent contributions of conditional probability and pair frequency information. Developmental Science 12:1020–5. Marcus, G.F., S. Vijayan, S. Rao, and P. Vishton. 1999. Rule learning by seven-month-old infants. Science 283:77–80. Marcus, G.F., K.J. Fernandes, and S.P. Johnson. 2007. Infant rule learning facilitated by speech. Psychological Science 18:387–91. Marks, L. 1978. The unity of the senses. New York: Academic Press. Martin, J.G. 1972. Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological Review 79:487–509. McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264:229–39. Molholm, S., W. Ritter, M.M. Murray et al. 2002. Multisensory auditory–visual interactions during early sensory processing in humans: A high-density electrical mapping study. Cognitive Brain Research 14:115–28. Munhall, K.G., and E. Vatikiotis-Bateson. 2004. Spatial and temporal constraints on audiovisual speech perception. In The handbook of multisensory processes, ed. G.A. Calvert, C. Spence, and B.E. Stein, 177–88. Cambridge, MA: MIT Press. Myers, A.K., B. Cotton, and H.A. Hilp. 1981. Matching the rate of concurrent tone bursts and light flashes as a function of flash surround luminance. Perception & Psychophysics 30(1):33–8. Navarra, J., A. Vatakis, M. Zampini et al. 2005. Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain Research 25:499–507. Nazzi, T., and F. Ramus. 2003. Perception and acquisition of linguistic rhythm by infants. Speech Communication 41:233–43. Neil, P.A., C. Chee-Ruiter, C. Scheier, D.J. Lewkowicz, and S. Shimojo. 2006. Development of multisensory spatial integration and perception in humans. Developmental Science 9:454–64. Nelson, K. 1986. Event knowledge: Structure and function in development. Hillsdale, NJ: Lawrence Erlbaum Associates. Nelson, K. 2007. Young minds in social worlds. Cambridge, MA: Harvard Univ. Press. Oppenheim, R.W. 1981. Ontogenetic adaptations and retrogressive processes in the development of the nervous system and behavior: A neuroembryological perspective. In Maturation and development: Biological and psychological perspectives, ed. K.J. Connolly and H.F.R. Prechtl, 73–109. Philadelphia, PA: Lippincott. Partan, S., and P. 
Marler. 1999. Communication goes multimodal. Science 283:1272–3. Patterson, M.L., and J.F. Werker. 2002. Infants’ ability to match dynamic phonetic and gender information in the face and voice. Journal of Experimental Child Psychology 81:93–115. Patterson, M.L., and J.F. Werker. 2003. Two-month-old infants match phonetic information in lips and voice. Developmental Science 6(2):191–6. Piaget, J. 1952. The origins of intelligence in children. New York: International Universities Press. Pons, F., D.J. Lewkowicz, S. Soto-Faraco, and N. Sebastián-Gallés. 2009. Narrowing of intersensory speech perception in infancy. Proceedings of the National Academy of Sciences of the United States of America 106:10598–602. Reardon, P., and E.W. Bushnell. 1988. Infants’ sensitivity to arbitrary pairings of color and taste. Infant Behavior and Development 11:245–50. Romei, V., M.M. Murray, C. Cappe, and G. Thut. 2009. Preperceptual and stimulus-selective enhancement of low-level human visual cortex excitability by sounds. Current Biology 19:1799–805. Rowe, C. 1999. Receiver psychology and the evolution of multicomponent signals. Animal Behaviour 58:921–31.
Saffran, J.R., R.N. Aslin, and E.L. Newport. 1996. Statistical learning by 8-month-old infants. Science 274:1926–8. Scheier, C., D.J. Lewkowicz, and S. Shimojo. 2003. Sound induces perceptual reorganization of an ambiguous motion display in human infants. Developmental Science 6:233–44. Sekuler, R., A.B. Sekuler, and R. Lau. 1997. Sound alters visual motion perception. Nature 385:308. Shams, L., Y. Kamitani, and S. Shimojo. 2000. What you see is what you hear. Nature 408(6814):788. Shipley, T. 1964. Auditory flutter-driving of visual flicker. Science 145:1328–30. Slater, A., E. Brown, and M. Badenoch. 1997. Intermodal perception at birth: Newborn infants’ memory for arbitrary auditory–visual pairings. Early Development & Parenting 6:99–104. Slater, A., P.C. Quinn, E. Brown, and R. Hayes. 1999. Intermodal perception at birth: Intersensory redundancy guides newborn infants’ learning of arbitrary auditory–visual pairings. Developmental Science 2:333–8. Slutsky, D.A., and G.H. Recanzone. 2001. Temporal and spatial dependency of the ventriloquism effect. Neuroreport 12:7–10. Stein, B.E., and M.A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press. Stein, B.E., and T.R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews. Neuroscience 9:255–66. Stein, B.E., D. Burr, C. Constantinidis et al. 2010. Semantic confusion regarding the development of multisensory integration: A practical solution. European Journal of Neuroscience 31:1713–20. Thelen, E., and L.B. Smith. 1994. A dynamic systems approach to the development of cognition and action. Cambridge, MA: MIT Press. Thomas, K.M., and C.A. Nelson. 2001. Serial reaction time learning in preschool- and school-age children. Journal of Experimental Child Psychology 79:364–87. Turkewitz, G. 1994. Sources of order for intersensory functioning. In The development of intersensory perception: Comparative perspectives, ed. D.J. Lewkowicz and R. Lickliter, 3–17. Hillsdale, NJ: Lawrence Erlbaum Associates. Turkewitz, G., and P.A. Kenny. 1982. Limitations on input as a basis for neural organization and perceptual development: A preliminary theoretical statement. Developmental Psychobiology 15:357–68. Vroomen, J., M. Keetels, B. de Gelder, and P. Bertelson. 2004. Recalibration of temporal order perception by exposure to audio-visual asynchrony. Cognitive Brain Research 22:32–5. Walker-Andrews, A.S. 1986. Intermodal perception of expressive behaviors: Relation of eye and voice? Developmental Psychology 22:373–7. Walker-Andrews, A.S. 1997. Infants’ perception of expressive behaviors: Differentiation of multimodal information. Psychological Bulletin 121:437–56. Walker-Andrews, A.S., L.E. Bahrick, S.S. Raglioni, and I. Diaz. 1991. Infants’ bimodal perception of gender. Ecological Psychology 3:55–75. Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat superior colliculus. Journal of Neuroscience 17:2429–44. Wallace, M.T., and B.E. Stein. 2001. Sensory and multisensory responses in the newborn monkey superior colliculus. Journal of Neuroscience 21:8886–94. Wallace, M.T., R. Ramachandran, and B.E. Stein. 2004. A revised view of sensory cortical parcellation. Proceedings of the National Academy of Sciences of the United States of America 101:2167–72. Wallace, M.T., B.E. Stein, and R. Ramachandran. 2006. Early experience determines how the senses will interact: A revised view of sensory cortical parcellation. 
Journal of Neurophysiology 101:2167–72. Welch, R.B., and D.H. Warren. 1980. Immediate perceptual response to intersensory discrepancy. Psychological Bulletin 88:638–67. Welch, R.B., L.D. Duttenhurt, and D.H. Warren. 1986. Contributions of audition and vision to temporal rate perception. Perception & Psychophysics 39:294–300. Werner, H. 1973. Comparative psychology of mental development. New York: International Universities Press. Yehia, H., P. Rubin, and E. Vatikiotis-Bateson. 1998. Quantitative association of vocal-tract and facial behavior. Speech Communication 26:23–43. Zangenehpour, S., A.A. Ghazanfar, D.J. Lewkowicz, and R.J. Zatorre. 2009. Heterochrony and cross-species intersensory matching by infant vervet monkeys. PLoS ONE 4:e4302.
18 Multisensory Integration Develops Late in Humans
David Burr and Monica Gori
CONTENTS
18.1 Development of Multimodal Perception in Infancy and Childhood ... 345
18.2 Neurophysiological Evidence for Development of Multimodal Integration ... 347
18.3 Development of Cue Integration in Spatial Navigation ... 348
18.4 Development of Audiovisual Cue Integration ... 349
18.5 Sensory Experience and Deprivation Influence Development of Multisensory Integration ... 350
18.6 Development of Visuo-Haptic Integration ... 351
18.7 Calibration by Cross-Modal Comparison? ... 355
18.8 Haptic Discrimination in Blind and Low-Vision Children: Disruption of Cross-Sensory Calibration? ... 356
18.9 Concluding Remarks: Evidence of Late Multisensory Development ... 357
Acknowledgment ... 358
References ... 358
18.1 DEVELOPMENT OF MULTIMODAL PERCEPTION IN INFANCY AND CHILDHOOD
From birth, we interact with the world through our senses, which provide complementary information about the environment. To perceive and interact with a coherent world, our brain has to merge information from the different senses as efficiently as possible. Because the same environmental property may be signaled by more than one sense, the brain must integrate redundant signals of a particular property (such as the size and shape of an object held in the hand), which can result in a more precise estimate than either individual estimate. Much behavioral, electrophysiological, and neuroimaging evidence has shown that signals from the different senses related to the same event, congruent in space and time, increase the accuracy and precision of its encoding well beyond what would be possible from independent estimates from individual senses. Several recent studies have suggested that human adults integrate redundant information in a statistically optimal fashion (e.g., Alais and Burr 2004; Ernst and Banks 2002; Trommershäuser et al. in press). An important question is whether this optimal multisensory integration is present at birth, or whether (and if so when) it develops during childhood. Early development of multisensory integration could be useful for the developing brain, but may also bring fresh challenges, given the dramatic changes that the human brain and body undergo during this period. The clear advantages of multisensory integration may come at a cost to the developing organism. In fact, as we will see later in this chapter, many multisensory functions appear only late in development, well after the maturation of individual senses.

Sensory systems are not mature at birth, but become increasingly refined during development. The brain has to continuously update its mapping between sensory and motor correspondences and take these changes into account. This is a very protracted process, with cognitive changes and
neural reorganization lasting well into early adolescence (Paus 2005). A further complication is that different senses develop at different rates: first touch, followed by vestibular, chemical, and auditory (all beginning to function before birth), and finally vision (Gottlieb 1971). The differences in development rates could exacerbate the challenges for cross-modal integration and calibrating, needing to take into account growing limbs, eye length, interocular distances, etc. Some sensory properties, like contrast sensitivity, visual acuity, binocular vision, color perception, and some kinds of visual motion perception mature rapidly to reach near adult-like levels within 8 to 12 months of age (for a review, see Atkinson 2000). Similarly, young infants can explore, manipulate, and discriminate the form of objects haptically, analyzing and coding tactile and weight information, during a period when their hands are undergoing rapid changes (Streri 2003; Streri et al. 2000, 2004; Striano and Bushnell 2005). On the other hand, not all perceptual skills develop early. For example, auditory frequency discrimination (Olsho 1984; Olsho et al. 1988), temporal discrimination (Trehub et al. 1995), and basic speech abilities all improve during infancy (Jusczyk et al. 1998). Also, projective size and shape are not noticed or understood until at least 7 years of age, and evidence suggests that even visual acuity and contrast sensitivity continue to improve slightly up until 5 to 6 years of age (Brown et al. 1987). Other attributes, such as the use of binocular cues to control prehensile movements (Watt et al. 2003) and the development of complex form and motion perception (Del Viva et al. 2006; Ellemberg et al. 1999, 2004; Kovács et al. 1999; Lewis et al. 2004) continue until 8 to 14 years of age. Object manipulation also continues to improve until 8 to 14 years (Rentschler et al. 2004), and tactile object recognition in blind and sighted children does not develop until 5 to 6 years (Morrongiello et al. 1994). Many other complex and experience-dependent capacities, such as facilitation of speech perception in noise (e.g., Elliott 1979; Johnson 2000), have been reported to be immature throughout childhood. All these studies suggest that there is a difference not only in the developmental rates of different sensory systems, but also in the development of different aspects within each sensory system, all potential obstacles for the development of cue integration. The development of multimodal perceptual abilities in human infants has been studied with various techniques, such as habituation and preferential looking. Many studies suggest that some multisensory processes, such as cross-modal facilitation, cross-modal transfer, and multisensory matching are present to some degree at an early age (e.g., Streri 2003; Lewkowicz 2000, for review). Young infants can match signals between different sensory modalities (Dodd 1979; Lewkowicz and Turkewitz 1981) and detect equivalence in the amodal properties of objects across the senses (e.g., Patterson and Werker 2002; Rose 1981). For example, they can match faces with voices (Bahrick 2001) and visual and auditory motion signals (Lewkowicz 1992) on the basis of their synchrony. By 3 to 5 months of age, they can discriminate audiovisual changes in tempo and rhythm (Bahrick et al. 
2002; Bahrick and Lickliter 2000), from 4 months of age, they can match visual and tactile form properties (Rose and Ruff 1987), and at about 6 months of age, they can do duration-based matches (Lewkowicz 1986). Young infants seem to be able to benefit from multimodal redundancy of information across senses (Bahrick and Lickliter 2000, 2004; Bahrick et al. 2002; Lewkowicz 1988a, 1996; Neil et al. 2006). There is also evidence for cross-modal facilitation, in which stimuli in one modality increases the responsiveness to stimuli in other modalities (Lewkowicz and Lickliter 1994; Lickliter et al. 1996; Morrongiello et al. 1998). However, not all forms of facilitation develop early. Infants do not exhibit multisensory facilitation of reflexive head and eye movements for spatial localization until about 8 months of age (Neil et al. 2006), and multisensory coactivation during a simple audiovisual detection task does not occur until 8 years of age in most children (Barutchu et al. 2009, 2010). Recent studies suggest that human infants can transfer information gleaned from one sense to another (e.g., Streri 2003; Streri et al. 2004). For example, 1-month-old infants can visually recognize an object they have previously explored orally (Gibson and Walker 1984; Meltzoff and Borton 1979) and 2-month-old infants can visually recognize an object they have previously felt (Rose 1981; Streri et al. 2008). However, many of these studies show an asymmetry in the transfer (Sann
and Streri 2007; Streri 2003; Streri et al. 2008) or a partial dominance of one modality over another (Lewkowicz 1988a, 1988b), supporting the idea that, even when multimodal skills are present, they are not necessarily fully mature. Recent results (Bremner et al. 2008a, 2008b) on the representation of peripersonal space support the presence of two distinct mechanisms of sensory integration with different developmental trends: the first, relying principally on visual information, is present during the first 6 months; the second, which incorporates information about hand and body posture along with vision, develops only after 6.5 months of age. In recent years, the majority of multisensory studies in infants and children have investigated the development of multisensory matching, transfer, and facilitation abilities, whereas few have investigated the development of multisensory integration. Those few that did investigate multisensory integration in school-age children point to unimodal dominance rather than to integration (Hatwell 1987; Klein 1966; McGurk and Power 1980; Misceo et al. 1999).
18.2 NEUROPHYSIOLOGICAL EVIDENCE FOR DEVELOPMENT OF MULTIMODAL INTEGRATION
There is now firm neurophysiological evidence for multimodal integration. Many studies have demonstrated that the superior colliculus, a midbrain structure, is involved in integrating information between modalities and in initializing and controlling the localization and orientation of motor responses (Stein et al. 1993). This structure is highly sensitive to input from the association cortex (Stein 2005), and inactivation of this input impairs the integration of multisensory signals (Jiang and Stein 2003). Maturation of multisensory responses depends strongly on environmental experience (Wallace and Stein 2007): after visual deprivation (Wallace et al. 2004), the responses of multisensory neurons are atypical and fail to show multisensory integration.

A typically developed superior colliculus is structured in layers. Neurons in the superficial layers are unisensory, whereas those in the deeper layers respond to combinations of visual, auditory, and tactile stimuli (Stein et al. 2009b). Neurons related to a specific sensory modality have their own spatial map that is spatially registered with the maps of the neurons involved in the processing of other modalities (Stein et al. 1993, 2009b). These multisensory neurons respond to spatiotemporally coincident multisensory stimuli with multisensory enhancement (more impulses than evoked by the strongest stimulus; Meredith and Stein 1986). Multisensory enhancement has been observed in several different species of animals (in the superior colliculus of the cat, hamster, guinea pig, and monkey, as well as in the cortex of the cat and monkey; Meredith and Stein 1986; Stein et al. 2009a, 2009b; Wilkinson et al. 1996), and functional magnetic resonance imaging and behavioral studies support the existence of similar processes in humans (e.g., Macaluso and Driver 2004).

Multimodal responses of collicular neurons are not present at birth but develop late in cats and monkeys (Stein et al. 1973; Wallace and Stein 1997, 2001). For example, in the cat superior colliculus, neurons are somatosensory at birth (Stein et al. 1973), whereas auditory and visual neurons appear only postnatally. Initially, these neurons respond well to either somatic or auditory or visual signals. Enhanced multisensory responses emerge many weeks later, and their development depends on both experience and input from association cortex (Wallace and Stein 2001).

Behavioral data suggest that the visual modality is principally involved in processing the spatial domain and the auditory system in the temporal domain. Most neurophysiological studies have investigated spatial rather than temporal processing. However, the development of temporal properties may be interesting, as temporal patterns of stimulation can be perceived in the uterus before birth by the vestibular, tactile, and auditory senses. Indeed, neurophysiological studies suggest that somatosensory–auditory multisensory neurons develop a few days after birth, whereas multisensory neurons that also incorporate visual information appear only a few weeks later (Stein et al. 1973). Thus, integration of temporal attributes of perception could develop before that of spatial attributes (such as location or orientation), which are not typically available prenatally.
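The enhancement just described is commonly quantified as the percentage gain of the multisensory response over the best unisensory response (following Meredith and Stein 1986). As a sketch, with $R_{VA}$ denoting the response (e.g., impulse count) to a spatiotemporally coincident visual–auditory stimulus and $R_V$ and $R_A$ the corresponding unisensory responses (the notation is ours, not the original authors'):

\[
\mathrm{ME} = \frac{R_{VA} - \max(R_V, R_A)}{\max(R_V, R_A)} \times 100\%.
\]

Positive values indicate enhancement; the atypical responses observed after visual deprivation, described above, are atypical precisely in that they fail to show reliable enhancement by this kind of measure.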
18.3 DEVELOPMENT OF CUE INTEGRATION IN SPATIAL NAVIGATION
When do human infants start integrating multisensory signals, and when does the integration become statistically optimal? Nardini et al. (2008) studied the reliance on multiple spatial cues for short-range navigation in children and adults. Navigation depends on both visual landmarks and self-generated cues, such as vestibular and proprioceptive signals generated from the movement of the organism in the environment. To measure and quantify the ability of adults and children to integrate this information, they first measured the precision for each modality and then observed the improvement in the bimodal condition. The subjects (adults and children aged 4 to 5 and 7 to 8 years) walked in a dark room with peripherally illuminated landmarks and collected a series of objects (1, 2, and 3 in Figure 18.1a). After a delay, they replaced the objects. Subjects were provided with two cues to navigation, visual landmarks ("moon," "lightning bolt," and "star" in Figure 18.1a) and self-motion. They recorded the distance between the participant's responses and the correct location as well as root mean square errors for each condition, both for the two unimodal conditions—with the room in darkness (no landmarks; SM) and with visual landmarks present (LM) but subjects
FIGURE 18.1 (See color insert.) Use of multiple cues for navigation in adults and children. (a) Representation of room in which subject performed the task in nonconflictual condition. Starting from "start," subject picked up three numbered objects in sequence. Three visual landmarks (a "moon," a "lightning bolt," and a "star") were also present in the room. (b) Representation of room in which subject performed the task in conflictual condition. Here, landmarks were rotated around the subject by 15° (from the white to the colored positions). (c) Mean standard deviation (SD) of participant responses for three different conditions. (d) Curves report the means of functions that predict mean standard deviation (SD ±1 SE) from integration model (in green) or alternation model (in pink) for different age groups. (Reproduced from Nardini, M. et al., Curr. Biol., 18, 689–693, 2008. With permission.)
disoriented—and with both cues present (SM + LM). Figure 18.1c shows a clear developmental trend in the unimodal performance, with mean mislocalization thresholds decreasing with age, suggesting that navigation improves during development. More interestingly, whereas adults take advantage of multiple-cue integration, the children do not. SM + LM thresholds were higher than LM thresholds for children in both age groups, whereas the adults showed lower thresholds in the two-cue condition (evidence of cross-sensory fusion). Nardini et al. (2008) also measured navigation in a conflict condition (Figure 18.1b), in which landmarks were rotated by 15° after the participants had collected the objects. They considered two models, one in which the cues were weighted by the inverse of their variance and integrated (green line in Figure 18.1d), and one in which subjects alternate between the two cues (pink line in Figure 18.1d). Although the integration model predicted adult performance in the conflict condition, 4- to 5- and 7- to 8-year-olds followed the alternation model rather than the integration model. Thus, although adults clearly integrate multiple cues for navigation optimally, young children do not, alternating between cues from trial to trial. These results suggest that the development of the two individual spatial representations occurs before they are integrated within a common unique reference frame. This study suggests that optimal multisensory integration of spatial cues for short-range navigation emerges late in development.
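The integration and alternation models compared by Nardini et al. (2008) make different quantitative predictions for response variability when both cues are available. The Python sketch below illustrates the core logic under simplifying assumptions (independent Gaussian errors, no cue conflict); the standard deviations and the equal-alternation probability are illustrative values of ours, not parameters from the study.

# Sketch of the two models' predictions for bimodal response spread,
# assuming independent Gaussian errors for the self-motion (SM) and
# landmark (LM) cues. Values are illustrative, not data from the study.
import math

def integration_sd(sd_sm, sd_lm):
    """Inverse-variance-weighted (MLE) integration: combined variance
    is 1 / (1/var_sm + 1/var_lm), always <= the better single cue."""
    return math.sqrt(1.0 / (1.0 / sd_sm**2 + 1.0 / sd_lm**2))

def alternation_sd(sd_sm, sd_lm, p_lm=0.5):
    """Trial-by-trial alternation (no-conflict case): responses are a
    mixture of the two unimodal distributions, so the combined variance
    is the probability-weighted average of the unimodal variances."""
    return math.sqrt(p_lm * sd_lm**2 + (1.0 - p_lm) * sd_sm**2)

sd_sm, sd_lm = 60.0, 45.0  # hypothetical unimodal SDs in cm
print(integration_sd(sd_sm, sd_lm))   # ~36 cm: better than either cue alone
print(alternation_sd(sd_sm, sd_lm))   # ~53 cm: between the two unimodal SDs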
18.4 DEVELOPMENT OF AUDIOVISUAL CUE INTEGRATION
Audiovisual integration is fundamental for many tasks, such as orientation toward novel stimuli and understanding speech in noisy environments. As the auditory system starts to develop before vision, commencing in utero, it is interesting to examine when the two senses become integrated. Neil et al. (2006) measured audiovisual facilitation of spatial localization in adults and 1- to 10-month-old infants by comparing the response latency and accuracy of head and eye turns toward unimodal (visual or auditory) and bimodal stimuli. Subjects were required to orient toward a stimulus (a red vertical line or a sustained burst of white noise, or both) presented at one of five different locations. For all stimuli, orientation latencies decreased steadily with age, from about 900 ms at 0 to 2 months to 200 ms for adults. The response to the bimodal stimulus was faster than to either unimodal stimulus at all ages, but only for adults and for 8- to 10-month-old infants was the "race model" (the standard probability summation model of reaction times) consistently violated, implying neural integration. For young infants, the results were well explained by independent probability summation, without any evidence that the audiovisual signals were combined in any physiological way. Only after 8 to 10 months did the faster bimodal response suggest that behavioral summation had occurred. Although multisensory facilitation of audiovisual reflexive eye and head movements for spatial localization has been found to develop at about 10 months of age (Neil et al. 2006), recent findings (Barutchu et al. 2009, 2010) report a different developmental trend for multisensory facilitation of nonreflexive audiovisual motor responses. Barutchu et al. (2009) studied motor reaction times during an audiovisual detection task and found that multisensory facilitation is still immature by 10 to 11 years of age. In fact, only at around 7 years of age did the facilitation start to become consistent with the coactivation model (Barutchu et al. 2009). These authors suggest that the difference between these two trends may depend on the development of the process being facilitated by multisensory integration. Thus, the maturity of the processes being facilitated during reflexive eye and head movements precedes the maturity of the processes being facilitated during more complex detection motor tasks (Barutchu et al. 2009) or speech perception (Massaro 1987). Also, Tremblay et al. (2007) showed that different audiovisual illusions seem to develop at different rates. They investigated the development of audiovisual abilities in two different tasks: one involving a speech illusion and one involving a nonspeech illusion. They found that although the audiovisual speech illusion varied as a function of age and did not develop until 10 years of age, the nonspeech illusion was the same across ages and was already present at 5 years of age. Later in the chapter, we shall suggest a different interpretation of these results, one of "cross-modal calibration," which we believe could stabilize at different ages for different tasks.
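The race-model test mentioned above compares the bimodal reaction time distribution with the bound set by probability summation of the two unimodal distributions. The Python sketch below illustrates that comparison in its simplest form (the inequality often attributed to Miller, applied to empirical distribution functions); the reaction times and time grid are invented for illustration, and real analyses typically add corrections for small samples.

# Sketch of the race-model (probability summation) test:
# if bimodal reaction times are produced by independent unimodal races,
# then F_AV(t) <= F_A(t) + F_V(t) for every t.
# Systematic violation of this bound is taken as evidence of integration.
# The RT samples below are invented for illustration only.

def ecdf(sample, t):
    """Empirical cumulative distribution: proportion of RTs <= t (ms)."""
    return sum(rt <= t for rt in sample) / len(sample)

def race_model_violated(rt_av, rt_a, rt_v, times):
    """Return the time points at which the race-model bound is exceeded."""
    return [t for t in times
            if ecdf(rt_av, t) > min(1.0, ecdf(rt_a, t) + ecdf(rt_v, t))]

rt_a  = [320, 350, 380, 410, 450, 500]   # hypothetical auditory RTs (ms)
rt_v  = [330, 360, 390, 420, 460, 510]   # hypothetical visual RTs (ms)
rt_av = [250, 270, 300, 330, 360, 400]   # hypothetical bimodal RTs (ms)

print(race_model_violated(rt_av, rt_a, rt_v, times=range(200, 601, 50)))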
18.5 SENSORY EXPERIENCE AND DEPRIVATION INFLUENCE DEVELOPMENT OF MULTISENSORY INTEGRATION
Animal studies have shown that deprivation of cross-modal cues compromises the development of normal multisensory responses. For example, Wallace et al. (2004) found that cats deprived of audiovisual and visuo–tactile experience showed no multisensory response enhancement in the superior colliculus. Similarly, patients with specific sensory deficits, such as congenital deafness or blindness later restored by surgical techniques, are ideal models to investigate the effects of sensory experience on multisensory integration in humans. For example, Putzar et al. (2007) tested patients born with dense congenital binocular cataracts (removed at 2 or 3 months) on a nonverbal audiovisual task as well as audiovisual speech perception. This group actually performed better than a control group on the nonverbal task, where they were required to make temporal judgments of visual stimuli presented together with auditory distractors, suggesting that the visuo–auditory "binding" was weaker in patients who had been visually deprived for the first few months of life. Consistent with this weaker binding, they performed worse than controls in the speech experiment, where a fusion of spatial and temporal visuo–auditory perceptual aspects assisted the task. These results highlight the importance of adequate sensory input during early life for the development of multisensory interactions (see also Gori et al. 2010; Hotting and Roder 2009; Röder et al. 2004, 2007). Auditory deprivation can also influence the perception of multisensory stimuli, notably speech perception, which involves the interaction of temporal and spatial visual and auditory signals. The clearest example of this is the McGurk effect (McGurk and MacDonald 1976): subjects listening to a spoken phoneme (e.g., /pa/) while watching a speaker pronounce another phoneme (such as /ka/) will report hearing an in-between phoneme, /ta/. This compelling illusion occurs both for adults and young children (e.g., Bergeson and Pisoni 2003). Schorr et al. (2005) took advantage of this robust illusion to study bimodal fusion in children born deaf, with hearing restored by cochlear implants. They first replicated the illusion in a control group of children with normal hearing, of whom 57% showed bimodal fusion on at least 70% of trials, perceiving /ta/ when /pa/ was heard and /ka/ observed on video (Figure 18.2). Of those who did not show fusion, the majority showed a clear
FIGURE 18.2 McGurk effect in children with cochlear implants compared with age-matched controls. Phoneme /pa/ was played to subjects while they observed a video of lips pronouncing /ka/; subjects reported the phoneme they perceived. Black bars show percentage of each group to report fusion (/ta/) on at least 70% of trials; light gray bars show auditory dominance (/pa/) and dark gray bars show visual dominance (/ka/). For controls, more than half showed bimodal fusion (McGurk effect), and of those that did not, most showed auditory dominance. For children with early cochlear implants (before 30 months of age), the majority also showed fusion, but those that did not showed visual dominance. For children with later implants, almost all showed visual dominance.
auditory dominance. Among the group who had implants at an early age (before 30 months), a similar proportion (56%) perceived the fused phoneme, suggesting that bimodal fusion was occurring. However, the majority of those who did not perceive the fused phoneme perceived the visual /ka/ rather than the auditory /pa/ that the control children perceived. For late implants, however, only one child showed cross-modal fusion; all the others showed visual dominance. These results suggest that cross-modal fusion is not innate, but needs to be learned. The group of hearing-restored children who received the implant after 30 months of age showed no evidence of cross-modal fusion, with the visual phoneme dominating perception. Those with early implants demonstrated a remarkable plasticity in acquiring bimodal fusion, suggesting that there is a sensitive period for the development of bimodal integration of speech. It is interesting that in normal-hearing children, sound dominated the multimodal percept, whereas vision dominated in all the cochlea-implanted children, both early and late implants. It is possible that the dominance can be explained by reliability-based integration. Speech is a complex temporal task in audition and a spatiotemporal task in vision. Although performance has not yet been measured (to our knowledge), it is reasonable to suppose that in normal-hearing children, auditory perception is more precise, explaining the dominance. What about the cochlea-implanted children? Is their auditory precision worse than their visual precision, so that the visual dominance is the result of ideal fusion? Or is their auditory perception actually better than visual perception at this task, so that the visual dominance is not the optimal solution? In the latter case, it may be that vision remains the most robust sense, even if not the most precise. This would be interesting to investigate, perhaps in a simplified situation, as has been done for visuo-haptic judgments (see following section).
18.6 DEVELOPMENT OF VISUO-HAPTIC INTEGRATION
One of the earliest studies to investigate the capacity to integrate information between perceptual systems was that of Ernst and Banks (2002), who investigated the integration of visual and haptic estimates of size in human adults. Their results were consistent with a simple but powerful model that proposes that visual and haptic inputs are combined in an optimal fashion, maximizing the precision of the final estimate (see also chapter by Marc Ernst). This maximum likelihood estimate (MLE) model combines sensory information by summing the independent estimates from each modality after weighting them by their reliability, which is, in turn, inversely proportional to the variance of the presumed underlying noise distribution.
$\hat{S}_{VH} = w_V \hat{S}_V + w_H \hat{S}_H$    (18.1)

where $\hat{S}_{VH}$ is the combined visual and haptic estimate, and $\hat{S}_V$ and $\hat{S}_H$ are the independent visual and haptic estimates. The weights $w_V$ and $w_H$ sum to unity and are inversely proportional to the variance ($\sigma^2$) of the presumed underlying noise distribution:

$w_V = \dfrac{\sigma_V^{-2}}{\sigma_H^{-2} + \sigma_V^{-2}}, \qquad w_H = \dfrac{\sigma_H^{-2}}{\sigma_H^{-2} + \sigma_V^{-2}}$    (18.2)

The MLE prediction for the visuo-haptic threshold ($\sigma_{VH}$) is given by

$\sigma_{VH}^{-2} = \sigma_V^{-2} + \sigma_H^{-2}$    (18.3)

where $\sigma_V$ and $\sigma_H$ are the visual and haptic unimodal thresholds. The improvement is greatest (a factor of $\sqrt{2}$) when $\sigma_V = \sigma_H$. This model has been spectacularly successful in predicting human multimodal integration for various tasks, including visuo-haptic size judgments (Ernst and Banks 2002), audiovisual position
judgments (Alais and Burr 2004), and visual–tactile integration of sequences of events (Bresciani and Ernst 2007). Gori et al. (2008) adapted the technique to study the development of reliability-based cross-sensory integration of two aspects of form perception: size and orientation discrimination. The size discrimination task (top left icon of Figure 18.3) was a low-technology, child-friendly adaptation of Ernst and Banks' technique (Ernst and Banks 2002), where visual and haptic information were placed in conflict with each other to investigate which dominates perception under various degrees of visual degradation. The stimuli were physical blocks of variable height, displayed in
FIGURE 18.3 (See color insert.) Development of cross-modal integration for size and orientation discrimination. Illustration of experimental setup for size (a) and orientation (d) discrimination. Sample psychometric functions for four children, with varying degrees of cross-modal conflict. (b and c) Size discriminations: SB age 10.2 (b); DV age 5.5 (c); (e and f) orientation discrimination: AR age 8.7 (e); GF age 5.7 (f). Lower color-coded arrows show MLE predictions, calculated from threshold measurements (Equation 18.1). Black-dashed horizontal lines show 50% performance point, intersecting with curves at their PSE (shown by short vertical bars). Upper color-coded arrows indicate size of haptic standard in size condition (b and c) and orientation of visual standard in orientation condition (e and f). Older children generally follow the adult pattern, whereas 5-year-olds were dominated by haptic information for size task, and visual information for orientation task. For size judgment, amount of conflict was 0 for red symbols, +3 mm (where plus means vision was larger) for blue symbols, and –3 mm for green symbols. For orientation, same colors refer to 0° and ±4°.
front of an occluding screen for visual judgments, behind the screen for haptic judgments, or both in front and behind for bimodal judgments. All trials involved a two-alternative forced-choice task in which the subject judged whether a standard block seemed taller or shorter than a probe of variable height. For the single-modality trials, one stimulus was the standard, always 55 mm high, the other the probe, of variable height. The proportion of trials in which the probe was judged taller than the standard was computed for each probe height, yielding psychometric functions. The crucial condition was the dual-modality condition, in which visual and haptic sizes of the standard were in conflict, with the visual block 55 + Δ mm and the haptic block 55 – Δ mm (Δ = 0 or ±3 mm). The probe was composed of congruent visual and haptic stimuli of variable heights (48–62 mm). After validating the technique with adults, demonstrating that optimal cross-modal integration also occurred under these conditions, we measured haptic, visual, and bimodal visuo-haptic size discrimination in 5- to 10-year-old children. Figure 18.3 shows sample psychometric functions for the dual-modality measurements, fitted with cumulative Gaussian functions whose median estimates the point of subjective equality (PSE) between the probe and standard. The pattern of results for the 10-year-old (Figure 18.3b) was very much like those for the adult: negative values of Δ caused the curves to shift leftward, positive values caused them to shift rightward. That is to say, the curves followed the visual standard, suggesting that visual information was dominating the match, as the MLE model suggests it should, given that the visual thresholds were lower than the haptic thresholds. This is consistent with the MLE model (indicated by color-coded arrows below the abscissa): the visual judgment was more precise, and should therefore dominate. For the 5-year-olds (Figure 18.3c), however, the results were completely different: the psychometric functions shifted in the direction opposite to that of the 10-year-olds, following the bias of the haptic stimulus. The predictions (color-coded arrows under the abscissa) are similar for both the 5- and 10-year-olds because, for both groups of children, visual thresholds were much lower than haptic thresholds, so the visual stimuli should dominate; but for the 5-year-olds, the reverse holds, with the haptic standard dominating the match. These data show that for size judgments, touch dominates over vision. But is this universally true? We repeated the experiments with another basic spatial task, orientation discrimination, which could, in principle, be computed by neural hardware of the primary visual cortex (Hubel and Wiesel 1968). Subjects were required to discriminate which bar of a dual presentation (standard and probe) was rotated more counterclockwise. As with the size discriminations, we first measured thresholds in each separate modality, then visuo-haptically, with varying degrees of conflict (Δ = 0 or ±4°). Figure 18.3e and f show sample psychometric functions for the dual-modality measurements for a 5- and an 8-year-old child. As with the size judgments, the pattern of results for the 8-year-old was very much like those for the adult, with the functions of the three different conflicts (Figure 18.3e) falling very much together, as predicted from the single-modality thresholds by the MLE model (arrows under the abscissa).
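For readers who want to trace the MLE predictions shown as arrows in Figure 18.3, the Python sketch below computes them directly from Equations 18.1 through 18.3, using the sign convention described above (visual standard at 55 + Δ mm, haptic at 55 − Δ mm). The threshold values are illustrative assumptions, not the measured ones.

# Sketch of the MLE predictions (Equations 18.1-18.3), given unimodal
# discrimination thresholds (sigma_v, sigma_h) and a cross-modal conflict
# delta. Threshold values below are illustrative only.
import math

def mle_weights(sigma_v, sigma_h):
    wv = sigma_v**-2 / (sigma_v**-2 + sigma_h**-2)
    return wv, 1.0 - wv                      # (visual weight, haptic weight)

def predicted_pse(standard, delta, sigma_v, sigma_h):
    """Visual standard is standard + delta, haptic standard is standard - delta;
    the predicted bimodal PSE is the reliability-weighted average (Eq. 18.1)."""
    wv, wh = mle_weights(sigma_v, sigma_h)
    return wv * (standard + delta) + wh * (standard - delta)

def predicted_bimodal_threshold(sigma_v, sigma_h):
    """Equation 18.3: the combined threshold from the summed precisions."""
    return math.sqrt(1.0 / (sigma_v**-2 + sigma_h**-2))

sigma_v, sigma_h = 2.0, 4.0                  # hypothetical thresholds (mm)
for delta in (-3.0, 0.0, +3.0):
    print(delta, predicted_pse(55.0, delta, sigma_v, sigma_h))  # PSE follows vision
print(predicted_bimodal_threshold(sigma_v, sigma_h))   # < min(sigma_v, sigma_h)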
Again, however, the pattern of results for the 5-year-old was quite different (Figure 18.3f). Although the MLE model predicts similar curves for the three conflict conditions, the psychometric functions very closely followed the visual standards (indicated by the arrows above the graphs), the exact opposite pattern to that observed for size discrimination. Figure 18.4 reports PSEs for children of all ages for the three conflict conditions, plotted as a function of the MLE predictions from single-modality discrimination thresholds. If the MLE prediction held, the data should fall along the black-dotted equality line (as in the bottom graph, which reports the adults' results). For adults this was so, for both size and orientation. However, at 5 years of age, the story was quite different. For the size discriminations (Figure 18.4a), not only did the measured PSEs not follow the MLE predictions, they varied inversely with Δ (following the haptic standard), lining up almost orthogonal to the equality line. Similarly, the data for the 6-year-olds do not follow the prediction, although they tend to be scattered rather than ordered orthogonal to the prediction line. By 8 years of age, the data begin
FIGURE 18.4 (See color insert.) Summary data showing PSEs for all subjects for all conflict conditions, plotted against predictions, for size (a) and orientation (b) discriminations. Different colors refer to different subjects within each age group. Symbol shapes refer to level of cross-sensory conflict (Δ): squares, 3 mm or 4°; circles, –3 mm or –4°; upright triangles, 0; diamonds, 2 mm; inverted triangles, –2 mm. Closed symbols refer to no-blur condition for size judgments, and vertical orientation judgments; open symbols to modest blur (screen at 19 cm) or oblique orientations; cross in symbols to heavy blur (screen at 39 cm).
to follow the prediction, and by age 10, the data fall along it well, similar to the adult pattern of results. Figure 18.5a shows how thresholds vary with age for the various conditions. For both tasks, visual and haptic thresholds decreased steadily up to 10 years of age (orientation more so than size). The light-blue symbols show the thresholds predicted from the MLE model (Equation 18.3). For the adults, the predicted improvement was close to the best single-modality threshold, and indeed, the dual-modality thresholds were never worse than the best single-modality threshold. For the 5-year-old children, the results were quite different, with the dual-modality thresholds following the worst single-modality thresholds. For the size judgment, they followed the haptic thresholds, not only much higher than the MLE predictions but twice the best single-modality (visual) thresholds. This result shows not only that integration was not optimal, it was not even a close approximation, such as "winner take all." Indeed, it reveals a "loser take all" strategy. This reinforces the PSE data in showing that these young children do not integrate cross-modally in a way that benefits perceptual discrimination. Figure 18.5b plots the development of theoretical (violet symbols) and observed (black symbols) visual and haptic weights. For both size and orientation judgments, the theoretical haptic weights (calculated from thresholds) were fairly constant over age, 0.2 to 0.3 for size and 0.3 to 0.4 for
FIGURE 18.5 (See color insert.) Development of thresholds and visuo-haptic weights. Average thresholds (geometric means) for haptic (red symbols), visual (green), and visuo-haptic (dark blue) size and orientation discrimination, together with average MLE predictions (light blue), as a function of age. Predictions were calculated individually for each subject and then averaged. Tick-labeled “blur” shows thresholds for visual stimuli blurred by a translucent screen 19 cm from blocks. Error bars are ±1 SEM. Haptic and visual weights for size and orientation discrimination, derived from thresholds via MLE model (violet circles) or from PSE values (black squares). Weights were calculated individually for each subject, and then averaged. After 8 to 10 years, the two estimates converged, suggesting that the system then integrates in a statistically optimal manner.
orientation. However, the haptic weights necessary to predict the 5-year-old PSE size data are 0.6 to 0.8, far, far greater than the prediction, implying that these young children give far more weight to touch for size judgments than is optimal. Similarly, the haptic weights necessary to predict the orientation judgments are around 0, far less than the prediction, suggesting that these children base orientation judgments almost entirely on visual information. In neither case does anything like optimal cue combination occur.
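The "observed" weights plotted in Figure 18.5b can be obtained by inverting Equation 18.1 for the conflict paradigm described above: with the visual standard at S + Δ and the haptic standard at S − Δ, the predicted PSE equals S + Δ(wV − wH), so a measured PSE yields an empirical haptic weight directly. The Python sketch below illustrates this inversion; the PSE values are invented, and the sign convention is the one assumed above.

# Sketch of how an empirical haptic weight can be recovered from the
# measured PSE in the conflict condition (inverting Equation 18.1):
# PSE = w_v*(S + D) + w_h*(S - D) = S + D*(1 - 2*w_h)
# => w_h = (1 - (PSE - S)/D) / 2. The PSE values below are invented.

def empirical_haptic_weight(pse, standard, delta):
    if delta == 0:
        raise ValueError("weight is undefined without a cross-modal conflict")
    return 0.5 * (1.0 - (pse - standard) / delta)

# Hypothetical size-task PSEs for a conflict of +3 mm (visual 58, haptic 52 mm):
print(empirical_haptic_weight(pse=56.5, standard=55.0, delta=3.0))  # 0.25: vision dominates
print(empirical_haptic_weight(pse=53.0, standard=55.0, delta=3.0))  # ~0.83: touch dominates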
18.7 CALIBRATION BY CROSS-MODAL COMPARISON?
Our experiments showed that before 8 years of age, children do not integrate information between the senses; rather, one sense dominates the other. Which sense dominates depends on the situation: touch for size judgments, vision for orientation; neither acts as a universal "gold standard." Given the overwhelming body of evidence for optimal integration in adults, the finding that children do not integrate in an optimal manner was unexpected, and it suggests that multisensory interaction in infants is fundamentally different from that in adults. How could it differ? Although most recent work on multisensory interactions has concentrated on sensory fusion, the efficient combination of information from all the senses, an equally important but somewhat neglected potential function is calibration. In his 300-year-old "Essay towards a new theory of vision," Bishop George Berkeley (1709) correctly observed that vision has no direct access to attributes such as distance, solidity, or "bigness." These can be acquired visually only after they have been associated with touch (proposition 45): in other words, "touch educates vision," perhaps better expressed as "touch calibrates vision." Calibration is probably necessary at all ages, but during the early years of life, when
FIGURE 18.6 Accuracy and precision. Accuracy is defined as closeness of a measurement to its true physical value (its veracity), whereas precision is degree of reproducibility or repeatability between measurements, usually measured as standard deviation of distribution. The "target analogy" shows high precision but poor accuracy (left), and good average accuracy but poor precision (right). The archer would correct his or her aim by calibrating the sights of the bow. Similarly, perceptual systems can correct for a bias by cross-calibration between senses.
children are effectively "learning to see," calibration may be expected to be more important. It is during these years that limbs are growing rapidly and eye length and eye separation are increasing, all necessitating constant recalibration between sight and touch. Indeed, many studies suggest that the first 8 years of life correspond to the critical period of plasticity in humans for many properties, such as binocular vision (Banks et al. 1975) and acquiring accent-free language (Doupe and Kuhl 1999). So before 8 years of age, calibration may be more important than integration. The advantages of fusing sensory information are probably more than offset by those of keeping the evolving system calibrated, and using one system to calibrate another precludes the fusion of the two. Accepting Berkeley's idea that vision must be calibrated by touch might therefore explain why size discrimination thresholds are dominated by touch, even though touch is less precise than vision. But why are orientation thresholds dominated by vision? Perhaps Berkeley was not quite right: touch does not always calibrate vision; rather, the more robust sense for a particular task is the calibrator. In the same way that the more precise sense has the highest weight for sensory fusion, perhaps the more accurate sense is used for calibration. The more accurate sense need not be the more precise, but it is probably the more robust. Accuracy is defined in absolute terms, as the distance from physical reality, whereas precision is a relative measure, related to the reliability or repeatability of the results (see Figure 18.6). It is therefore reasonable that for size, touch will be more accurate, as vision cannot code size directly, but only through a complex calculation involving retinal size and an estimate of distance. Orientation, on the other hand, is coded directly by primary visual cortex (Hubel and Wiesel 1968), and calculated from touch only indirectly via complex coordinate transforms.
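The accuracy/precision distinction drawn here (and in Figure 18.6) can be made concrete with a small numerical example: bias measures accuracy, and the standard deviation of repeated estimates measures precision. The Python sketch below uses invented measurement samples purely for illustration.

# Numerical illustration of the accuracy/precision distinction:
# accuracy is the distance of the mean estimate from the true value (bias),
# precision is the spread (standard deviation) of repeated estimates.
# The measurement samples below are invented for illustration only.
from statistics import mean, stdev

def accuracy_and_precision(measurements, true_value):
    bias = mean(measurements) - true_value      # smaller |bias| = more accurate
    spread = stdev(measurements)                # smaller spread = more precise
    return bias, spread

true_size = 55.0
precise_but_biased = [58.1, 57.9, 58.0, 58.2, 57.8]   # tight cluster, off target
accurate_but_noisy = [51.0, 59.5, 54.0, 57.5, 53.0]   # centered, widely scattered

print(accuracy_and_precision(precise_but_biased, true_size))   # bias ~ +3.0, SD ~ 0.16
print(accuracy_and_precision(accurate_but_noisy, true_size))   # bias ~ 0.0, SD ~ 3.4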
18.8 HAPTIC DISCRIMINATION IN BLIND AND LOW-VISION CHILDREN: DISRUPTION OF CROSS-SENSORY CALIBRATION?
If the idea of calibration is correct, then early deficits in one sense should affect the function of other senses that rely on it for calibration. Specifically, haptic impairment should lead to poor visual discrimination of size, and visual impairment to poor haptic discrimination of orientation. We have tested and verified the latter of these predictions (Gori et al. 2010). In 17 congenitally visually impaired children (aged 5–19 years), we measured haptic discrimination thresholds for both orientation and size, and found that orientation, but not size, thresholds were impaired. Figure 18.7 plots size against orientation thresholds, both normalized by age-matched normally sighted children.
FIGURE 18.7 Thresholds for orientation discrimination, normalized by age-matched controls, plotted against normalized size thresholds, for 17 unsighted or low-vision children aged between 5 and 18 years. Most points lie in lower-right quadrant, implying better size and poorer orientation discrimination. Arrows refer to group averages, 2.2 ± 0.3 for orientation and 0.8 ± 0.06 for size. Star in lower-left quadrant shows the child with acquired low vision. (Reprinted from Gori, M. et al., Curr. Biol., 20, 223–5, 2010. With permission.)
Orientation discrimination thresholds were all worse than the age-matched controls (>1), on average twice as high, whereas size discrimination thresholds were generally better than the controls (<1). Interestingly, one child with an acquired visual impairment (star symbol) showed a completely different pattern of results, with no orientation deficit. Although we have only one such subject, we presume that his fine orientation thresholds result from the early visual experience (before 2½ years of age), which may have been sufficient for the visual system to calibrate touch. Many previous studies have examined haptic perception in the visually impaired, with seemingly contradictory results: some studies show the performance of blind and low-vision subjects to be as good or better than normally sighted controls, in tasks such as size discrimination with a cane (Sunanto and Nakata 1998), haptic object exploration and recognition, and tactile recognition of two-dimensional angles and gratings (Morrongiello et al. 1994); whereas other tasks including haptic orientation discrimination (Alary et al. 2009; Postma et al. 2008), visual spatial imagination (Noordzij et al. 2007), and representation and updating of spatial information (Pasqualotto and Newell 2007) have shown impairments. Visually impaired children had particular difficulties with rotated object arrays (Ungar et al. 1995). Most recently, Bülthoff and colleagues have shown that congenitally blind subjects are worse than both blindfolded sighted and acquired-blind subjects at haptic recognition of faces (Dopjans et al. 2009). It is possible that the key to understanding the discrepancy in the literature is whether the haptic task may have required an early cross-modal visual calibration. However, early exposure to vision seems to be sufficient to calibrate the developing haptic system, suggesting that the sensitive period for damage is shorter than that for normal development. This is consistent with other evidence for multiple sensitive periods, such as global motion perception (Lewis and Maurer 2005). The suggestion that specific perceptual tasks may require cross-modal calibration during development could have practical implications, possibly leading to improvements in rehabilitation programs. Where cross-sensory calibration has been compromised, for example by blindness, it may be possible to train people to use some form of “internal” calibration, or to calibrate by another modality such as sound.
18.9 CONCLUDING REMARKS: EVIDENCE OF LATE MULTISENSORY DEVELOPMENT
To perceive a coherent world, we need to combine signals from our five sensory systems, signals that can be complementary or redundant. In adults, redundant signals from various sensory systems—vision, audition, and touch—are usually integrated in an optimal manner, improving the precision
of the individual estimates. In the past few years, great interest has emerged in when and how these functions develop in children and young animals. Many studies, both in children and in animal models, suggest that multisensory integration does not occur at birth but develops over time. Some basic forms of integration, such as reflexive orienting toward an audiovisual signal, develop quite early (Neil et al. 2006); others, such as integration of visual–haptic signals for orientation and size (Gori et al. 2008) and of self-generated cues during navigation (Nardini et al. 2008), develop only after 8 years of age. Similarly, whereas orienting reflexes benefit from cue integration by 8 months (Neil et al. 2006), nonreflexive motor responses to bimodal stimuli continue to develop throughout childhood (Barutchu et al. 2009, 2010). Some have suggested that the late development might occur because multisensory integration requires that higher-order cognitive processes, including attention, reach a certain level of maturity or, alternatively, that all motor processes reach maturity (Barutchu et al. 2010), which does not occur until late adolescence (e.g., Betts et al. 2006; Kanaka et al. 2008; Smith and Chatterjee 2008). However, it is far from clear what complex cognitive processes are involved in simple size and orientation discriminations, and processes such as attention have been shown to operate at very low levels, including V1 and A1 (Gandhi et al. 1999; Woldorff et al. 1993). We suggest that anatomical and physiological differences in maturation rates could pose particular challenges for development, as could the need for the senses to continually recalibrate, to take into account growing limbs, eye length, interocular distances, and so on. If cross-sensory calibration were more fundamental during development than for mature individuals, this would explain the lack of integration, as the use of one sense to calibrate the other necessarily precludes the integration of redundant information. Calibration does not always occur in the same direction (such as touch educating vision); in general, the more robust sense for a particular task calibrates the less robust. The haptic system, which has the more immediate information about size, seems to calibrate vision, which has no absolute size information and must scale for distance. On the other hand, for orientation discrimination, the visual system, which has specialized detectors tuned for orientation, seems to calibrate touch. Indeed, congenitally blind or low-vision children show a strong deficit for haptic orientation judgments, a finding consistent with the possibility that the deficit results from an early failure to calibrate. Cross-sensory calibration can explain many curious results, such as the fact that, before integration develops, the dominance is task dependent: visual for orientation, haptic for size. Similar results have been observed with audiovisual integration: audiovisual speech illusions do not seem to develop until 10 years of age, whereas illusions not involving speech are mature by age 5 (Tremblay et al. 2007). Along the same lines, it can also explain the asymmetries in task performance in subjects with different sensory deficits (Gori et al. 2010; Putzar et al. 2007; Schorr et al. 2005). All these results suggest that, whereas the different sensory systems of infants and children are clearly interconnected, multimodal perception may not be fully developed until quite late.
Cross-sensory calibration may be a useful strategy to allow our brain to take into account the dramatic anatomical and sensory changes during early life, as well as keeping our senses robustly calibrated through life's trials and tribulations.
ACKNOWLEDGMENT
This research was supported by the Italian Ministry of Universities and Research, EC project "STANIB" (FP7 ERC), EC project "RobotCub" (FP6-4270), and Istituto David Chiossone Onlus.
REFERENCES
Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Current Biology 14:257–62.
Alary, F., M. Duquette, R. Goldstein, C. Elaine Chapman, P. Voss, V. La Buissonniere-Ariza, and F. Lepore. 2009. Tactile acuity in the blind: A closer look reveals superiority over the sighted in some but not all cutaneous tasks. Neuropsychologia 47:2037–43. Atkinson, J. 2000. The developing visual brain. New York: Oxford Univ. Press. Bahrick, L.E. 2001. Increasing specificity in perceptual development: Infants’ detection of nested levels of multimodal stimulation. Journal of Experimental Child Psychology 79:253–70. Bahrick, L.E., R. Flom, and R. Lickliter. 2002. Intersensory redundancy facilitates discrimination of tempo in 3-month-old infants. Developmental Psychobiology 41:352–63. Bahrick, L.E., and R. Lickliter. 2000. Intersensory redundancy guides attentional selectivity and perceptual learning in infancy. Developmental Psychology 36:190–201. Bahrick, L.E., and R. Lickliter. 2004. Infants’ perception of rhythm and tempo in unimodal and multimodal stimulation: a developmental test of the intersensory redundancy hypothesis. Cognitive, Affective & Behavioral Neuroscience 4:137–47. Banks, M.S., R.N. Aslin, and R.D. Letson. 1975. Sensitive period for the development of human binocular vision. Science 190:675–7. Barutchu, A., D.P. Crewther, and S.G. Crewther. 2009. The race that precedes coactivation: development of multisensory facilitation in children. Developmental Science 12:464–73. Barutchu, A., J. Danaher, S.G. Crewther, H. Innes-Brown, M.N. Shivdasani, and A.G. Paolini. 2010. Audiovisual integration in noise by children and adults. Journal of Experimental Child Psychology 105:38–50. Bergeson, T.R., and D.B. Pisoni. 2003. Audiovisual speech perception in deaf adults and children following cochlear implantation. In Handbook of multisensory integration, ed. G. Calvert, C. Spence, and B.E. Stein, 749–772. Cambridge, MA: MIT Press. Berkeley, G. 1709. An essay towards a new theory of vision. 1963. Indianapolis, IN: Bobbs-Merrill. Betts, J., J. McKay, P. Maruff, and V. Anderson. 2006. The development of sustained attention in children: The effect of age and task load. Child Neuropsychology 12:205–21. Bremner, A.J., N.P. Holmes, and C. Spence. 2008a. Infants lost in (peripersonal) space? Trends in Cognitive Sciences 12:298–305. Bremner, A.J., D. Mareschal, S. Lloyd-Fox, and C. Spence. 2008b. Spatial localization of touch in the first year of life: Early influence of a visual spatial code and the development of remapping across changes in limb position. Journal of Experimental Psychology. General 137:149–62. Bresciani, J.P., and M.O. Ernst. 2007. Signal reliability modulates auditory–tactile integration for event counting. Neuroreport 18:1157–61. Brown, A.M., V. Dobson, and J. Maier. 1987. Visual acuity of human infants at scotopic, mesopic and photopic luminances. Vision Research 27:1845–58. Del Viva, M.M., R. Igliozzi, R. Tancredi, and D. Brizzolara. 2006. Spatial and motion integration in children with autism. Vision Research 46:1242–52. Dodd, B. 1979. Lip reading in infants: Attention to speech presented in- and out-of-synchrony. Cognitive Psychology 11:478–84. Dopjans, L., C. Wallraven, and H.H. Bülthoff. 2009. Visual experience supports haptic face recognition: Evidence from the early- and late-blind. 10th International Multisensory Research Forum (IMRF), New York City, The City College of New York. Doupe, A.J., and P.K. Kuhl. 1999. Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience 22:567–631. Ellemberg, D., T.L. Lewis, D. Maurer, C.H. Lui, and H.P. 
Brent. 1999. Spatial and temporal vision in patients treated for bilateral congenital cataracts. Vision Research 39:3480–9. Ellemberg, D., T.L. Lewis, M. Dirks, D. Maurer, T. Ledgeway, J.P. Guillemot, and F. Lepore. 2004. Putting order into the development of sensitivity to global motion. Vision Research 44:2403–11. Elliott, L.L. 1979. Performance of children aged 9 to 17 years on a test of speech intelligibility in noise using sentence material with controlled word predictability. Journal of the Acoustical Society of America 66:651–3. Ernst, M.O., and M.S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415:429–33. Gandhi, S.P., D.J. Heeger, and G.M. Boynton. 1999. Spatial attention affects brain activity in human primary visual cortex. Proceedings of the National Academy of Sciences of the United States of America 96:3314–9. Gibson, E.J., and A.S. Walker. 1984. Development of knowledge of visual-tactual affordances of substance. Child Development 55:453–60.
Gori, M., M.M. Del Viva, G. Sandini, and D.C. Burr. 2008. Young children do not integrate visual and haptic form information. Current Biology 18:694–8. Gori, M., G. Sandini, C. Martinoli, and D. Burr. 2010. Poor haptic orientation discrimination in nonsighted children may reflect disruption of cross-sensory calibration. Current Biology 20:223–5. Gottlieb, G. 1971. Development of species identification in birds: An inquiry into the prenatal determinants of perception. Chicago: Univ. of Chicago Press. Hatwell, Y. 1987. Motor and cognitive functions of the hand in infancy and childhood. International Journal of Behavioural Development 10:509–26. Hotting, K., and B. Roder. 2009. Auditory and auditory–tactile processing in congenitally blind humans. Hearing Research 258:165–74. Hubel, D.H., and T.N. Wiesel. 1968. Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology 195:215–43. Jiang, W., and B.E. Stein. 2003. Cortex controls multisensory depression in superior colliculus. Journal of Neurophysiology 90:2123–35. Johnson, C.E. 2000. Children’s phoneme identification in reverberation and noise. Journal of Speech, Language, and Hearing Research 43:144–57. Jusczyk, P., D. Houston, and M. Goodman. 1998. Speech perception during the first year. In Perceptual development: Visual, Auditory, and Speech Perception in Infancy, ed. A. Slater. Psychology Press. Kanaka, N., T. Matsuda, Y. Tomimoto, Y. Noda, E. Matsushima, M. Matsuura, and T. Kojima. 2008. Measurement of development of cognitive and attention functions in children using continuous performance test. Psychiatry and Clinical Neurosciences 62:135–41. Klein, R.E. 1966. A developmental study of perception under condition of conflicting cues. Dissertation abstract. Kovács, I., P. Kozma, A. Fehér, and G. Benedek. 1999. Late maturation of visual spatial integration in humans. Proceedings of the National Academy of Sciences of the United States of America 96, 12204–9. Lewis, T.L., and D. Maurer. 2005. Multiple sensitive periods in human visual development: Evidence from visually deprived children. Developmental Psychobiology 46:163–83. Lewis, T.L., D. Ellemberg, D. Maurer, J.-P. Guillemot, and F. Lepore. 2004. Motion perception in 5-year-olds: Immaturity is related to hypothesized complexity of cortical processing. Journal of Vision 4:30–30a. Lewkowicz, D.J. 1986. Developmental changes in infants’ bisensory response to synchronous durations. Infant Behavior and Development 163:180–8. Lewkowicz, D.J. 1988a. Sensory dominance in infants: 1. Six-month-old infants’ response to auditory–visual compounds. Developmental Psychology 24:155–71. Lewkowicz, D.J. 1988b. Sensory dominance in infants: 2. Ten-month-old infants’ response to auditory-visual compounds. Developmental Psychology 24:172–82. Lewkowicz, D.J. 1992. Infants’ responsiveness to the auditory and visual attributes of a sounding/moving stimulus. Perception & Psychophysics 52:519–28. Lewkowicz, D.J. 1996. Perception of auditory–visual temporal synchrony in human infants. Journal of Experimental Psychology. Human Perception and Performance 22:1094–106. Lewkowicz, D.J. 2000. The development of intersensory temporal perception: An epigenetic systems/limitations view. Psychological Bulletin 126:281–308. Lewkowicz, D.J., and R. Lickliter (ed.). 1994. The development of intersensory perception: Comparative perspectives. Hillsdale, NJ: Lawrence Erlbaum Associates Inc. Lewkowicz, D.J., and G. Turkewitz. 1981. 
Intersensory interaction in newborns: Modification of visual preferences following exposure to sound. Child Development 52:827–32. Lickliter, R., D.J. Lewkowicz, and R.F. Columbus. 1996. Intersensory experience and early perceptual development: The role of spatial contiguity in bobwhite quail chicks’ responsiveness to multimodal maternal cues. Developmental Psychobiology 29:403–16. Macaluso, E., and J. Driver. 2004. Neuroimaging studies of cross-modal integration for emotion. In The Handbook of Multisensory Processes, ed. G.A. Calvet, C. Spence, and B.E. Stein, 529–48. Cambridge, MA: MIT Press. Massaro, D.W. 1987. Speech perception by ear and eye: A paradigm for psychological inquiry. Program in experimental psychology. Hillsdale, NJ: Laurence Erlbaum Associates. McGurk, H., and R.P. Power. 1980. Intermodal coordination in young children: Vision and touch. Developmental Psychology 16:679–80. Meltzoff, A.N., and R.W. Borton. 1979. Intermodal matching by human neonates. Nature 282:403–4. Meredith, M.A., and B.E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology 56:640–62. Misceo, G.F., W.A. Hershberger, and R.L. Mancini. 1999. Haptic estimates of discordant visual-haptic size vary developmentally. Perception & Psychophysics 61:608–14.
Morrongiello, B.A., G.K. Humphrey, B. Timney, J. Choi, and P.T. Rocca. 1994. Tactual object exploration and recognition in blind and sighted children. Perception 23:833–48. Morrongiello, B.A., K.D. Fenwick, and G. Chance. 1998. Cross-modal learning in newborn infants: Inferences about properties of auditory–visual events. Infant Behavior and Development 21:543–54. Nardini, M., P. Jones, R. Bedford, and O. Braddick. 2008. Development of cue integration in human navigation. Current Biology 18:689–93. Neil, P.A., C. Chee-Ruiter, C. Scheier, D.J. Lewkowicz, and S. Shimojo. 2006. Development of multisensory spatial integration and perception in humans. Developmental Science 9:454–64. Noordzij, M.L., S. Zuidhoek, and A. Postma. 2007. The influence of visual experience on visual and spatial imagery. Perception 36:101–12. Olsho, L.W. 1984. Infant frequency discrimination as a function of frequency. Infant Behavior and Development 7:27–35. Olsho, L.W., E.G. Koch, E.A. Carter, C.F. Halpin, and N.B. Spetner. 1988. Pure-tone sensitivity of human infants. Journal of the Acoustical Society of America 84:1316–24. Pasqualotto, A., and F.N. Newell. 2007. The role of visual experience on the representation and updating of novel haptic scenes. Brain and Cognition 65:184–94. Patterson, M.L., and J.F. Werker. 2002. Infants’ ability to match dynamic phonetic and gender information in the face and voice. Journal of Experimental Child Psychology 81:93–115. Paus, T. 2005. Mapping brain development and aggression. Canadian Child and Adolescent Psychiatry Review 14:10–5. Postma, A., S. Zuidhoek, M.L. Noordzij, and A.M. Kappers. 2008. Haptic orientation perception benefits from visual experience: Evidence from early-blind, late-blind, and sighted people. Perception & Psychophysics 70:1197–206. Putzar, L., I. Goerendt, K. Lange, F. Rosler, and B. Roder. 2007. Early visual deprivation impairs multisensory interactions in humans. Nature Neuroscience 10:1243–5. Rentschler, I., M. Jüttner, E. Osman, A. Müller, and T. Caelli. 2004. Development of configural 3D object recognition. Behavioural Brain Research 149:107–11. Rose, S.A. 1981. Developmental changes in infants’ retention of visual stimuli. Child Development 52:227–33. Rose, S.A., and H.A. Ruff. 1987. Cross-modal abilities in human infants. In Handbook of Infant Development, ed. J.D. Osofsky, 318–62. New York: Wiley. Röder, B., F. Rosler, and C. Spence. 2004. Early vision impairs tactile perception in the blind. Current Biology 14:121–4. Röder, B., A. Kusmierek, C. Spence, and T. Schicke. 2007. Developmental vision determines the reference frame for the multisensory control of action. Proceedings of the National Academy of Sciences of the United States of America 104:4753–8. Sann, C., and A. Streri. 2007. Perception of object shape and texture in human newborns: evidence from crossmodal transfer tasks. Developmental Science 10:399–410. Schorr, E.A., N.A. Fox, V. van Wassenhove, and E.I. Knudsen. 2005. Auditory-visual fusion in speech perception in children with cochlear implants. Proceedings of the National Academy of Sciences of the United States of America 102:18748–50. Smith, S.E., and A. Chatterjee. 2008. Visuospatial attention in children. Archives of Neurology 65:1284–8. Stein, B.E. 2005. The development of a dialogue between cortex and midbrain to integrate multisensory information. Experimental Brain Research 166:305–15. Stein, B.E., E. Labos, and L. Kruger. 1973. 
Sequence of changes in properties of neurons of superior colliculus of the kitten during maturation. Journal of Neurophysiology 36:667–79. Stein, B.E., M.A. Meredith, and M.T. Wallace. 1993. The visually responsive neuron and beyond: multisensory integration in cat and monkey. Progress in Brain Research 95:79–90. Stein, B.E., T.J. Perrault, T.R. Stanford, and B.A. Rowland. 2009a. Postnatal experiences influence how the brain integrates information from different senses. Frontiers in Integrative Neuroscience 3:21. Stein, B.E., T.R. Stanford, and B.A. Rowland. 2009b. The neural basis of multisensory integration in the midbrain: Its organization and maturation. Hearing Research 258:4–15. Streri, A. 2003. Cross-modal recognition of shape from hand to eyes in human newborns. Somatosensory & Motor Research 20:13–8. Streri, A., M. Lhote, and S. Dutilleul. 2000. Haptic perception in newborns. Developmental Science 3:319–27. Streri, A., E. Gentaz, E. Spelke, and G. van de Walle. 2004. Infants’ haptic perception of object unity in rotating displays. Quarterly Journal of Experimental Psychology A 57:523–38. Streri, A., C. Lemoine, and E. Devouche. 2008. Development of inter-manual transfer of shape information in infancy. Developmental Psychobiology 50:70–6.
Striano, T., and E. Bushnell. 2005. Haptic perception of material properties by 3-month-old infants. Infant Behavior and Development 28:266–89. Sunanto, J., and H. Nakata. 1998. Indirect tactual discrimination of heights by blind and blindfolded sighted subjects. Perceptual and Motor Skills 86:383–6. Trehub, S.E., B.A. Schneider, and J.L. Henderson. 1995. Gap detection in infants, children, and adults. Journal of the Acoustical Society of America 98:2532–41. Tremblay, C., F. Champoux, P. Voss, B.A. Bacon, F. Lepore, and H. Theoret. 2007. Speech and non-speech audio-visual illusions: a developmental study. PLoS One 2, e742. Trommershäuser, J., M. Landy, and K. Körding (eds.) (in press). Sensory cue integration. New York: Oxford Univ. Press. Ungar, S., M. Blades, and C. Spencer. 1995. Mental rotation of a tactile layout by young visually impaired children. Perception 24:891–900. Wallace, M.T., and B.E. Stein. 1997. Development of multisensory neurons and multisensory integration in cat superior colliculus. Journal of Neuroscience 17:2429–44. Wallace, M.T., and B.E. Stein. 2001. Sensory and multisensory responses in the newborn monkey superior colliculus. Journal of Neuroscience 21:8886–94. Wallace, M.T., and B.E. Stein. 2007. Early experience determines how the senses will interact. Journal of Neurophysiology 97:921–6. Wallace, M.T., T.J. Perrault Jr., W.D. Hairtston, and B.E. Stein. 2004. Visual experience is necessary for the development of multisensory integration. Journal of Neuroscience 24:9580–4. Watt, S.J., M.F. Bradshaw, T.J. Clarke, and K.M. Elliot. 2003. Binocular vision and prehension in middle childhood. Neuropsychologia 41:415–20. Wilkinson, L.K., M.A. Meredith, and B.E. Stein. 1996. The role of anterior ectosylvian cortex in cross-modality orientation and approach behavior. Experimental Brain Research 112:1–10. Woldorff, M.G., C.C. Gallen, S.A. Hampson, S.A. Hillyard, C. Pantev, D. Sobel, and F.E. Bloom. 1993. Modulation of early sensory processing in human auditory cortex during auditory selective attention. Proceedings of the National Academy of Sciences of the United States of America 90:8722–6.
19 Phonetic Recalibration in Audiovisual Speech
Jean Vroomen and Martijn Baart
CONTENTS
19.1 Introduction .......... 363
19.2 A Short Historical Background on Audiovisual Speech Aftereffects .......... 364
19.3 Seminal Study on Lip-Read–Induced Recalibration .......... 365
19.4 Other Differences between Recalibration and Selective Speech Adaptation .......... 367
     19.4.1 Buildup .......... 367
     19.4.2 Dissipation .......... 368
     19.4.3 Recalibration in "Speech" versus "Nonspeech" Mode .......... 368
19.5 Stability of Recalibration over Time .......... 369
     19.5.1 Basic Phenomenon of Lexically Induced Recalibration .......... 369
     19.5.2 Lip-Read–Induced versus Lexically Induced Recalibration .......... 370
19.6 Developmental Aspects .......... 372
19.7 Computational Mechanisms .......... 373
19.8 Neural Mechanisms .......... 374
19.9 Conclusion .......... 376
Acknowledgments .......... 376
References .......... 376
19.1 INTRODUCTION
In the literature on cross-modal perception, there are two important findings that most researchers in this area will know about, although only a few have ever made a connection between the two. The first is that perceiving speech is not solely an auditory, but rather a multisensory phenomenon. As many readers know by now, seeing a speaker deliver a statement can help decode the spoken message. The most famous experimental demonstration of the multisensory nature of speech is the so-called McGurk illusion: when perceivers are presented with an auditory syllable /ba/ dubbed onto a face articulating /ga/, they report "hearing" /da/ (McGurk and MacDonald 1976). The second finding goes back more than 100 years, to Stratton (1896). He performed experiments with goggles and prisms that radically changed his visual field, thereby creating a conflict between vision and proprioception. What he experienced was that, after wearing prisms for a couple of days, he adapted to the upside-down visual world and learned to move along in it quite well. According to Stratton, the visual world had changed, as it sometimes appeared to him as if it were "right side up," although others such as Held (1965) argued later that it was rather the sensory–motor system that had adapted. What these two seemingly different phenomena have in common is that in both cases an artificial conflict is created between the senses about an event that should yield congruent data under normal circumstances. Thus, in the McGurk illusion, there is a conflict between the auditory system that hears the syllable /ba/ and the visual system that sees the face of a speaker saying /ga/; in the prism case, there is a conflict between proprioception that may feel the hand going upward and the
visual system that sees the same hand going downward. In 2003, the commonality between these two phenomena led us (Bertelson et al. 2003) to question whether one might also observe long-term adaptation effects with audiovisual speech, as Stratton reported for prism adaptation. To be more specific, to the best of our knowledge, nobody had ever examined whether auditory speech perception would adapt as a consequence of exposure to the audiovisual conflict present in McGurk stimuli. This was rather surprising given that the original paper by McGurk and MacDonald is one of the most widely cited papers in this research area (more than 1500 citations by January 2009).
Admittedly though, at first sight it may look like a somewhat exotic enterprise to examine whether listeners adapt to speech sounds induced by exposure to an audiovisual conflict. After all, why would adaptation to a video of an artificially dubbed speaker be of importance? Experimental psychologists should rather spend their time on fundamental aspects of perception and cognition that remain constant across individuals, cultures, and time, and not on matters that are flexible and adjustable. And, indeed, the dominant approach in speech research did just that by focusing on the information available in the speech signal, the idea being that there must be acoustic invariants in the signal that are extracted during perception. On second thought though, it has turned out to be extremely difficult to find a set of acoustic invariant parameters that work for all contexts, cultures, and speakers, and the question we addressed might open an alternative view: Rather than searching for acoustic invariants, it might be equally fruitful to examine whether and how listeners adjust their phoneme boundaries so as to accommodate the variation they hear.
In 2003, we (Bertelson et al. 2003) reported that phonetic recalibration induced by McGurk-like stimuli can indeed be observed. We termed the phenomenon "recalibration" in analogy with the much better known "spatial recalibration," as we considered it a readjustment or a fine-tuning of an already existing phonetic representation. In the same year, and in complete independence, Norris et al. (2003) reported a very similar phenomenon they named "perceptual learning in speech." The basic procedure in both studies was very similar: Listeners were presented with a phonetically ambiguous speech sound and another source of contextual information that disambiguated that sound. In our study, we presented listeners with a sound halfway between /b/ and /d/ accompanied, as context, by the video of a synchronized face that articulated /b/ or /d/ (in short, lip-read information), whereas in the study of Norris et al. (2003), an ambiguous /s/-/f/ sound was heard embedded in the context of an f- or s-biasing word (e.g., "witlo-s/f" was an f-biasing context because "witlof" is a word in Dutch meaning "chicory," but "witlos" is not a Dutch word). Recalibration (or perceptual learning) was subsequently measured in an auditory-only identification test in which participants identified members of a speech continuum. Recalibration manifested itself as a shift in phonetic categorization toward the contextually defined speech environment. Listeners thus increased their report of sounds consistent with the context they had received before, so more /b/ responses after exposure to lip-read /b/ than after lip-read /d/, and more /f/ responses after exposure to f-biasing words than after s-biasing words.
Presumably, this shift reflected an adjustment of the phoneme boundary that had helped listeners to understand speech better in the prevailing input environment. After these seminal reports, there have been a number of studies that examined phonetic recalibration in more detail (Baart and Vroomen 2010a, 2010b; Cutler et al. 2008; Eisner and McQueen 2005, 2006; Jesse and McQueen 2007; Kraljic et al. 2008a, 2008b; Kraljic and Samuel 2005, 2006, 2007; McQueen et al. 2006a, 2006b; Sjerps and McQueen 2010; Stevens 2007; van Linden and Vroomen 2007, 2008; Vroomen and Baart 2009a, 2009b; Vroomen et al. 2004, 2007). In what follows, we will provide an overview of this literature and, given the topic of this book, we will focus on the audiovisual case.
19.2 A SHORT HISTORICAL BACKGROUND ON AUDIOVISUAL SPEECH AFTEREFFECTS

Audiovisual speech has been studied extensively in recent decades, ever since seminal reports showed that lip-read information is of help in noisy environments (Sumby and Pollack 1954) and, given
appropriate dubbings, can change the auditory percept (McGurk and MacDonald 1976). More recently, audiovisual speech has served in functional magnetic resonance imaging (fMRI) studies as an ideal stimulus for studying the neural substrates of multisensory integration (Calvert and Campbell 2003). Surprisingly, though, until 2003 there were only three studies that had focused on auditory aftereffects as a consequence of exposure to audiovisual speech, despite the fact that aftereffects were extensively studied in the late 1970s, and are again nowadays.
Roberts and Summerfield (1981) were the first to study the aftereffects of audiovisual speech, although they were not searching for recalibration, but for "selective speech adaptation," which is basically a contrastive effect. The main question of their study was whether selective speech adaptation takes place at a phonetic level of processing, as originally proposed by Eimas and Corbit (1973), or at a more peripheral acoustic level. Selective speech adaptation differs from recalibration in that it does not depend on an (intersensory) conflict, but rather on the repeated presentation of an acoustically nonambiguous sound that reduces report of sounds similar to the repeating one. For example, hearing /ba/ many times reduces subsequent report of /ba/ on a /ba/–/da/ test continuum. Eimas and Corbit (1973) argued that selective speech adaptation reflects the neural fatigue of hypothetical "linguistic feature detectors," but this viewpoint did not go unchallenged: others claimed that it reflects a mere shift in criterion (Diehl 1981; Diehl et al. 1978, 1980) or a combination of both (Samuel 1986), or possibly that even more qualitatively different levels of analysis are involved (Samuel and Kat 1996). Still others (Sawusch 1977) showed that the size of selective speech adaptation depends on the degree of spectral overlap between the adapter and test sound, and that most—although not all—of the effect is acoustic rather than phonetic. Roberts and Summerfield (1981) found a clever way to disentangle the acoustic from the phonetic contribution using McGurk-like stimuli. They dubbed a canonical auditory /b/ (a "good" acoustic example) onto the video of lip-read /b/ to create an audiovisual congruent adapter and also dubbed the auditory /b/ onto a lip-read /g/ to create a compound stimulus intended to be perceived as /d/. Results showed that repeated exposure to the congruent audiovisual adapter induced contrastive aftereffects on a /b/–/d/ test continuum (i.e., fewer /b/ responses) similar to those of the incongruent adapter AbVg, even though the two adapters were perceived differently. This led the authors to conclude that selective speech adaptation mainly depends on the acoustic quality of the stimulus, and not the perceived or lip-read one. Saldaña and Rosenblum (1994) and Shigeno (2002) later replicated these results with different adapters. Saldaña and Rosenblum compared auditory-only adapters with audiovisual ones (auditory /b/ paired with visual /v/, a compound stimulus perceived mostly as /v/), and found, as Roberts and Summerfield did, that the two adapters again behaved similarly, as in both cases fewer /b/ responses were obtained at the test. Similar results were also found by Shigeno (2002) using AbVg as adapter, and by us (unpublished), demonstrating that selective speech adaptation depends, to a large extent, on repeated exposure to nonambiguous sounds.
19.3 SEMINAL STUDY ON LIP-READ–INDUCED RECALIBRATION

Bertelson et al. (2003) also studied the aftereffects of audiovisual incongruent speech; however, their focus was not on selective speech adaptation, but on recalibration. Their study was inspired by previous work on aftereffects of the "ventriloquist illusion." In the ventriloquist illusion, the apparent location of a target sound is shifted toward a visually displaced distracter that moves or flashes in synchrony with that sound (Bermant and Welch 1976; Bertelson and Aschersleben 1998; Bertelson and Radeau 1981; Klemm 1909). Besides this immediate bias in sound localization, one can also observe aftereffects following a prolonged exposure to a ventriloquized sound (Bertelson et al. 2006; Radeau and Bertelson 1974, 1976, 1977). For the ventriloquist situation, it was known that the location of target sounds was shifted toward the visual distracter seen during the preceding exposure phase. These aftereffects were similar to the ones following exposure to discordant visual and proprioceptive information—as when the apparent location of a hand is displaced through a
prism (Welch and Warren 1986)—and they all showed that exposure to spatially conflicting inputs recalibrates processing in the respective modalities in a way that reduces the conflict. Despite the fact that immediate biases and recalibration effects had been demonstrated for spatial conflict situations, the existing evidence was less complete for conflicts regarding audiovisual speech. Here, immediate biases were well known (the McGurk effect), as was selective speech adaptation, but recalibration had not been demonstrated. Bertelson et al. (2003) hypothesized that a slight variation in the paradigm introduced by Roberts and Summerfield (1981) might nevertheless produce these effects, thus revealing recalibration. The key factor was the ambiguity of the adapter sound. Rather than using a conventional McGurk-like stimulus containing a canonical (and incongruent) sound, Bertelson et al. (2003) used an ambiguous sound. They created a synthetic sound halfway between /aba/ and /ada/ (henceforth A? for auditory ambiguous) and dubbed it onto the corresponding video of a speaker pronouncing /aba/ or /ada/ (A?Vb and A?Vd, respectively). Participants were briefly exposed to either A?Vb or A?Vd, and then tested on identification of A? and the two neighboring tokens on the auditory continuum, A?−1 and A?+1. Each exposure block contained eight adapters (either A?Vb or A?Vd) immediately followed by six test trials. These exposure–test blocks were repeated many times, and participants were thus biased toward both /b/ and /d/ in randomly ordered blocks (a within-subjects factor; a schematic of how the resulting aftereffect can be quantified is sketched at the end of this section). Results showed that listeners quickly learned to label the ambiguous sound in accordance with the lip-read information they were exposed to shortly before. Listeners thus gave more /aba/ responses after exposure to A?Vb than after exposure to A?Vd, and this was taken as the major sign of recalibration (see Figure 19.1, left panel).
In a crucial control experiment, Bertelson et al. (2003) extended these findings by incorporating audiovisual congruent adapters AbVb and AdVd. These adapters were not expected to induce recalibration because there was no conflict between sound and vision. Rather, they were expected to induce selective speech adaptation due to the nonambiguous nature of the sound. As shown in Figure 19.1, right panel, these adapters indeed induced selective speech adaptation: there were thus fewer /aba/ responses after exposure to AbVb than AdVd, an effect in the opposite direction of recalibration. The attractiveness of these control stimuli was that participants could not distinguish them from the ones with an ambiguous sound that induced recalibration. This was confirmed in an identification test in which A?Vb and AbVb were perceived as /b/, and A?Vd and AdVd as /d/ on nearly
[Figure 19.1 appears here: two panels plotting the proportion of /b/ responses (0.20–1.00) against the auditory test token (/A?/−1, /A?/, /A?/+1), for the ambiguous-sound adapters A?Vaba and A?Vada (left panel) and the nonambiguous adapters AVaba and AVada (right panel).]
FIGURE 19.1 Percentage of /aba/ responses as a function of auditory test token. Left panel: After exposure to audiovisual adapters with ambiguous sounds, A?Vaba or A?Vada, there were more responses consistent with the adapter (recalibration). Right panel: After exposure to audiovisual adapters with non-ambiguous sounds, AVaba or AVada, there were fewer responses consistent with the adapter (selective speech adaptation). (Results on auditory tests adapted from Bertelson, P. et al., Psychol. Sci., 14, 6, 592–597, 2003; Exp. 2.)
100% of the trials. Moreover, even when participants were explicitly asked to discriminate AbVb from A?Vb, and AdVd from A?Vd, they performed at chance level because there was a strong immediate bias by the lip-read information that captured the identity of the sound (Vroomen et al. 2004). These findings imply that the difference in aftereffects induced by adapters with ambiguous versus nonambiguous sounds cannot be ascribed to some (unknown) explicit strategy of the listeners, because listeners simply could not know whether they were actually hearing adapters with ambiguous sounds (causing recalibration) or nonambiguous sounds (causing selective speech adaptation). This confirms the sensory, rather than strategic, nature of the phenomenon. Lip-read–induced recalibration of speech was thus demonstrated, and appeared to be contingent upon exposure to an ambiguous sound and another source of information that disambiguated that sound. Selective speech adaptation, on the other hand, occurred in the absence of an intersensory conflict, and mainly depended on repeated presentation of an acoustically clear sound. These two forms of aftereffects had been studied before in other perceptual domains, but always in isolation. Recalibration was earlier demonstrated for the ventriloquist situation and analogous intramodal conflicts such as between different cues to visual depth (see reviews by Epstein 1975 and Wallach 1968), whereas contrastive aftereffects were already well known for color, curvature (Gibson 1933), size (Blakemore and Sutton 1969), and motion (Anstis 1986; Anstis et al. 1998).
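To make the logic of the exposure–test design concrete, the sketch below shows one way the recalibration aftereffect can be quantified: as the difference in the proportion of /aba/ responses on test trials that followed /b/-biasing (A?Vb) versus /d/-biasing (A?Vd) exposure. This is a minimal illustration in Python; the function name, response coding, and toy data are our own assumptions, not the original analysis code.

```python
# Minimal sketch (assumed, not the published analysis) of quantifying the
# recalibration aftereffect from exposure-test blocks: a positive difference
# corresponds to recalibration, a negative one to the contrastive pattern of
# selective speech adaptation.

from collections import defaultdict

def recalibration_aftereffect(test_trials):
    """test_trials: iterable of (exposure_context, response) pairs, where
    exposure_context is 'A?Vb' or 'A?Vd' and response is 'aba' or 'ada'.
    Returns p(/aba/ | A?Vb exposure) - p(/aba/ | A?Vd exposure)."""
    counts = defaultdict(lambda: [0, 0])            # context -> [n_aba, n_total]
    for context, response in test_trials:
        counts[context][0] += response == "aba"
        counts[context][1] += 1
    p = {c: n_aba / n for c, (n_aba, n) in counts.items()}
    return p["A?Vb"] - p["A?Vd"]

# Toy data in the spirit of Figure 19.1 (left panel): more /aba/ responses on
# test trials that followed /b/-biasing exposure than /d/-biasing exposure.
trials = ([("A?Vb", "aba")] * 40 + [("A?Vb", "ada")] * 20
          + [("A?Vd", "aba")] * 20 + [("A?Vd", "ada")] * 40)
print(f"aftereffect: {recalibration_aftereffect(trials):+.2f}")  # positive = recalibration
```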
19.4 OTHER DIFFERENCES BETWEEN RECALIBRATION AND SELECTIVE SPEECH ADAPTATION

After the first report, several follow-up studies appeared examining differences in the manifestation of lip-read–induced recalibration and selective speech adaptation. Besides differing in the direction of their aftereffects, the two phenomena were also found to differ in their buildup, dissipation, and the processing mode in which they occur (i.e., "speech mode" versus "nonspeech mode").
19.4.1 Buildup

To examine the buildup of recalibration and selective speech adaptation, Vroomen et al. (2007) presented the four previously used audiovisual adapters (A?Vb, A?Vd, AbVb, and AdVd) in a continuous series of exposure trials, and inserted test trials after 1, 2, 4, 8, 16, 32, 64, 128, and 256 exposures. The aftereffects of adapters with ambiguous sounds (A?Vb and A?Vd) were already at ceiling after only eight exposure trials (the level of exposure used in the original study) and then, surprisingly, fell off beyond 32 exposure trials with prolonged exposure (128 and 256 trials). Aftereffects of adapters with nonambiguous sounds (AbVb and AdVd) were again contrastive, and the effect increased linearly with the (log-)number of exposure trials. The latter fitted well with the idea that selective speech adaptation reflects an accumulative process, but there was no apparent reason why a learning effect such as recalibration would reverse at some point. The authors suggested that two processes might be involved here: selective speech adaptation running in parallel with recalibration and eventually taking over. Recalibration would then dominate the observed aftereffects in the early stages of exposure, whereas selective speech adaptation would become manifest later on.
Such a phenomenon was indeed observed when data of an "early" study (i.e., one before the initial reports on phonetic recalibration) by Samuel (2001) were reanalyzed. Samuel exposed his participants to massive repeated presentations of an ambiguous /s/–/∫/ sound in the context of either an /s/-final word (e.g., /bronchiti?/, from bronchitis), or a /∫/-final one (e.g., /demoli?/, from demolish). In this situation, one might expect recalibration to take place. However, in post-tests involving identification of the ambiguous /s/–/∫/ sound, Samuel obtained contrastive aftereffects indicative of selective speech adaptation, so fewer /s/ responses after exposure to /bronchiti?/ than to /demoli?/ (and thus an effect in the direction opposite to that later reported by Norris et al. 2003). This made him conclude that a lexically restored phoneme produces selective speech adaptation similar to a nonambiguous
sound. Others, though—including Samuel—would in later years report recalibration effects using the same kinds of stimuli (Kraljic and Samuel 2005; Norris et al. 2003; van Linden and Vroomen 2007). To examine this potential conflict in more detail, Samuel allowed us to reanalyze the data from his 2001 study as a function of the number of exposure blocks (Vroomen et al. 2007). His experiment consisted of 24 exposure blocks, each containing 32 adapters. Contrastive aftereffects were indeed observed for the majority of blocks following block 3, showing the reported dominant role of selective speech adaptation. Crucially, though, a significant recalibration effect was obtained (so more /s/ responses after exposure to /bronchiti?/ than /demoli?/) in the first block of 32 exposure trials, which, in the overall analyses, was swamped by selective adaptation in later blocks. Thus, the same succession of aftereffects—dominated early by recalibration and later by selective adaptation—was already present in Samuel's data. The same pattern may therefore occur generally during prolonged exposure to various sorts of conflict situations involving ambiguous sounds.
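The two-process account can be made more tangible with a toy simulation. The sketch below is purely illustrative, with made-up gains and time constants chosen only to reproduce the qualitative shape of the data; it is not a fit to, or an implementation of, any published model.

```python
# Toy simulation (assumed parameters, illustration only) of a fast-saturating
# recalibration component plus a contrastive selective-adaptation component
# that grows with the log of the number of exposures. Their sum yields net
# aftereffects in the direction of recalibration early on, which decline and
# eventually reverse toward selective adaptation with prolonged exposure.

import math

def net_aftereffect(n_exposures, recal_gain=0.30, adapt_gain=0.06):
    # Recalibration: assumed to saturate after a handful of exposures.
    recalibration = recal_gain * (1 - math.exp(-n_exposures / 4.0))
    # Selective adaptation: contrastive (negative), assumed to accumulate with log(n).
    selective_adaptation = -adapt_gain * math.log(1 + n_exposures)
    return recalibration + selective_adaptation

for n in (1, 2, 4, 8, 16, 32, 64, 128, 256):
    print(f"{n:4d} exposures -> net shift {net_aftereffect(n):+.3f}")
```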
19.4.2 Dissipation

A study by Vroomen et al. (2004) focused on how long recalibration and selective speech adaptation effects last over time. Participants were again exposed to A?Vb, A?Vd, AdVd, or AbVb, but rather than using multiple blocks of eight adapters and six test trials in a within-subjects design (as in the original study), participants were now exposed to only one of the four adapters (a between-subjects factor) in three similar blocks consisting of 50 exposure trials followed by 60 test trials. The recalibration effect turned out to be very short-lived and lasted only about six test trials, whereas the selective speech adaptation effect was observed even after 60 test trials. The results again confirmed that the two phenomena were different from each other. Surprisingly, though, lip-read–induced recalibration turned out to be rather short-lived, a finding to which we will return later.
19.4.3 Recalibration in "Speech" versus "Nonspeech" Mode

The basic notion underlying recalibration is that it occurs to the extent that there is a (moderate) conflict between two information sources that refer to the same external event (for speech, a particular phoneme or gesture). Using sine-wave speech (SWS), one can manipulate whether a sound is assigned to a speech sound (for short, a phoneme) or not, and thus whether recalibration occurs. In SWS, the natural richness of speech sounds is reduced, and an identical sound can be perceived as speech or nonspeech depending on the listener's perceptual mode (Remez et al. 1981). Tuomainen et al. (2005) demonstrated that when SWS sounds are delivered in combination with lip-read speech, listeners who are in speech mode show almost the same intersensory integration as when presented with natural speech (i.e., lip-read information strongly biases phoneme identification), but listeners who do not know that the SWS tokens are derived from speech (nonspeech mode) show no, or only negligible, integration. Using these audiovisual SWS stimuli, we reasoned that recalibration should only occur for listeners in speech mode (Vroomen and Baart 2009a). To demonstrate this, participants were first trained to distinguish the SWS tokens /omso/ and /onso/ that were the two extremes of a seven-step continuum. Participants in the speech group labeled the tokens as /omso/ or /onso/, whereas the nonspeech group labeled the same sounds as "1" and "2". Listeners were then briefly exposed to the adapters A?Vomso and A?Vonso (to examine recalibration), and AomsoVomso and AonsoVonso (to examine selective speech adaptation), and then tested on the three most ambiguous SWS tokens that were identified as /omso/ or /onso/ in the speech group, and as "1" or "2" in the nonspeech group. As shown in Figure 19.2, recalibration only occurred for listeners in speech mode (upper left panel), but not in nonspeech mode (lower left panel), whereas selective speech adaptation occurred alike in speech and nonspeech mode (right panels). Attributing the auditory and visual signal to the same event was thus of crucial importance for recalibration, whereas selective speech adaptation did not depend on the interpretation of the signal.
[Figure 19.2 appears here: four panels plotting the proportion of /onso/ (speech group) or "2" (nonspeech group) responses (0.00–1.00) against the auditory token (/A?−1/, /A?/, /A?+1/), after exposure to adapters with an ambiguous auditory component (recalibration, left panels) or to unambiguous adapters (selective speech adaptation, right panels), for the speech-mode group (upper panels) and the nonspeech-mode group (lower panels); legends V/omso/ and V/onso/.]
FIGURE 19.2 Curves represent mean proportion of /onso/ responses as a function of auditory test tokens of continuum after exposure to auditory ambiguous adapters A?Vonso and A?Vomso (left panels), and auditory non-ambiguous adapters AonsoVonso and AomsoVomso (right panels). Upper panels show performance of speech group; lower panels show performance of non-speech group. Error bars = 1 SEM. (Adapted from Vroomen, J., and Baart, M., Cognition, 110, 2, 254–259, 2009a.)
19.5 STABILITY OF RECALIBRATION OVER TIME

As noted before, studies on phonetic recalibration began with a pair of seminal studies, one of which used lip-read information (Bertelson et al. 2003) and the other used lexical information (Norris et al. 2003). Both showed in essence the same phenomenon, but the results were nevertheless strikingly different in one aspect: Whereas lip-read–induced recalibration was short-lived, lexical recalibration turned out to be robust and long-lived in the majority of studies. The reasons for this difference are still not well understood, but in the following subsections we will give an overview of the findings and some hints on possible causes.
19.5.1 Basic Phenomenon of Lexically Induced Recalibration

It is well known that in natural speech there are, besides the acoustic and lip-read input, other information sources that inform listeners about the identity of the phonemes. One of the most important ones is the listener's knowledge about the words in the language, or for short, lexical information.
As an example, listeners can infer that an ambiguous sound somewhere in between /b/ and /d/ in the context of "?utter" is more likely to be /b/ than /d/ because "butter" is a word in English, but "dutter" is not. There is also, as for lip-reading, an immediate lexical bias in phoneme identification known as the Ganong effect (Ganong 1980). For example, an ambiguous /g/-/k/ sound is "heard" as /g/ when followed by "ift" and as /k/ when followed by "iss" because "gift" and "kiss" are words, but "kift" and "giss" are not. The corresponding aftereffect that results from exposure to such lexically biased phonemes was first reported by Norris et al. (2003). They exposed listeners to a sound halfway between /s/ and /f/ in the context of an f- or s-biasing word, and listeners were then tested on an /es/-/ef/ continuum. Comparable to the lip-reading case, the authors observed recalibration (or, in their terminology, perceptual learning), so more /f/ responses after an f-biasing context, and more /s/ responses after an s-biasing context.
Later studies confirmed the original finding and additionally suggested that the effect is speaker-specific (Eisner and McQueen 2005), or possibly token-specific (Kraljic and Samuel 2006, 2007), that it generalizes to words outside the original training set (McQueen et al. 2006a) and across syllabic positions (Jesse and McQueen 2007), and that it arises automatically as a consequence of hearing the ambiguous pronunciations in words (McQueen et al. 2006b). Although lexical recalibration can thus generalize to word-onset positions, Jesse and McQueen (2007) found no lexical learning when listeners were exposed to words with ambiguous onsets. However, Cutler et al. (2008) showed that legal word-onset phonotactic information can induce recalibration, presumably because this type of information can be used immediately, whereas lexical knowledge about the word is not yet available when one hears the ambiguous onset. Moreover, lexical retuning is not restricted to a listener's native language, as the English fricative theta ([θ] as in "bath") presented in a Dutch f- or s-biasing context induced lexical learning (Sjerps and McQueen 2010).
19.5.2 Lip-Read–Induced versus Lexically Induced Recalibration

So far, these data fit well with studies on lip-read–induced recalibration, but there was one remarkable difference: the duration of the reported aftereffects. Whereas lip-read–induced recalibration was found to be fragile and short-lived (in none of the tests did it survive more than 6 to 12 test trials; van Linden and Vroomen 2007; Vroomen and Baart 2009b; Vroomen et al. 2004), two studies on lexically induced recalibration found that it was long-lived and resistant to change. Kraljic and Samuel (2005) demonstrated that recalibration of an ambiguous /s/ or /∫/ remained robust after a 25-min delay. Moreover, it remained robust even after listeners heard canonical pronunciations of /s/ and /∫/ during the 25-min delay, and the only condition in which the effect became somewhat smaller, although not significantly so, was when listeners heard canonical pronunciations of /s/ and /∫/ from the same speaker that they had originally adjusted to. In another study, Eisner and McQueen (2006) showed that lexically induced recalibration remained stable over a much longer delay (12 h) regardless of whether subjects slept in the intervening time.
At this stage, one might conclude that lexical recalibration is simply, by its nature, robust and lip-read recalibration fragile. However, these studies were difficult to compare in a direct way because there were many procedural and item-specific differences. To examine this in more detail, van Linden and Vroomen (2007) conducted a series of experiments on lip-read–induced and lexically induced recalibration using the same procedure and test stimuli to check various possibilities. They used an ambiguous stop consonant halfway between /t/ and /p/ that could be disambiguated by either lip-read or lexical information. For lip-read recalibration, the auditory ambiguous sound was embedded in Dutch nonwords such as "dikasoo?" and dubbed onto the video of lip-read "dikasoop" or "dikasoot"; for lexical recalibration, the ambiguous sound was embedded in Dutch p-words such as "microscoo?" ("microscope") or t-words such as "idioo?" ("idiot"). Across experiments, results showed that lip-read and lexical recalibration effects were very much alike. The lip-read aftereffect tended to be bigger than the lexical one, which was to be
expected because lip-reading has in general a much stronger impact on sound processing than lexical information does (Brancazio 2004). Most important, though, both aftereffects dissipated equally fast, and thus there was no sign that lexical recalibration by itself was more robust than lip-read–induced recalibration.
The same study also explored whether recalibration would become more stable if a contrast phoneme from the opposite category was included in the set of exposure items. Studies reporting long-lasting lexical aftereffects presented during exposure not only words with ambiguous sounds, but also filler words with nonambiguous sounds taken from the opposite side of the phoneme continuum. For example, in the exposure phase of Norris et al. (2003) in which an ambiguous s/f sound was biased toward /f/, there were not only exposure stimuli such as "witlo?" that supposedly drive recalibration, but also contrast stimuli containing the nonambiguous sound /s/ (e.g., "naaldbos"). Such contrast stimuli might serve as an anchor or a comparison model for another stimulus, and aftereffects thought to reflect recalibration might in this way be boosted because listeners set the criterion for the phoneme boundary in between the ambiguous token and the extreme one. The obtained aftereffect may then reflect the contribution of two distinct processes: one related to recalibration proper (i.e., a shift in the phoneme boundary meant to reduce the conflict between the sound and the context), the other to a strategic and long-lasting criterion-setting operation that depends on the presence of an ambiguous phoneme and a contrast phoneme from the opposing category. Our results showed that aftereffects did indeed become substantially bigger if a contrast stimulus was included in the exposure set, but, crucially, aftereffects did not become more stable. Contrast stimuli thus boosted the effect, but did not explain why sometimes long-lasting aftereffects were obtained.
Another factor that was further explored was whether participants were biased in consecutive exposure phases toward only one or both phoneme categories. One can imagine that if listeners are biased toward both a t-word and a p-word (as was standard in lip-read studies, but not the lexical ones), the boundary setting that listeners adopt may become fragile. However, this did not turn out to be critical: Regardless of whether participants were exposed to only one or both contexts, it did not change the size and stability of the aftereffect. Of note is that lip-read and lexical recalibration effects did not vanish if a 3-min silent interval separated the exposure phase from the test. The latter finding indicates that recalibration as such is not fragile, but that other factors, possibly related to the test itself, may explain why aftereffects dissipate quickly during testing. One such possibility might be that listeners adjust their response criterion in the course of testing such that the two response alternatives are chosen about equally often. However, although this seems reasonable, it does not explain why in the same test selective speech adaptation effects remained stable in the course of testing (Vroomen et al. 2004). Still another possibility is that recalibration needs time to consolidate, and sleep might be a factor in this. Eisner and McQueen (2006) explored this possibility and observed equal amounts of lexically induced aftereffects after 12 h, regardless of whether listeners had slept.
Vroomen and Baart (2009b) conducted a similar study on lip-read–induced recalibration, including contrast phonemes to boost the aftereffect, and tested participants twice: immediately after the lip-read exposure phase (as standard) and after a 24-h period during which participants had slept. The authors found large recalibration effects in the beginning of the test (the first six test trials), but they again quickly dissipated with prolonged testing (within 12 trials), and did not reappear after a 24-h delay.
It may also be the case that the dissipation rate of recalibration depends on the acoustic nature of the stimuli. The studies that found quick dissipation used intervocalic and syllable-final stops that varied in place of articulation (/aba/-/ada/ and /p/-/t/), whereas others used fricatives (/f/-/s/ and /s/-/∫/; Eisner and McQueen 2006; Kraljic et al. 2008b; Kraljic and Samuel 2005) or syllable-initial voiced–voiceless stop consonants (/d/-/t/ and /b/-/p/; Kraljic and Samuel 2006). If the stability of the phenomenon depends on the acoustic nature of the cues (e.g., place cues might be more vulnerable), aftereffects may differ in this respect as well. Another variable that may play a role is whether the same ambiguous sound is used during the exposure phase, or whether the token varies from trial to trial. Stevens (2007, Chapter 3) examined
token variability in lexical recalibration using procedures similar to those of Norris et al. (2003), but listeners were exposed either to the same or to different versions of an ambiguous s/f sound embedded in s- and f-biasing words. His design also included contrast phonemes from the opposite phoneme category that should have boosted the effect. When the ambiguous token was constant, as in the original study by Norris et al., the learning effect was quite substantial on the first test trials, but quickly dissipated with prolonged testing, and in the last block (test trials 36–42) lexical recalibration had disappeared completely, akin to lip-read–induced recalibration (van Linden and Vroomen 2007; Vroomen and Baart 2009b; Vroomen et al. 2004). When the sound varied from trial to trial, the overall learning effect was much smaller and restricted to the f-bias condition, but the effect lasted longer.
Another aspect that may play a role is the use of filler items. Studies reporting short-lived aftereffects tended to use massed trials of adapters with either no filler items separating the critical items, or only a few contrast stimuli. Others, reporting long-lasting effects, used many filler items separating the critical items (Eisner and McQueen 2006; Kraljic and Samuel 2005, 2006; Norris et al. 2003). Typically, about 20 critical items containing the ambiguous phoneme were interspersed among 180 filler items. A classic learning principle is that massed trials produce weaker learning effects than spaced trials (e.g., Hintzman 1974). At present, it remains to be explored whether recalibration is sensitive to this variable as well and whether it follows the same principle. One other factor that might prove to be valuable in the discussion regarding short- versus long-lasting effects is that extensive testing may override, or wash out, the learning effects (e.g., Stevens 2007) because, during the test, listeners might "relearn" their initial phoneme boundary. Typically, more test trials are used in the Bertelson et al. (2003) paradigm than in the Norris et al. (2003) paradigm, possibly influencing the time course of the observed effects. For the time being, though, the critical difference between the short- and long-lasting recalibration effects remains elusive.
19.6 DEVELOPMENTAL ASPECTS

Several developmental studies have suggested that integration of visual and auditory speech is already present early in life (e.g., Desjardins and Werker 2004; Kuhl and Meltzoff 1982; Rosenblum et al. 1997). For example, 4-month-old infants, exposed to two faces articulating vowels on a screen, look longer at the face that matches an auditory vowel played simultaneously (Kuhl and Meltzoff 1982; Patterson and Werker 1999), and even 2-month-old infants can detect the correspondence between auditory and visually presented speech (Patterson and Werker 2003). However, it has also been found that the impact of lip-reading on speech perception increases with age (Massaro 1984; McGurk and MacDonald 1976). Such a developmental trend in the impact of visual speech may suggest that lip-reading is an ability that needs to mature, or alternatively that linguistic experience is necessary, possibly because visible articulation is initially not well specified. Exposure to audiovisual speech may then be necessary to develop phonetic representations more completely.
Van Linden and Vroomen (2008) explored whether there is a developmental trend in the use of lip-read information by testing children of two age groups, 5-year-olds and 8-year-olds, on lip-read–induced recalibration. Results showed that the older children learned to categorize the initially ambiguous speech sound in accord with the previously seen lip-read information, but this was not the case for the younger age group. Presumably, the 8-year-olds adjusted their phoneme boundary to reduce the phonetic conflict in the audiovisual stimuli; this shift may occur in the older group but not the younger one because lip-reading is not yet very effective at the age of 5. However, Teinonen et al. (2008) were able to observe learning effects induced by lip-read speech when testing much younger infants with a different procedure. They exposed 6-month-old infants to speech sounds from a /ba/-/da/ continuum. One group was exposed to audiovisual congruent mappings so that tokens from the /ba/ side of the continuum were combined with lip-read /ba/, and tokens from the /da/ side were combined with lip-read /da/. Two other groups of infants were presented with the same sounds from the /ba/-/da/ continuum, but in one group all auditory tokens were
paired with lip-read /ba/, and in the other group all auditory tokens were paired with lip-read /da/. In the latter two groups, lip-read information thus did not inform the infant how to divide the sounds from the continuum into two categories. A preference procedure revealed that infants in the former, but not in the latter two groups, learned to discriminate the tokens from the /ba/–/da/ continuum. These results suggest that infants can use lip-read information to adjust the phoneme boundary of an auditory speech continuum. Further testing, however, is clearly needed to understand in detail what critical experience is required and how it relates to lip-read–induced recalibration.
19.7 COMPUTATIONAL MECHANISMS

How might the retuning of phoneme categories be accomplished from a computational perspective? In principle, there are many solutions. All that is needed is that the system is able to use context to change the way an ambiguous phoneme is categorized. Recalibration may be initiated whenever there is a discrepancy between the phonological representations induced by the auditory and lip-read input, or, for lexical recalibration, whenever there is a mismatch between the auditory input and the one expected from lexical information. Recalibration might be accomplished at the phonetic level by moving the position of the whole category, by adding the ambiguous sound as a new exemplar of the appropriate category, or by changing the category boundaries. For example, in models such as TRACE (McClelland and Elman 1986) or Merge (Norris et al. 2000), speech perception is envisaged in layers where features activate phonemes that in their turn activate words. Here, one can implement recalibration as a change in the weights of the auditory feature-to-phoneme connections (Mirman et al. 2006; Norris et al. 2000); a toy sketch contrasting a boundary-shift account with such a connection-reweighting account is given at the end of this section. Admittedly though, the differences among these various possibilities are quite subtle. Yet, the extent to which recalibration generalizes to new exemplars might be of relevance to distinguish these alternatives. One observation is that repeated exposure to typical McGurk stimuli containing a canonical sound, say nonambiguous auditory /ba/ combined with lip-read /ga/, does not invoke a retuning effect of the canonical /ba/ sound itself (Roberts and Summerfield 1981). A "good" auditory /ba/ thus remains a good example of its category despite lip-read input repeatedly indicating that the phoneme belongs to another category. This may suggest that recalibration reflects a shift in the phoneme boundary, affecting only sounds near that boundary, rather than an on-the-fly rewiring of the acoustic-to-phonetic connections, which would affect all sounds, and in particular the trained ones. In contrast with this view, however, there are also some data indicating the opposite. In particular, a closer inspection of the data from Shigeno (2002) shows that a single exposure to a McGurk-like stimulus AbVg—here called an anchor—followed by a target sound did change the quality of the canonical target sound /b/ (see Figure 2 of Shigeno 2002). This finding may be more in line with the idea of a "rewiring" of feature-to-phoneme connections, or alternatively with the idea that this specific trained sound is incorporated into the new category. However, it is clear that more data are needed that specifically address these details.
There has also been a controversy about whether lexical recalibration actually occurs at the same processing level as immediate lexical bias. Norris et al. (2003) have argued quite strongly in favor of two types of lexical influence in speech perception: a lexical bias on phonemic decision-making that does not involve any form of feedback, and lexical feedback necessary for perceptual learning. Although there is a recent report supporting the idea of a dissociation between lexical involvement in online decisions and in lexical recalibration (McQueen et al. 2009), we never obtained any data that support this distinction—that is, we have not been able to dissociate bias (lip-read or lexical) from recalibration.
In fact, listeners who were strongly biased by the lip-read or lexical context from the adapter stimuli (as measured in separate tests) also tended to show the biggest recalibration effects (van Linden and Vroomen 2007). Admittedly, this argument is only based on a correlation, and the correlation was at best marginally significant. Perhaps more relevant, though, are the SWS findings in which it was demonstrated that when lip-read context did not induce a cross-modal
bias—namely, in the case where SWS stimuli were perceived as nonspeech—there was also no recalibration. Immediate bias and recalibration thus usually go hand in hand, and in order to claim that they are distinct, one would like to see empirical evidence in the form of a dissociation.
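As a concrete, deliberately simplified illustration of the alternatives discussed above, the Python sketch below contrasts a boundary-shift account with a feature-to-phoneme reweighting account on a one-dimensional /b/–/d/ continuum. The function names, parameter values, and the scalar stimulus coding are our own assumptions for illustration; neither function is an implementation of TRACE, Merge, or any published model.

```python
# Two toy update rules for phonetic recalibration on a 1-D /b/-/d/ continuum.
# Assumptions: stimuli are scalars in [0, 1]; 0.5 is the initial boundary;
# the lip-read context supplies the "intended" category label.

import numpy as np

def recalibrate_boundary(boundary, ambiguous_token, context_label, rate=0.2):
    """Boundary-shift account: nudge the category boundary so the ambiguous
    token ends up on the side named by the context. Tokens below the boundary
    are labeled /b/, tokens above it /d/."""
    target = ambiguous_token + (0.1 if context_label == "b" else -0.1)
    return boundary + rate * (target - boundary)

def recalibrate_weights(weights, features, context_label, rate=0.1):
    """Reweighting account (in the spirit of feature-to-phoneme connections in
    interactive models): a simple delta rule that strengthens the mapping from
    the ambiguous feature vector to the context-named phoneme."""
    target = np.array([1.0, 0.0]) if context_label == "b" else np.array([0.0, 1.0])
    activation = weights @ features            # current /b/ and /d/ activations
    error = target - activation
    return weights + rate * np.outer(error, features)

# Toy exposure phase: the ambiguous token (0.5) paired eight times with lip-read /b/.
boundary = 0.5                                  # initial /b/-/d/ boundary
weights = np.array([[1.0, -1.0],                # row 0: /b/ detector
                    [-1.0, 1.0]])               # row 1: /d/ detector
features = np.array([0.5, 0.5])                 # acoustically ambiguous input
for _ in range(8):
    boundary = recalibrate_boundary(boundary, 0.5, "b")
    weights = recalibrate_weights(weights, features, "b")

print(f"boundary after exposure: {boundary:.2f}")            # > 0.5, so 0.5 is now /b/
print(f"/b/ activation for ambiguous input: {(weights @ features)[0]:.2f}")
```

Both toy rules make the ambiguous token more likely to be categorized as /b/ after exposure to lip-read /b/; they differ in what should generalize to other tokens, which is exactly the kind of prediction that new generalization data could help to test.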
19.8 NEURAL MECHANISMS

What are the neural mechanisms that underlie phonetic recalibration? Despite the fact that the integration of auditory and visual speech has been extensively studied with brain imaging methods (e.g., Callan et al. 2003; Calvert et al. 1997; Calvert and Campbell 2003; Campbell 2008; Colin et al. 2002; Klucharev et al. 2003; Sams et al. 1991; Stekelenburg and Vroomen 2007), so far only two studies have addressed the potential brain mechanisms involved in phonetic recalibration.
Van Linden et al. (2007) used the mismatch negativity (MMN) as a tool to examine whether a recalibrated phoneme left traces in the evoked potential. The MMN is a component in the event-related potential that signals an infrequent discriminable change in an acoustic or phonological feature of a repetitive sound (Näätänen et al. 1978), and its latency and amplitude are correlated with the behavioral discriminability of the stimuli (Lang et al. 1990). The MMN is thought to be generated through automatic change detection and is elicited regardless of the relevance of the sound for the participant's task (Näätänen 1992; Näätänen et al. 1993). The MMN is sensitive not only to acoustic changes, but also to learned language-specific auditory deviancy (Näätänen 2001; Winkler et al. 1999). Van Linden et al. (2007) used a typical oddball paradigm to elicit an MMN so as to investigate whether lexically induced recalibration penetrates the mechanisms of perception at early pre-lexical levels, and thus affects the way a sound is heard (a schematic of the deviant-minus-standard computation underlying the MMN is sketched at the end of this section). The standard stimulus (delivered in 82% of the trials) was an ambiguous sound halfway between /t/ and /p/ in either a t-biasing context "vloo?" (derived from "vloot," meaning "fleet") or a p-biasing context "hoo?" (derived from "hoop," meaning "hope"). For the deviant condition, the ambiguous sound was in both conditions replaced by an acoustically clear /t/, so "vloot" for the t-biasing context and "hoot" (a pseudoword in Dutch) for the p-biasing context. If subjects had learned to "hear" the sound as specified by the context, we predicted the perceptual change—as indexed by the MMN—from /?/ → /t/ to be smaller in t-words than in p-words, even though the acoustic change was identical. As displayed in Figure 19.3, the MMN in t-words was indeed smaller than in p-words, thus confirming that recalibration might penetrate low-level auditory mechanisms.
The second study used fMRI to examine the brain mechanisms that drive phonetic recalibration (Kilian-Hütten et al. 2008). The authors adapted the original study of Bertelson et al. (2003) for the fMRI scanner environment. Listeners were presented with a short block of eight audiovisual adapters containing the ambiguous /aba/-/ada/ sound dubbed onto the video of lip-read /aba/ or /ada/ (A?Vb or A?Vd). Each exposure block was followed by six auditory test trials consisting of event-related forced-choice /aba/-/ada/ judgments. Functional runs were analyzed using voxelwise multiple linear regression (General Linear Model) of the blood oxygen level–dependent (BOLD) response time course. Brain regions involved in the processing of the audiovisual stimuli were identified by contrasting the activation blocks with a baseline. Moreover, a contrast based on behavioral performance was utilized so as to identify regions of interest (ROIs) whose activation during the recalibration phase would predict subsequent test performance (see also Formisano et al. 2008). Behaviorally, the results of Bertelson et al.
(2003) were replicated in the fMRI environment, so there were more /aba/ responses after exposure to A?Vb than A?Vd. Also as expected, lip-read information during the exposure blocks elicited activation in typical areas, including primary and extrastriate visual areas, early auditory areas, superior temporal gyrus and sulcus (STG/STS), middle and inferior frontal gyrus (MFG, IFG), premotor regions, and posterior parietal regions. Most interestingly, the BOLD–behavior analysis identified a subset of this network (MFG, IFG, and inferior parietal cortex) whose activity during audiovisual exposure correlated with the proportion of correctly recalibrated responses in the auditory test trials. Activation in areas MFG, IFG, and inferior parietal cortex thus predicted, on a trial-by-trial
[Figure 19.3 appears here: grand-averaged waveforms (amplitude in μV against time in ms, −50 to 250 ms) at electrode Fz for the standard, deviant, and MMN in the t-word and p-word conditions, plus scalp topographies of the two MMNs; voltage map range ±1.9 μV.]
FIGURE 19.3 Grand-averaged waveforms of standard, deviant, and MMN at electrode Fz for t-word condition (left panel) and p-word condition (middle panel). (Adapted from Vroomen, J. et al., Neuropsychologia, 45, 3, 572–577, 2007.) Right panel: MMNs and their scalp topographies for both conditions. Voltage map ranges in μV are displayed below each map. y-axis marks onset of acoustic deviation between /?/ and /t/.
basis, the subjects’ percepts of ambiguous sounds to be tested about 10 s later. The functional interpretation of these areas is to be explored further, but the activation changes may reflect trial-by-trial variations in subjects’ processing of the audiovisual stimuli, which in turn influence recalibration and later auditory perception. For instance, variations in recruitment of attentional mechanisms and/or involvement of working memory might be of importance, although the latter seems to be unlikely (Baart and Vroomen 2010b).
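As a methodological aside, the sketch below shows in schematic form how a deviant-minus-standard difference wave of the kind plotted in Figure 19.3 is obtained from single-trial EEG epochs. It is a minimal illustration with simulated data; the function name, epoch layout, and the use of the 82% standard probability from the oddball design above are assumptions, not the published analysis pipeline.

```python
# Minimal sketch (assumed, not the original pipeline) of computing an MMN
# difference wave: epochs time-locked to standards and deviants are averaged
# separately and the standard ERP is subtracted from the deviant ERP at a
# fronto-central electrode such as Fz.

import numpy as np

def mismatch_negativity(epochs, labels):
    """epochs: array (n_trials, n_samples) of baseline-corrected EEG at one
    electrode (e.g., Fz); labels: array with 'standard' or 'deviant' per trial.
    Returns the deviant-minus-standard difference wave (the MMN)."""
    epochs = np.asarray(epochs, dtype=float)
    labels = np.asarray(labels)
    erp_standard = epochs[labels == "standard"].mean(axis=0)
    erp_deviant = epochs[labels == "deviant"].mean(axis=0)
    return erp_deviant - erp_standard

# Toy usage with simulated data: 500 trials, 250 samples per epoch, with
# standards on ~82% of trials as in the oddball design described above.
rng = np.random.default_rng(0)
labels = np.where(rng.random(500) < 0.82, "standard", "deviant")
epochs = rng.normal(0.0, 1.0, size=(500, 250))
epochs[labels == "deviant", 100:150] -= 1.5   # extra negativity in the MMN latency range
mmn = mismatch_negativity(epochs, labels)
print(f"peak MMN amplitude (arbitrary units): {mmn.min():.2f}")
```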
19.9 CONCLUSION

We reviewed literature demonstrating that listeners adjust their phoneme boundaries to the prevailing speech context. Phonetic recalibration can be induced by lip-read and lexical context. Both yield converging data, although the stability of the effect varies quite substantially between studies for as yet unknown reasons. One reason could be that aftereffects as measured during tests reflect the contribution of both recalibration and selective speech adaptation, which run in parallel but with different contributions over time. Several computational mechanisms have been proposed that can account for phonetic recalibration, but critical data that distinguish between these alternatives—in particular, about the generalization to new tokens—have not yet been collected. Phonetic recalibration leaves traces in the brain that can be examined with brain imaging techniques. Initial studies suggest that a recalibrated sound behaves like an acoustically real sound from that category, and possible loci (e.g., middle and inferior frontal gyrus, parietal cortex) that subserve recalibration have been identified. Further testing, however, is needed to examine this in more detail. Involvement of the parietal cortex could indicate that (verbal) short-term memory plays a role in phonetic recalibration, although a recent study conducted by our group indicates that phonetic recalibration is not affected if subjects are involved in a difficult verbal or spatial short-term memory task (Baart and Vroomen 2010b). Moreover, auditory speech has also been shown to shift the interpretation of lip-read speech categories, much as auditory speech can be recalibrated by lip-read information, so the effect is genuinely bidirectional (Baart and Vroomen 2010a). On this view, audiovisual speech is like other cross-modal learning effects (e.g., the ventriloquist illusion) where bidirectional effects have been demonstrated.
ACKNOWLEDGMENTS

We would like to thank Arthur Samuel and James McQueen for insightful comments on an earlier version of this manuscript.
REFERENCES

Anstis, S. 1986. Motion perception in the frontal plane: Sensory aspects. In Handbook of perception and human performance, Vol. 2, Chap. 27, ed. K. R. Boff, L. Kaufman, and J. P. Thomas. New York: Wiley.
Anstis, S., F. A. J. Verstraten, and G. Mather. 1998. The motion aftereffect. Trends in Cognitive Sciences 2: 111–117.
Baart, M., and J. Vroomen. 2010a. Do you see what you are hearing? Crossmodal effects of speech sounds on lipreading. Neuroscience Letters 471: 100–103.
Baart, M., and J. Vroomen. 2010b. Phonetic recalibration does not depend on working memory. Experimental Brain Research 203: 575–582.
Bermant, R. I., and R. B. Welch. 1976. Effect of degree of separation of visual–auditory stimulus and eye position upon spatial interaction of vision and audition. Perceptual and Motor Skills 42(43): 487–493.
Bertelson, P., and G. Aschersleben. 1998. Automatic visual bias of perceived auditory location. Psychonomic Bulletin and Review 5(3): 482–489.
Bertelson, P., I. Frissen, J. Vroomen, and B. De Gelder. 2006. The aftereffects of ventriloquism: Patterns of spatial generalization. Perception and Psychophysics 68(3): 428–436.
Bertelson, P., and M. Radeau. 1981. Cross-modal bias and perceptual fusion with auditory–visual spatial discordance. Perception and Psychophysics 29(6): 578–584.
Bertelson, P., J. Vroomen, and B. De Gelder. 2003. Visual recalibration of auditory speech identification: A McGurk aftereffect. Psychological Science 14(6): 592–597.
Blakemore, C., and P. Sutton. 1969. Size adaptation: A new aftereffect. Science 166(902): 245–247.
Brancazio, L. 2004. Lexical influences in audiovisual speech perception. Journal of Experimental Psychology: Human Perception and Performance 30(3): 445–463.
Callan, D. E. et al. 2003. Neural processes underlying perceptual enhancement by visual speech gestures. Neuroreport 14(17): 2213–2218.
Calvert, G. A., E. T. Bullmore, M. J. Brammer et al. 1997. Activation of auditory cortex during silent lipreading. Science 276(5312): 593–596.
Calvert, G. A., and R. Campbell. 2003. Reading speech from still and moving faces: The neural substrates of visible speech. Journal of Cognitive Neuroscience 15(1): 57–70.
Campbell, R. 2008. The processing of audio-visual speech: Empirical and neural bases. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 363(1493): 1001–1010.
Colin, C., M. Radeau, A. Soquet, D. Demolin, F. Colin, and P. Deltenre. 2002. Mismatch negativity evoked by the McGurk–MacDonald effect: A phonetic representation within short-term memory. Clinical Neurophysiology 113(4): 495–506.
Cutler, A., J. M. McQueen, S. Butterfield, and D. Norris. 2008. Prelexically-driven perceptual retuning of phoneme boundaries. Proceedings of Interspeech 2008, Brisbane, Australia.
Desjardins, R. N., and J. F. Werker. 2004. Is the integration of heard and seen speech mandatory for infants? Developmental Psychobiology 45: 187–203.
Diehl, R. L. 1981. Feature detectors for speech: A critical reappraisal. Psychological Bulletin 89(1): 1–18.
Diehl, R. L., J. L. Elman, and S. B. McCusker. 1978. Contrast effects on stop consonant identification. Journal of Experimental Psychology: Human Perception and Performance 4(4): 599–609.
Diehl, R. L., M. Lang, and E. M. Parker. 1980. A further parallel between selective adaptation and contrast. Journal of Experimental Psychology: Human Perception and Performance 6(1): 24–44.
Eimas, P. D., and J. D. Corbit. 1973. Selective adaptation of linguistic feature detectors. Cognitive Psychology 4: 99–109.
Eisner, F., and J. M. McQueen. 2005. The specificity of perceptual learning in speech processing. Perception and Psychophysics 67(2): 224–238.
Eisner, F., and J. M. McQueen. 2006. Perceptual learning in speech: Stability over time. Journal of the Acoustical Society of America 119(4): 1950–1953.
Epstein, W. 1975. Recalibration by pairing: A process of perceptual learning. Perception 4: 59–72.
Formisano, E., F. De Martino, M. Bonte, and R. Goebel. 2008. "Who" is saying "what"? Brain-based decoding of human voice and speech. Science 322(5903): 970–973.
Ganong, W. F. 1980. Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance 6(1): 110–125.
Gibson, J. J. 1933. Adaptation, after-effects and contrast in the perception of curved lines. Journal of Experimental Psychology 18: 1–31.
Held, R. 1965. Plasticity in sensory–motor systems. Scientific American 213(5): 84–94.
Hintzman, D. L. 1974. Theoretical implications of the spacing effect. In Theories in cognitive psychology: The Loyola symposium, ed. R. L. Solso, 77–99. Potomac, MD: Erlbaum.
Jesse, A., and J. M. McQueen. 2007. Prelexical adjustments to speaker idiosyncrasies: Are they position-specific? In Proceedings of Interspeech 2007, ed. H. V. Hamme and R. V. Son, 1597–1600. Antwerpen, Belgium: Causal Productions (DVD).
Kilian-Hütten, N. J., J. Vroomen, and E. Formisano. 2008. One sound, two percepts: Predicting future speech perception from brain activation during audiovisual exposure [Abstract]. Neuroimage 41, Supplement 1: S112.
Klemm, O. 1909. Localisation von Sinneneindrücken bei disparaten Nebenreizen. Psychologische Studien 5: 73–161.
Klucharev, V., R. Möttönen, and M. Sams. 2003. Electrophysiological indicators of phonetic and non-phonetic multisensory interactions during audiovisual speech perception. Brain Research, Cognitive Brain Research 18(1): 65–75.
Kraljic, T., S. E. Brennan, and A. G. Samuel. 2008a. Accommodating variation: Dialects, idiolects, and speech processing. Cognition 107(1): 54–81.
Kraljic, T., and A. G. Samuel. 2005. Perceptual learning for speech: Is there a return to normal? Cognitive Psychology 51(2): 141–178.
Kraljic, T., and A. G. Samuel. 2006. Generalization in perceptual learning for speech. Psychonomic Bulletin and Review 13(2): 262–268.
Kraljic, T., and A. G. Samuel. 2007. Perceptual adjustments to multiple speakers. Journal of Memory and Language 56: 1–15.
Kraljic, T., A. G. Samuel, and S. E. Brennan. 2008b. First impressions and last resorts: How listeners adjust to speaker variability. Psychological Science 19(4): 332–338.
Kuhl, P. K., and A. N. Meltzoff. 1982. The bimodal perception of speech in infancy. Science 218: 1138–1141.
Lang, H., T. Nyrke, M. Ek, O. Aaltonen, I. Raimo, and R. Näätänen. 1990. Pitch discrimination performance and auditory event-related potentials. In Psychophysiological Brain Research, vol. 1, ed. C. M. H. Brunia, A. W. K. Gaillard, A. Kok, G. Mulder, and M. N. Verbaten, 294–298. Tilburg: Tilburg University Press.
Massaro, D. W. 1984. Children's perception of visual and auditory speech. Child Development 55: 1777–1788.
McClelland, J. L., and J. L. Elman. 1986. The TRACE model of speech perception. Cognitive Psychology 18(1): 1–86.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264: 746–748.
McQueen, J. M., A. Cutler, and D. Norris. 2006a. Phonological abstraction in the mental lexicon. Cognitive Science 30: 1113–1126.
McQueen, J. M., A. Jesse, and D. Norris. 2009. No lexical–prelexical feedback during speech perception or: Is it time to stop playing those Christmas tapes? Journal of Memory and Language 61: 1–18.
McQueen, J. M., D. Norris, and A. Cutler. 2006b. The dynamic nature of speech perception. Language and Speech 49(1): 101–112.
Mirman, D., J. L. McClelland, and L. L. Holt. 2006. An interactive Hebbian account of lexically guided tuning of speech perception. Psychonomic Bulletin and Review 13(6): 958–965.
Norris, D., J. M. McQueen, and A. Cutler. 2000. Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences 23(3): 299–325; discussion: 325–370.
Norris, D., J. M. McQueen, and A. Cutler. 2003. Perceptual learning in speech. Cognitive Psychology 47(2): 204–238.
Näätänen, R. 1992. Attention and brain function. Hillsdale: Erlbaum.
Näätänen, R. 2001. The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent. Psychophysiology 38: 1–21.
Näätänen, R., A. W. K. Gaillard, and S. Mäntysalo. 1978. Early selective-attention effect in evoked potential reinterpreted. Acta Psychologica 42: 313–329.
Näätänen, R., P. Paavilainen, H. Tiitinen, D. Jiang, and K. Alho. 1993. Attention and mismatch negativity. Psychophysiology 30: 436–450.
Patterson, M., and J. F. Werker. 1999. Matching phonetic information in lips and voice is robust in 4.5-month-old infants. Infant Behavior and Development 22: 237–247.
Patterson, M. L., and J. F. Werker. 2003. Two-month-old infants match phonetic information in lips and voice. Developmental Science 6(2): 191–196.
Radeau, M., and P. Bertelson. 1974. The after-effects of ventriloquism. The Quarterly Journal of Experimental Psychology 26(1): 63–71.
Radeau, M., and P. Bertelson. 1976. The effect of a textured visual field on modality dominance in a ventriloquism situation. Perception and Psychophysics 20: 227–235.
Radeau, M., and P. Bertelson. 1977. Adaptation to auditory–visual discordance and ventriloquism in semirealistic situations. Perception and Psychophysics 22(2): 137–146.
Remez, R. E., P. E. Rubin, D. B. Pisoni, and T. D. Carrell. 1981. Speech perception without traditional speech cues. Science 212: 947–949.
Roberts, M., and Q. Summerfield. 1981. Audiovisual presentation demonstrates that selective adaptation in speech perception is purely auditory. Perception and Psychophysics 30(4): 309–314.
Rosenblum, L. D., M. A. Schmuckler, and J. A. Johnson. 1997. The McGurk effect in infants. Perception and Psychophysics 59: 347–357.
Saldaña, H. M., and L. D. Rosenblum. 1994. Selective adaptation in speech perception using a compelling audiovisual adaptor. Journal of the Acoustical Society of America 95(6): 3658–3661.
Sams, M., R. Aulanko, M. Hämäläinen et al. 1991. Seeing speech: Visual information from lip movements modifies activity in the human auditory cortex. Neuroscience Letters 127(1): 141–145.
Samuel, A. G. 1986. Red herring detectors and speech perception: In defense of selective adaptation. Cognitive Psychology 18(4): 452–499.
Samuel, A. G. 2001. Knowing a word affects the fundamental perception of the sounds within it. Psychological Science 12(4): 348–351.
Samuel, A. G., and D. Kat. 1996. Early levels of analysis of speech. Journal of Experimental Psychology: Human Perception and Performance 22(3): 676–694.
Sawusch, J. R. 1977. Peripheral and central processes in selective adaptation of place of articulation in stop consonants. Journal of the Acoustical Society of America 62(3): 738–750.
Shigeno, S. 2002. Anchoring effects in audiovisual speech perception. Journal of the Acoustical Society of America 111(6): 2853–2861.
Sjerps, M. J., and J. M. McQueen. 2010. The bounds on flexibility in speech perception. Journal of Experimental Psychology: Human Perception and Performance 36: 195–211.
Stekelenburg, J. J., and J. Vroomen. 2007. Neural correlates of multisensory integration of ecologically valid audiovisual events. Journal of Cognitive Neuroscience 19(12): 1964–1973.
Stevens, M. 2007. Perceptual adaptation to phonological differences between language varieties. Ph.D. thesis, University of Gent, Gent.
Stratton, G. M. 1896. Some preliminary experiments on vision without inversion of the retinal image. Psychological Review 611–617.
Sumby, W. H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America 26: 212–215.
Teinonen, T., R. N. Aslin, P. Alku, and G. Csibra. 2008. Visual speech contributes to phonetic learning in 6-month-old infants. Cognition 108(3): 850–855.
Tuomainen, J., T. S. Andersen, K. Tiippana, and M. Sams. 2005. Audio-visual speech perception is special. Cognition 96(1): B13–B22.
van Linden, S., J. J. Stekelenburg, J. Tuomainen, and J. Vroomen. 2007. Lexical effects on auditory speech perception: An electrophysiological study. Neuroscience Letters 420(1): 49–52.
van Linden, S., and J. Vroomen. 2007. Recalibration of phonetic categories by lipread speech versus lexical information. Journal of Experimental Psychology: Human Perception and Performance 33(6): 1483–1494.
van Linden, S., and J. Vroomen. 2008. Audiovisual speech recalibration in children. Journal of Child Language 35(4): 809–822.
Vroomen, J., and M. Baart. 2009a. Phonetic recalibration only occurs in speech mode. Cognition 110(2): 254–259.
Vroomen, J., and M. Baart. 2009b. Recalibration of phonetic categories by lipread speech: Measuring aftereffects after a twenty-four hours delay. Language and Speech 52: 341–350.
Vroomen, J., S. van Linden, B. de Gelder, and P. Bertelson. 2007. Visual recalibration and selective adaptation in auditory–visual speech perception: Contrasting build-up courses. Neuropsychologia 45(3): 572–577.
Vroomen, J., S. van Linden, M. Keetels, B. de Gelder, and P. Bertelson. 2004. Selective adaptation and recalibration of auditory speech by lipread information: Dissipation. Speech Communication 44: 55–61.
Wallach, H. 1968. Informational discrepancy as a basis of perceptual adaptation. In The neuropsychology of spatially oriented behaviour, ed. S. J. Freeman, 209–230. Homewood, IL: Dorsey.
Welch, R. B., and D. H. Warren. 1986. In Handbook of perception and human performance, ed. K. R. Kaufman and J. P. Thomas, 1–36. New York: Wiley.
Winkler, I., T. Kujala, and Y. Shtyrov. 1999. Brain responses reveal the learning of foreign language phonemes. Psychophysiology 36: 638–642.
20 Multisensory Integration and Aging
Jennifer L. Mozolic, Christina E. Hugenschmidt, Ann M. Peiffer, and Paul J. Laurienti
CONTENTS
20.1 General Cognitive Slowing
20.2 Inverse Effectiveness
20.3 Larger Time Window of Integration
20.4 Deficits in Attentional Control
20.5 An Alternative Explanation: Increased Noise at Baseline
20.6 Summary and Conclusions
References

Effective processing of multisensory stimuli relies on both the peripheral sensory organs and central processing in subcortical and cortical structures. As we age, there are significant changes in all sensory systems and a variety of cognitive functions. Visual acuity tends to decrease and hearing thresholds generally increase (Kalina 1997; Liu and Yan 2007), whereas performance levels on tasks of motor speed, executive function, and memory typically decline (Rapp and Heindel 1994; Birren and Fisher 1995; Rhodes 2004). There are also widespread changes in the aging brain, including reductions in gray and white matter volume (Good et al. 2001; Salat et al. 2009), alterations in neurotransmitter systems (Muir 1997; Backman et al. 2006), regional hypoperfusion (Martin et al. 1991; Bertsch et al. 2009), and altered patterns of functional activity during cognitive tasks (Cabeza et al. 2004; Grady 2008). Given the extent of age-related alterations in sensation, perception, and cognition, as well as in the anatomy and physiology of the brain, it is not surprising that multisensory integration also changes with age.

Several early studies provided mixed results on the differences between multisensory processing in older and younger adults (Stine et al. 1990; Helfer 1998; Strupp et al. 1999; Cienkowski and Carney 2002; Sommers et al. 2005). For example, Stine and colleagues (1990) reported that although younger adults’ memory for news events was better after audiovisual presentation than after auditory information alone, older adults did not show improvement during the multisensory conditions. In contrast, Cienkowski and Carney (2002) demonstrated that audiovisual integration on the McGurk illusion was similar for older and younger adults, and that in some conditions, older adults were even more likely to report the fusion of visual and auditory information than their young counterparts. Similarly, in a study examining the contribution of somatosensory input to participants’ perception of visuospatial orientation, Strupp et al. (1999) reported an age-related increase in the integration of somatosensory information into the multisensory representation of body orientation. Despite providing a good indication that multisensory processing is somehow altered in aging, the results of these studies are somewhat difficult to interpret due to their use of complex cognitive tasks and illusions, and to the variability in analysis methods. Several newer studies that
have attempted to address these factors more clearly demonstrate that multisensory integration is enhanced in older adults (Laurienti et al. 2006; Peiffer et al. 2007; Diederich et al. 2008). On a two-choice audiovisual discrimination task, Laurienti and colleagues (2006) showed that response time (RT) benefits for multisensory versus unisensory targets were larger for older adults than for younger adults (Figure 20.1). That is, older adults’ responses during audiovisual conditions were speeded more than younger adults’, when compared with their respective responses during unisensory conditions. Multisensory gains in older adults remained significantly larger than those observed in younger adults, even after controlling for the presence of two targets in the multisensory condition (redundant target effect; Miller 1982, 1986; Laurienti et al. 2006). Using similar analysis methods, Peiffer et al. (2007) also reported increased multisensory gains in older adults. On a simple RT task, where average unisensory RTs were equivalent in younger and older adults, older adults actually responded faster than younger adults on multisensory trials because of their enhanced multisensory integration (Peiffer et al. 2007). Diederich and colleagues (2008) have also shown that older adults exhibit greater speeding of responses to multisensory targets than younger adults on a saccadic RT task. The analysis methods used in this experiment indicate a slowing of peripheral sensory processing, as well as a wider time window over which integration of auditory and visual stimuli can occur (Diederich et al. 2008). These experiments highlight several possible explanations that could help answer a critical question about multisensory processing in aging: Why do older adults exhibit greater integration of multisensory stimuli than younger adults? Potential sources of enhanced integration in older adults include age-related cognitive slowing not specific to multisensory processing, inverse effectiveness
associated with sensory deficits, alterations in the temporal parameters of integration, and inefficient top–down modulation of sensory processing. In the following sections we will investigate each of these possible explanations in greater detail and offer some alternative hypotheses for the basis of enhanced multisensory integration in older adults.

[Figure 20.1: probability difference (%) relative to the race model plotted against response time (250–1600 ms), with separate curves for young and elderly participants.]

FIGURE 20.1 Multisensory performance enhancements are significantly larger in older adults than in younger adults on a two-choice audiovisual discrimination paradigm. These curves illustrate multisensory-mediated gains relative to race model, which is the summed probability of unisensory responses. Each curve is the difference between the cumulative distribution of response times for the multisensory condition and the race model cumulative distribution function. Thus, positive deflections in these curves represent responses to multisensory stimuli that were faster than would be predicted by independent processing of the auditory and visual stimulus components (i.e., multisensory integration). Significant multisensory facilitation was observed in younger adults 340–550 ms after stimulus onset, and the maximum benefit achieved was approximately 8.3%. Older adults exhibited significant multisensory gains over a broader temporal window (330–690 and 730–740 ms after stimulus onset), and had performance gains of about 13.5%. Thus, both younger and older participants demonstrated speeding of responses to multisensory stimuli that exceeded gains predicted by the race model; however, older adults benefited more from the multisensory stimulus presentation than did younger adults. (Adapted from Laurienti, P.J. et al., Neurobiol Aging, 27, 1155–1163, 2006, with permission from Elsevier.)
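The race model comparison described in the caption above can be made concrete with a short computational sketch. The code below is illustrative only and is not the authors' analysis code: it uses NumPy, simulated response times drawn from hypothetical distributions, and ad hoc function names to show how a multisensory cumulative distribution is compared against the summed unisensory probabilities to produce curves like those in Figure 20.1.

```python
import numpy as np

def ecdf(rts, grid):
    """Empirical cumulative distribution of response times evaluated on a grid."""
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, grid, side="right") / rts.size

def race_model_gain(rt_av, rt_a, rt_v, grid):
    """Difference between the audiovisual CDF and the race model bound.

    The race model is taken here, as in the caption, to be the summed
    probability of the two unisensory CDFs (capped at 1). Positive values
    indicate responses faster than predicted by independent processing.
    """
    cdf_av = ecdf(rt_av, grid)
    race = np.minimum(ecdf(rt_a, grid) + ecdf(rt_v, grid), 1.0)
    return cdf_av - race

# Hypothetical response times (ms) for a single participant.
rng = np.random.default_rng(0)
rt_a = rng.normal(520, 80, 200)    # auditory-only trials
rt_v = rng.normal(540, 80, 200)    # visual-only trials
rt_av = rng.normal(470, 70, 200)   # audiovisual trials
grid = np.arange(250, 1601, 10)    # time axis comparable to Figure 20.1

gain = race_model_gain(rt_av, rt_a, rt_v, grid)
print("peak facilitation: %.1f%%" % (100 * gain.max()))
```

Plotting `gain` against `grid`, separately for each age group, would yield curves analogous to those in Figure 20.1; larger and longer-lasting positive deflections correspond to stronger multisensory integration.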
20.1 GENERAL COGNITIVE SLOWING It is well documented that older adults exhibit a general slowing of sensorimotor and cognitive processing that impacts their performance on nearly all tasks (Cerella 1985; Birren and Fisher 1995; Salthouse 2000). This general cognitive slowing is very mild on easy tasks, but is exacerbated on more demanding tasks that require more cognitive processing. For example, in a meta-analysis of age-related changes in performance on the Stroop color–word task, Verhaeghen and De Meersman (1998) showed that older adults were slower than younger adults on both the easier baseline condition and the more difficult interference condition of the task. If cognitive slowing did not factor into task performance, we would expect that any differences between younger and older adults (because of differences in sensory processing, motor responses, etc.) would remain constant across the different task conditions. Older adults however, were slowed down to a greater extent on the difficult task compared to the easy task. The authors interpreted these findings as support for the general cognitive slowing hypothesis rather than evidence of an age-related change specific to the skills assessed by the Stroop task (Verhaeghen and De Meersman 1998). In a typical multisensory paradigm, the multisensory trials contain redundant input (e.g., an auditory and a visual stimulus), whereas the unisensory trials contain only one input. Thus, the unisensory condition could be regarded as a more difficult task where more slowing would be expected in older adults. When multisensory RTs are then compared to unisensory RTs, it would appear as if the older adults were speeded more in the multisensory condition relative to the unisensory condition than the younger adults. If this were the case, increased multisensory gains observed in older adults might not be attributable to specific changes in multisensory processing, but could simply be an artifact of proportional differences in younger and older adults’ processing speed on tasks of different cognitive loads. To account for general cognitive slowing, Laurienti et al. (2006) log transformed multisensory and unisensory RTs from younger and older adults who performed the two-choice audiovisual discrimination task. Log transforming RTs is a post hoc adjustment that can help to equate young and old RTs and correct for differences related to general cognitive slowing (Cerella 1985; Salthouse 1988; Cornelissen and Kooijman 2000). Older adults still exhibited larger gains in multisensory integration than younger adults after log transforming the data, suggesting that differences between the age groups cannot be accounted for solely by general cognitive slowing (Laurienti et al. 2006). Peiffer et al. (2007) took additional steps to rule out general cognitive slowing as an explanation for age-related multisensory enhancements by using a very simple audiovisual detection task. The effects of general cognitive slowing are minimized on such tasks where RTs on unisensory trials are the same for younger and older adults (Yordanova et al. 2004). In this experiment there were no differences between old and young RTs on unisensory visual or auditory trials; however, on trials that contained simultaneous visual and auditory targets, older participants were significantly faster than young subjects (Peiffer et al. 2007). 
These results support the notion that there are specific age-related differences in multisensory processing that cannot be explained by general cognitive slowing.
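The logic of the log-transform adjustment can be illustrated with a toy example. This is a sketch under simplifying assumptions (hypothetical response times and a purely proportional slowing factor), not the procedure used by Laurienti et al. (2006); it shows only why gains computed on log-transformed RTs are insensitive to uniform proportional slowing.

```python
import numpy as np

def log_gain(rt_multi, rt_uni):
    """Multisensory gain computed on log-transformed response times.

    Working in log units expresses speeding as a proportion of each group's
    own baseline, so a uniform proportional slowing cancels out.
    """
    return np.mean(np.log(rt_uni)) - np.mean(np.log(rt_multi))

# Hypothetical response times (ms); older adults are uniformly 40% slower.
young_uni = np.array([480.0, 500.0, 520.0])
young_multi = np.array([440.0, 460.0, 480.0])
old_uni = 1.4 * young_uni
old_multi = 1.4 * young_multi

# The two log gains are identical, so any *additional* gain observed in
# older adults cannot be explained by general proportional slowing alone.
print(log_gain(young_multi, young_uni), log_gain(old_multi, old_uni))
```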
20.2 INVERSE EFFECTIVENESS In addition to nonspecific slowing of cognitive processes evident in aging, older adults also demonstrate functional declines in all sensory systems. These functional declines are attributable both to age-related changes in the peripheral sensory organs, such as rigidity in the lens, loss of hair
cells, and changes in cutaneous receptors and the olfactory epithelium (Kovács 2004; Liu and Yan 2007; Shaffer and Harrison 2007; Charman 2008), and to age-related alterations in how the central nervous system processes sensory information (Schmolesky et al. 2000; Cerf-Ducastel and Murphy 2003; Ostroff et al. 2003; Quiton et al. 2007). Reduced sensitivity or acuity in the individual sensory systems is another potential explanation for increased multisensory benefits for older adults, attributable to a governing principle of multisensory integration known as inverse effectiveness. According to this principle, decreasing the effectiveness of individual sensory stimuli increases the magnitude of multisensory enhancements (Meredith and Stein 1983, 1986). In other words, when an auditory or visual stimulus is presented just above threshold level, the gains produced by bimodal audiovisual presentation are larger than when the individual stimuli are highly salient. Early demonstrations of inverse effectiveness in the cat superior colliculus (Meredith and Stein 1983, 1986) have been extended to cat and monkey cortex (Wallace et al. 1992; Kayser et al. 2005) as well as both neural and behavioral data in humans (Hairston et al. 2003; Stevenson and James 2009). For example, Hairston and colleagues (2003) demonstrated that young participants with normal vision were able to localize unimodal visual and bimodal audiovisual targets equally well; however, when participants’ vision was artificially degraded, their localization abilities were significantly enhanced during audiovisual conditions relative to performance on visual targets alone. The evidence for inverse effectiveness as a source of enhanced multisensory integration in older adults is not yet clear. In the study performed by Peiffer and colleagues (2007; mentioned above), RTs on unisensory trials were similar for younger and older adults, yet the older adults still showed larger multisensory gains than the younger group. This finding suggests that other mechanisms beyond inverse effectiveness may be required to explain the age-related enhancements. The paradigm used in this study, however, matched the performance between populations using superthreshold stimuli and did not specifically investigate the consequence of degrading stimulus effectiveness. In a population composed exclusively of older adults, Tye-Murray et al. (2007) demonstrated that integration levels in an audiovisual speech perception task did not differ for older adults with mild-to-moderate hearing loss and older adults without hearing impairment. However, all testing in this experiment was conducted in the presence of background auditory noise (multitalker “babble”), and the level of this noise was adjusted for each participant so that the performance of the two groups was matched in the unisensory auditory condition. This design makes it difficult to address the interesting question of whether reduced stimulus effectiveness due to age-related hearing loss would increase multisensory integration in hearing-impaired versus normal-hearing older adults. Results from a study conducted by Cienkowski and Carney (2002) provide some clues on the effects of hearing loss on age-related integration enhancements. 
This experiment tested three groups of participants on the McGurk illusion: (1) young adults with normal hearing, (2) older adults with mild, but age-appropriate hearing loss, and (3) a control group of young adults with hearing thresholds artificially shifted to match the older adults. Both the older adults and the threshold-shifted controls were more likely to integrate the visual and auditory information than young, normal hearing participants in one experimental condition (Cienkowski and Carney 2002). In this condition, the participants viewed the McGurk illusion presented by a male talker. Interestingly, integration did not differ between the three groups when the illusion was presented by a female talker. Although the response patterns of the threshold-shifted controls closely matched that of the older adults with mild hearing loss, the level of integration experienced by each group across the different experimental conditions did not have a clear inverse relationship with successful unisensory target identification. For example, in an auditory-only condition, all groups were better at identifying syllables presented by the male talker than the female talker, yet levels of audiovisual integration were higher for all groups in the male-talker condition (Cienkowski and Carney 2002). If increased integration in this task were due simply to increased ambiguity in the auditory signals for older adults and control
subjects (whose hearing thresholds were shifted by noise), then we would expect the highest levels of integration under conditions where unisensory performance was poorest. Clearly, more studies that carefully modulate signal intensities and compare the multisensory gains in younger and older adults will be needed to further characterize the role of inverse effectiveness in age-related multisensory enhancements.
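A common way to quantify the principle of inverse effectiveness, in the single-neuron literature cited above and in several behavioral studies, is a multisensory enhancement index that expresses the multisensory response relative to the best unisensory response. The sketch below is generic, the response values are made up, and it is not taken from any of the aging studies discussed here.

```python
def multisensory_enhancement(resp_av, resp_a, resp_v):
    """Percent enhancement of the multisensory response over the best
    unisensory response; by inverse effectiveness, this index tends to grow
    as the unisensory responses become weaker."""
    best_uni = max(resp_a, resp_v)
    return 100.0 * (resp_av - best_uni) / best_uni

# Hypothetical responses (e.g., spikes per trial or percent correct):
print(multisensory_enhancement(26, 20, 22))  # salient stimuli -> modest gain (~18%)
print(multisensory_enhancement(9, 4, 5))     # near-threshold stimuli -> large gain (80%)
```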
20.3 LARGER TIME WINDOW OF INTEGRATION A common finding across many studies that compare distributions of RTs in younger and older adults is that older adults’ responses are both slower and more variable, creating distributions that are broader and shifted to the right in older adults relative to young (Hale et al. 1988; Morse 1993; Hultsch et al. 2002). Multisensory enhancements have also been demonstrated to occur over a wider distribution of RTs for older adults (Laurienti et al. 2006; Peiffer et al. 2007; Diederich et al. 2008). For example, in an audiovisual discrimination paradigm, Laurienti et al. (2006) reported that whereas younger adults showed multisensory behavioral facilitation 340–550 ms after stimulus onset, older adults began showing facilitation at approximately the same point (330 ms), but continued to show enhancements in responses made up to 740 ms after the audiovisual stimuli had been presented (see Figure 20.1). Recently, Diederich and colleagues (2008) have studied the temporal characteristics of integration in older and younger adults using a time-window-of-integration (TWIN) model. This model is able to distinguish between the relative contributions of early, peripheral sensory processes and subsequent, central integration processes to multisensory enhancements (Colonius and Diederich 2004; Diederich et al. 2008). Using a focused attention task where saccadic reaction time to a visual target was measured with and without an accessory auditory stimulus, the authors reported that older adults’ responses were slower, more variable, and showed greater multisensory enhancements than younger adults’ responses (Diederich et al. 2008). Additionally, the TWIN model analysis indicated that peripheral slowing in older adults resulted in a broader temporal window of multisensory integration. Despite this longer period for potential interaction between stimuli, increased RT and response variability in older adults actually reduce the probability that processing of both the auditory and visual stimulus will occur within this time window. Given this reduced probability of stimulus overlap, these data suggest that a longer time window for cross-modal interactions can only partially compensate for an age-related reduction in the probability that multisensory integration will occur (Diederich et al. 2008). Thus, a wider time window of integration in older adults is primarily the result of slower and more variable peripheral sensory processing, and cannot fully explain why, when integration does occur, its magnitude is larger in older adults.
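The trade-off identified by the TWIN analysis can be conveyed with a simple Monte Carlo sketch. The simulation below is only loosely inspired by the TWIN model for a focused attention task: peripheral processing times are drawn from exponential distributions, integration occurs when the auditory accessory wins the first-stage race and the visual target terminates within a window omega, and integration then shortens the second stage by delta ms. The function name and all parameter values are invented for illustration and are not fitted estimates from Diederich et al. (2008).

```python
import numpy as np

def twin_sketch(mean_v, mean_a, omega, delta, base_rt, n=100_000, seed=1):
    """Monte Carlo sketch of a time-window-of-integration style model.

    mean_v, mean_a: mean peripheral processing times (ms) for the visual
    target and the auditory accessory; omega: integration window (ms);
    delta: central facilitation (ms) when integration occurs;
    base_rt: mean duration of the second (central and motor) stage (ms).
    """
    rng = np.random.default_rng(seed)
    t_v = rng.exponential(mean_v, n)
    t_a = rng.exponential(mean_a, n)
    # Integration requires the accessory to finish first and the target
    # to finish within the window omega afterwards.
    integrated = (t_a < t_v) & (t_v - t_a < omega)
    rt = t_v + base_rt - delta * integrated
    return integrated.mean(), rt.mean()

# "Younger": faster peripheral stages, narrower window, smaller facilitation.
print(twin_sketch(mean_v=60, mean_a=80, omega=200, delta=40, base_rt=150))
# "Older": slower, more variable stages widen the window, yet the probability
# that the two stages overlap in time does not rise correspondingly.
print(twin_sketch(mean_v=110, mean_a=140, omega=250, delta=60, base_rt=220))
```

With these illustrative settings the "older" simulation has a wider window but a slightly lower probability of integration, while the gain on integrated trials (delta) is larger, mirroring the pattern described above.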
20.4 DEFICITS IN ATTENTIONAL CONTROL In addition to stimulus properties such as timing, location, and intensity that can affect multisensory integration, there are also top–down cognitive factors that can modulate cross-modal interactions such as semantic congruence (Laurienti et al. 2004) and selective attention (Alsius et al. 2005; Talsma and Woldorff 2005; Talsma et al. 2007; Mozolic et al. 2008a). Of particular relevance to aging is selective attention, a top–down control mechanism that allows us to focus on a particular location, stimulus feature, or sensory modality while ignoring other possible options (Corbetta et al. 1990; Posner and Driver 1992; Spence and Driver 1997; Kastner and Ungerleider 2000; Spence et al. 2001). Attention to a particular sensory modality typically results in small behavioral benefits in the attended modality and larger deficits in the unattended modality (Spence and Driver 1997; Spence et al. 2001). Similarly, neuroimaging data suggest that modality-specific attention causes activity increases in cortical areas associated with the attended modality and activity decreases in brain regions associated with processing information from the unattended modality (Kawashima et al. 1995; Ghatan et al. 1998; Johnson and Zatorre 2006; Mozolic et al. 2008b).
In young, healthy adults, dividing attention across multiple sensory modalities appears to be critical for multisensory integration, whereas restricting attention to a single sensory modality can abolish behavioral and neural enhancements associated with multisensory stimuli (Alsius et al. 2005; Talsma and Woldorff 2005; Talsma et al. 2007; Mozolic et al. 2008a). Many studies have demonstrated that older adults have deficits in attention and are more distracted by stimuli within and across sensory modalities (Dywan et al. 1998; Alain and Woods 1999; West and Alain 2000; Milham et al. 2002; Andres et al. 2006; Poliakoff et al. 2006; Yang and Hasher 2007; Healey et al. 2008). For example, Andres and colleagues (2006) reported that older adults were more distracted by irrelevant sounds than younger adults on an auditory–visual oddball paradigm. It would seem possible, then, that increased multisensory integration in older adults could result from deficits in top–down attentional control that allow more cross-modal information to be processed.

This apparently simple account of age-related increases in distractibility is complicated by the fact that there is also strong evidence suggesting that older adults can, in fact, successfully engage selective attention on a variety of tasks (Groth and Allen 2000; Verhaeghen and Cerella 2002; Madden et al. 2004; Townsend et al. 2006; Ballesteros et al. 2008; Hugenschmidt et al. 2009a; Hugenschmidt et al. 2009c). In a recent study, Hugenschmidt and colleagues (2009a) used a cued multisensory discrimination paradigm to demonstrate that older adults can reduce multisensory integration by attending to a single sensory modality in a similar manner as has been observed in young adults (Mozolic et al. 2008a). However, multisensory integration was still enhanced in older adults relative to young because the levels of integration in older adults were significantly higher at baseline, in the absence of modality-specific attentional modulation (Figure 20.2). These results indicate that enhanced integration in older adults is not due to deficits in engaging top–down selective attention mechanisms, but could instead result from age-related increases in baseline cross-modal interactions. This alternative explanation may also help to account for the seemingly contradictory evidence that older adults are both more distractible than younger adults and equally able to engage selective attention.
[Figure 20.2, panels (a) and (b): probability difference (%) between the multisensory distribution and the race model plotted against reaction time (200–1400 ms), with separate curves for divided attention, selective auditory attention, and selective visual attention.]

FIGURE 20.2 Selective attention reduces multisensory integration in younger and older adults. As in Figure 20.1, each curve represents the difference between the cumulative distribution for multisensory responses and the race model, and thus, positive deflections show time bins where multisensory integration was observed. In this cued, two-choice discrimination paradigm, multisensory and unisensory targets were presented under three different attention conditions: divided attention, selective auditory attention, and selective visual attention. Younger adults exhibited integration only during divided attention conditions (peak facilitation ≈ 5%); selective attention abolished multisensory gains (a). Older adults were also able to reduce multisensory integration during selective attention; however, due to higher levels of integration during the baseline divided attention condition (peak facilitation ≈ 10%), older adults still exhibited significant multisensory gains during selective attention (b). These data demonstrate that older adults are able to engage selective attention and modulate multisensory integration, yet have a general increase in the level of integration relative to younger adults that is independent of attention condition. (Adapted from Hugenschmidt, C.E. et al., Neuroreport, 20, 349–353, 2009a, with permission from Wolters Kluwer Health.)
20.5 AN ALTERNATIVE EXPLANATION: INCREASED NOISE AT BASELINE In the cued multisensory discrimination paradigm mentioned above (Hugenschmidt et al. 2009a), older adults experienced multisensory enhancements even while selectively attending to a single sensory modality. These multisensory gains can be used to index the level of background sensory noise being processed, because behavioral enhancements can only result if irrelevant stimuli from the ignored sensory modality speed up responses to targets in the attended modality. When younger adults engage selective attention on this task, multisensory enhancements are abolished, indicating that extraneous sensory information from the ignored modality is being successfully suppressed (Mozolic et al. 2008a; Hugenschmidt et al. 2009a). In contrast, results in older adults appear paradoxical. They decrease multisensory integration during selective attention commensurately with younger adults, but in spite of this, they integrate nearly as much during selective attention as their younger counterparts do during divided attention (see Figure 20.2). This occurs because older adults show increased integration during the baseline divided attention state. When older adults engage selective attention, their relative level of integration suppression (peak_divided ≈ 10%; peak_selective ≈ 5%) is similar to the attention-mediated drop seen in younger adults (peak_divided ≈ 5%; peak_selective < 2%). However, because older adults have higher levels of integration at baseline, they still exhibit robust integration after attention-mediated suppression. The important point is that increased processing of irrelevant sensory information during selective attention is not due to a failure of attentional processes, but rather to a shift in the processing of sensory stimuli at baseline. Attention does serve to limit integration in older adults, but it does not appear to completely compensate for the increased level of background sensory noise (Hugenschmidt et al. 2009a).

To directly test for age-related increases in background sensory processing, Hugenschmidt and colleagues (2009b) compared cerebral blood flow (CBF) during resting state and a visual steady-state task in younger and older adults. The hypothesis that older adults show higher baseline levels of sensory processing led to three sets of predictions. First, during resting state, CBF to the auditory cortex associated with background noise of the magnetic resonance imaging (MRI) scanner would be increased in older adults relative to younger adults. Second, during visual stimulation, both groups would have reduced auditory cortex CBF, but the relative amount of auditory CBF should still be higher in the older adults. Third, older adults would show reductions in cross-modal signal-to-noise ratio (SNR) during both resting and visual tasks. The cross-modal SNR was quantified as the ratio between CBF in the visual and auditory cortices. The results of this study support these claims, suggesting that older adults process more background sensory information than younger adults, demonstrated by increased CBF in the auditory cortex during rest and during visual stimulation. Despite the fact that both older and younger adults show comparable reductions in auditory CBF when engaged in a visual task, the older adults still have higher CBF in the auditory cortex in response to the ongoing, but task-irrelevant, scanner noise.
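The cross-modal signal-to-noise ratio described above is simply a ratio of regional perfusion values. A trivial sketch, using hypothetical CBF numbers rather than the study's data, makes the comparison explicit:

```python
def cross_modal_snr(cbf_task_relevant, cbf_task_irrelevant):
    """Cross-modal SNR: blood flow in the task-relevant (here, visual) cortex
    divided by blood flow in the task-irrelevant (auditory) cortex."""
    return cbf_task_relevant / cbf_task_irrelevant

# Illustrative CBF values (ml/100 g/min) during a visual task.
print(cross_modal_snr(62.0, 38.0))  # younger adults: higher cross-modal SNR
print(cross_modal_snr(58.0, 45.0))  # older adults: lower cross-modal SNR
```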
This increase in background sensory processing also results in a reduced SNR for older adults during visual task performance (Hugenschmidt et al. 2009b). The results of this imaging study parallel the behavioral results discussed above, suggesting that enhanced multisensory integration in older adults is at least partly attributable to the fact that older adults are processing more sensory information than their younger counterparts, regardless of stimulus relevance or attentional state (Alain and Woods 1999; Rowe et al. 2006; Hugenschmidt et al. 2009a). Environmental conditions and task demands can determine whether increased sensory processing is beneficial or detrimental to performance. In certain controlled laboratory paradigms, processing more sensory information can be beneficial. For example, enhanced processing of both a visual target and a spatially coincident, semantically congruent auditory stimulus can result in speeded RTs (Laurienti et al. 2004). In the real world, however, we are constantly bombarded with information from all sensory modalities. We use top–down control mechanisms, like selective attention, to focus on important and task-relevant sensory information and filter out irrelevant and distracting stimuli. It seems that older adults can benefit from selective attention, yet because
their baseline levels of sensory processing are elevated, they are still more distracted than younger adults when incoming sensory streams contain irrelevant or conflicting information. However, if the extraneous sensory information becomes task relevant, older adults will exhibit larger gains than younger adults, as information that was previously interfering with task performance becomes helpful in completing the task. Additional illustrations of the costs and benefits that older adults experience as a consequence of increased baseline sensory processing can be seen in unisensory distraction tasks (Rowe et al. 2006; Yang and Hasher 2007; Healey et al. 2008). In one example, Yang and Hasher (2007) demonstrated that older adults were more distracted by irrelevant pictures than young in a task that required participants to make semantic judgments about words that appeared superimposed on the pictures. In a very similar paradigm that modified task demands, however, older adults had an advantage (Rowe et al. 2006). In this experiment, younger and older adults were required to make same/different judgments about the pictures that appeared beneath an overlay containing irrelevant words. On a subsequent test of implicit memory for the irrelevant words, older adults actually showed better memory, indicating that they had indeed processed more “noise” or irrelevant background information than younger adults (Rowe et al. 2006). These studies support the notion that older adults are more distractible than younger adults because they do not adequately filter sensory noise, but when to-be-ignored information becomes relevant, older adults can actually benefit from increased background sensory processing. In spite of the accumulating evidence that baseline sensory processing changes with age, there is no clear evidence for an underlying neural mechanism. One potential source of age-related changes in baseline filtering parameters is dysregulation of the default mode network (DMN), an anatomically and physiologically defined system of structures thought to be involved in monitoring internal thoughts and the external environment at rest (Raichle et al. 2001; Greicius and Menon 2004; Buckner et al. 2008). Composed of regions such as the anterior cingulate, posterior cingulate/precuneus region, and the parietal cortex, the default mode network is most active during rest and becomes less active during most goal-directed behaviors (Raichle et al. 2001; Greicius and Menon 2004; Buckner et al. 2008). Several studies have reported that the DMN is not suppressed as effectively during external tasks in older adults as in young (Lustig et al. 2003; Grady et al. 2006; Persson et al. 2007). Failure to suppress default mode network activity has also been implicated in reduced stimulus processing during attentional lapses, increased frequency of task-unrelated thoughts, and increased error rates (McKiernan et al. 2006; Weissman et al. 2006; Li et al. 2007). A recent study by Stevens and colleagues (2008) directly linked increased background activity in auditory cortex during a visual task to DMN activity. In this functional MRI (fMRI) study, older and younger adults were asked to complete a visual working memory task in a noisy MRI scanner environment. When older adults made errors on this task, they had increased activity in the auditory cortex. In younger adults, however, error trials were not associated with increased auditory activation. 
This suggests that older adults were processing more background information than younger adults and that the increased processing was related to distraction by irrelevant auditory stimulation. Furthermore, increased auditory activity was associated with increased DMN activity, indicating that older adults’ vulnerability to distraction may be linked to age-related differences in suppression of the DMN (Stevens et al. 2008). It seems likely, therefore, that further characterization of the default mode network in aging may be important for understanding the neural basis of altered baseline sensory processing and enhanced multisensory integration in older adults.
20.6 SUMMARY AND CONCLUSIONS Given the existing literature on multisensory processing in aging, it appears that there is not yet a clear explanation for why older adults exhibit greater multisensory integration than younger adults. Based on the studies summarized in this review, several potential sources of increased integration can be ruled out as the sole cause of age-related gains. Experiments that apply adjustments for
general cognitive slowing (Laurienti et al. 2006) or use paradigms that equate unisensory RTs for younger and older adults (Peiffer et al. 2007) demonstrate that multisensory gains are still larger for older participants. A large portion of the behavioral changes that older adults exhibit in these paradigms must therefore be specific to multisensory processing, rather than be attributed to the general effects of sensorimotor and cognitive slowing.

Similarly, older adults’ broad time window of integration does not seem to be the source of their multisensory processing enhancements. The analysis methods used by Diederich and colleagues (2008) clearly show that older adults have a larger time interval over which multisensory integration can occur; however, this is the result of slowed peripheral sensory processing and does not appear to compensate for a decreased probability that the processing of multiple unisensory stimuli will overlap in time. This decreased probability of interaction between unisensory stimuli is because older adults’ unisensory processing times are slow and highly variable, and therefore two independent stimuli are less likely to be available for processing and integration at the same time. Yet if the two stimuli are integrated, the older adults are speeded more than younger adults (Diederich et al. 2008). Thus, older adults’ wider time window of integration, a consequence of increased RT and variability, does not provide an explanation as to why integration is stronger in older adults when it does occur.

Another logical hypothesis is that older adults show enhanced multisensory integration because they are unable to use selective attention to filter incoming sensory information; however, age-related deficits in attentional control fail to adequately explain integration enhancements. Hugenschmidt et al. (2009c) have confirmed that older adults can successfully instantiate modality-specific selective attention and have further demonstrated that there is no age-related difference in the magnitude of multisensory integration reduction during selective attention (Hugenschmidt et al. 2009a). Rather than implicating selective attention deficits as the source of underlying increases in multisensory integration, data suggest that older adults differ from younger adults in the amount of baseline sensory processing. Findings from an MRI study of CBF support this notion, showing that auditory cortex CBF associated with task-irrelevant scanner noise is increased in older adults relative to young, both during rest and during a visual task (Hugenschmidt et al. 2009b). Increased activity in brain structures that comprise the default mode network has been implicated in the level of background sensory processing in older adults, and further investigation of the DMN may yield critical information about the nature of age-related changes in baseline sensory processing that can inform our understanding of multisensory integration in aging.

Another potential mechanism for age-related increases in multisensory benefits that cannot be discounted is inverse effectiveness. To our knowledge, there have been no conclusive studies on the relationship between stimulus salience and multisensory gains in older adults. A recent fMRI study in younger adults, performed by Stevenson and colleagues (2009), demonstrated inverse effectiveness in the patterns of cortical activity during audiovisual presentations of speech and object stimuli.
As the intensity of the auditory and visual stimulus components decreased, activation gains in the superior temporal sulcus during multisensory stimuli increased. In other words, highly effective sensory stimuli resulted in smaller activity changes in multisensory cortex compared to degraded stimuli. A similar experimental paradigm could be used to investigate the relationship between stimulus effectiveness and multisensory enhancements at the cortical level in younger and older adults.

Over the past several years, we have learned a great deal about how multisensory processing changes with age; however, the mechanisms underlying age-related enhancements in multisensory integration are not yet clear. Further exploration of the connections between baseline sensory processing, stimulus salience, and multisensory gains should provide insight into the advantages and impairments older adults can experience from changes in multisensory integration.
REFERENCES Alain, C., and D. L. Woods. 1999. Age-related changes in processing auditory stimuli during visual attention: Evidence for deficits in inhibitory control and sensory memory. Psychol Aging 14:507–519.
Alsius, A., J. Navarra, R. Campbell, and S. Soto-Faraco. 2005. Audiovisual integration of speech falters under high attention demands. CurrBiol 15:839–843. Andres, P., F. B. Parmentier, and C. Escera. 2006. The effect of age on involuntary capture of attention by irrelevant sounds: A test of the frontal hypothesis of aging. Neuropsychologia 44:2564–2568. Backman, L., L. Nyberg, U. Lindenberger, S. C. Li, and L. Farde. 2006. The correlative triad among aging, dopamine, and cognition: Current status and future prospects. Neurosci Biobehav Rev 30:791–807. Ballesteros, S., J. M. Reales, J. Mayas, and M. A. Heller. 2008. Selective attention modulates visual and haptic repetition priming: Effects in aging and Alzheimer’s disease. Exp Brain Res 189:473–483. Bertsch, K., D. Hagemann, M. Hermes, C. Walter, R. Khan, and E. Naumann. 2009. Resting cerebral blood flow, attention, and aging. Brain Res 1267:77–88. Birren, J. E., and L. M. Fisher. 1995. Aging and speed of behavior: Possible consequences for psychological functioning. Ann Rev Psychol 46:329–353. Buckner, R. L., J. R. Andrews-Hanna, and D. L. Schacter. 2008. The brain’s default network: Anatomy, function, and relevance to disease. Ann NY Acad Sci 1124:1–38. Cabeza, R., S. M. Daselaar, F. Dolcos, S. E. Prince, M. Budde, and L. Nyberg. 2004. Task-independent and task-specific age effects on brain activity during working memory, visual attention and episodic retrieval. Cereb Cortex 14:364–375. Cerella, J. 1985. Information processing rates in the elderly. Psychol Bull 98:67–83. Cerf-Ducastel, B., and C. Murphy. 2003. FMRI brain activation in response to odors is reduced in primary olfactory areas of elderly subjects. Brain Res 986:39–53. Charman, W. N. 2008. The eye in focus: Accommodation and presbyopia. Clinical and Experimental Optometry 91:207–225. Cienkowski, K. M., and A. E. Carney. 2002. Auditory–visual speech perception and aging. Ear Hear 23:439–449. Colonius, H., and A. Diederich. 2004. Multisensory interaction in saccadic reaction time: A time-window-ofintegration model. J Cogn Neurosci 16:1000–1009. Corbetta, M., F. M. Miezin, S. Dobmeyer, G. L. Shulman, and S. E. Petersen. 1990. Attentional modulation of neural processing of shape, color, and velocity in humans. Science 248:1556–1559. Cornelissen, F. W., and A. C. Kooijman. 2000. Does age change the distribution of visual attention? A comment on McCalley, Bouwhuis, and Juola (1995). J Gerontol B Psychol Sci Soc Sci 55:187–190. Diederich, A., H. Colonius, and A. Schomburg. 2008. Assessing age-related multisensory enhancement with the time-window-of-integration model. Neuropsychologia 46:2556–2562. Dywan, J., S. J. Segalowitz, and L. Webster. 1998. Source monitoring: ERP evidence for greater reactivity to nontarget information in older adults. Brain Cogn 36:390–430. Ghatan, P. H., J. C. Hsieh, K. M. Petersson, S. Stone-Elander, and M. Ingvar. 1998. Coexistence of attentionbased facilitation and inhibition in the human cortex. Neuroimage 7:23–29. Good, C. D., I. S. Johnsrude, J. Ashburner, R. N. Henson, K. J. Friston, and R. S. Frackowiak. 2001. A voxelbased morphometric study of ageing in 465 normal adult human brains. Neuroimage 14:21–36. Grady, C. L. 2008. Cognitive neuroscience of aging. Ann NY Acad Sci 1124:127–144. Grady, C. L., M. V. Springer, D. Hongwanishkul, A. R. McIntosh, and G. Winocur. 2006. Age-related changes in brain activity across the adult lifespan. J Cogn Neurosci 18:227–241. Greicius, M. D., and V. Menon. 2004. 
Default-mode activity during a passive sensory task: Uncoupled from deactivation but impacting activation. J Cogn Neurosci 16:1484–1492. Groth, K. E., and P. A. Allen. 2000. Visual attention and aging. Front Biosci 5:D284. Hairston, W. D., P. J. Laurienti, G. Mishra, J. H. Burdette, and M. T. Wallace. 2003. Multisensory enhancement of localization under conditions of induced myopia. Exp Brain Res 152:404–408. Hale, S., J. Myerson, G. A. Smith, and L. W. Poon. 1988. Age, variability, and speed: Between-subjects diversity. Psychology and Aging 3:407. Healey, M. K., K. L. Campbell, and L. Hasher. 2008. Cognitive aging and increased distractibility: Costs and potential benefits (Chapter 22). Prog Brain Res 169:353–363. Helfer, K. S. 1998. Auditory and auditory–visual recognition of clear and conversational speech by older adults. J Am Acad Audiol 9:234. Hugenschmidt, C. E., J. L. Mozolic, and P. J. Laurienti. 2009a. Suppression of multisensory integration by modality-specific attention in aging. Neuroreport 20:349–353. Hugenschmidt, C. E., J. L. Mozolic, H. Tan, R. A. Kraft, and P. J. Laurienti. 2009b. Age-related increase in cross-sensory noise in resting and steady-state cerebral perfusion. Brain Topography 20:241–251. Hugenschmidt, C. E., A. M. Peiffer, T. P. McCoy, S. Hayasaka, and P. J. Laurienti. 2009c. Preservation of crossmodal selective attention in healthy aging. Exp Brain Res 198:273–285.
Hultsch, D. F., S. W. MacDonald, and R. A. Dixon. 2002. Variability in reaction time performance of younger and older adults. J Gerontol B Psychol Sci Soc Sci 57:101–115. Johnson, J. A., and R. J. Zatorre. 2006. Neural substrates for dividing and focusing attention between simultaneous auditory and visual events. Neuroimage 31:1673–1681. Kalina, R. E. 1997. Seeing into the future. Vision and aging. West J Med 167:253–257. Kastner, S., and L. G. Ungerleider. 2000. Mechanisms of visual attention in the human cortex. Annu Rev Neurosci 23:315–341. Kawashima, R., B. T. O’Sullivan, and P. E. Roland. 1995. Positron-emission tomography studies of crossmodality inhibition in selective attentional tasks: Closing the “mind’s eye.” Proc Natl Acad Sci U S A 92:5969–5972. Kayser, C., C. I. Petkov, M. Augath, and N. K. Logothetis. 2005. Integration of touch and sound in auditory cortex. Neuron 48:373–384. Kovács, T. 2004. Mechanisms of olfactory dysfunction in aging and neurodegenerative disorders. Ageing Res Rev 3:215. Laurienti, P. J., J. H. Burdette, J. A. Maldjian, and M. T. Wallace. 2006. Enhanced multisensory integration in older adults. Neurobiol Aging 27:1155–1163. Laurienti, P. J., R. A. Kraft, J. A. Maldjian, J. H. Burdette, and M. T. Wallace. 2004. Semantic congruence is a critical factor in multisensory behavioral performance. Exp Brain Res 158:405–414. Li, C. S., P. Yan, K. L. Bergquist, and R. Sinha. 2007. Greater activation of the “default” brain regions predicts stop signal errors. Neuroimage 38:640–648. Liu, X., and D. Yan. 2007. Ageing and hearing loss. J Pathol 211:188–197. Lustig, C., A. Z. Snyder, M. Bhakta et al. 2003. Functional deactivations: Change with age and dementia of the Alzheimer type. Proc Natl Acad Sci U S A 100:14504. Madden, D. J., W. L. Whiting, R. Cabeza, and S. A. Huettel. 2004. Age-related preservation of top-down attentional guidance during visual search. PsycholAging 19:304. Martin, A. J., K. J. Friston, J. G. Colebatch, and R. S. Frackowiak. 1991. Decreases in regional cerebral blood flow with normal aging. J Cereb Blood Flow Metab 11:684–689. McKiernan, K. A., B. R. D’Angelo, J. N. Kaufman, and J. R. Binder. 2006. Interrupting the “stream of consciousness”: An fMRI investigation. Neuroimage 29:1185–1191. Meredith, M. A., and B. E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus. Science 221:389–391. Meredith, M. A., and B. E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. J Neurophysiol 56:640–662. Milham, M. P., K. I. Erickson, M. T. Banich et al. 2002. Attentional control in the aging brain: Insights from an fMRI study of the stroop task. Brain Cogn 49:277. Miller, J. 1982. Divided attention: Evidence for coactivation with redundant signals. Cogn Psychol 14:247–279. Miller, J. 1986. Time course of coactivation in bimodal divided attention. Percept Psychophys 40:331–343. Morse, C. K. 1993. Does variability increase with age? An archival study of cognitive measures. Psychol Aging 8:156–164. Mozolic, J. L., C. E. Hugenschmidt, A. M. Peiffer, and P. J. Laurienti. 2008a. Modality-specific selective attention attenuates multisensory integration. Exp Brain Res 184:39–52. Mozolic, J. L., D. Joyner, C. E. Hugenschmidt et al. 2008b. Cross-modal deactivations during modality-specific selective attention. BMC Neurol 8:35. Muir, J. L. 1997. Acetylcholine, aging, and Alzheimer’s disease. Pharmacol Biochem Behav 56:687–696. Ostroff, J. M., K. L. 
McDonald, B. A. Schneider, and C. Alain. 2003. Aging and the processing of sound duration in human auditory cortex. Hear Res 181:1–7. Peiffer, A. M., J. L. Mozolic, C. E. Hugenschmidt, and P. J. Laurienti. 2007. Age-related multisensory enhancement in a simple audiovisual detection task. Neuroreport 18:1077–1081. Persson, J., C. Lustig, J. K. Nelson, and P. A. Reuter-Lorenz, 2007. Age differences in deactivation: A link to cognitive control? J Cogn Neurosci 19:1021–1032. Poliakoff, E., S. Ashworth, C. Lowe, and C. Spence. 2006. Vision and touch in ageing: Crossmodal selective attention and visuotactile spatial interactions. Neuropsychologia 44:507–517. Posner, M. I., and J. Driver. 1992. The neurobiology of selective attention. Curr Opin Neurobiol 2:165–169. Quiton, R. L., S. R. Roys, J. Zhuo, M. L. Keaser, R. P. Gullapalli, and J. D. Greenspan. 2007. Age-related changes in nociceptive processing in the human brain. Ann NY Acad Sci 1097:175–178. Raichle, M. E., A. M. MacLeod, A. Z. Snyder, W. J. Powers, D. A. Gusnard, and G. L. Shulman. 2001. A default mode of brain function. Proc Natl Acad Sci U S A 98:676–682.
Rapp, P. R., and W. C. Heindel. 1994. Memory systems in normal and pathological aging. Curr Opin Neurol 7:294–298. Rhodes, M. G. 2004. Age-related differences in performance on the Wisconsin card sorting test: A meta-analytic review. Psychol Aging 19:482–494. Rowe, G., S. Valderrama, L. Hasher, and A. Lenartowicz. 2006. Attentional disregulation: A benefit for implicit memory. Psychol Aging 21:826–830. Salat, D. H., D. N. Greve, J. L. Pacheco et al. 2009. Regional white matter volume differences in nondemented aging and Alzheimer’s disease. Neuroimage 44:1247–1258. Salthouse, T. A. 1988. The complexity of age × complexity functions: Comment on Charness and Campbell (1988). J Exp Psychol Gen 117:425. Salthouse, T. A. 2000. Aging and measures of processing speed. Biol Psychol 54:35–54. Schmolesky, M. T., Y. Wang, M. Pu, and A. G. Leventhal. 2000. Degradation of stimulus selectivity of visual cortical cells in senescent rhesus monkeys. Nat Neurosci 3:384–390. Shaffer, S. W., and A. L. Harrison, 2007. Aging of the somatosensory system: A translational perspective. Phys Ther 87:193–207. Sommers, M. S., N. Tye-Murray, and B. Spehar. 2005. Auditory–visual speech perception and auditory–visual enhancement in normal-hearing younger and older adults. Ear Hear 26:263–275. Spence, C., and J. Driver. 1997. On measuring selective attention to an expected sensory modality. Percept Psychophys 59:389–403. Spence, C., M. E. Nicholls, and J. Driver. 2001. The cost of expecting events in the wrong sensory modality. Percept Psychophys 63:330–336. Stevens, W. D., L. Hasher, K. S. Chiew, and C. L. Grady. 2008. A neural mechanism underlying memory failure in older adults. J Neurosci 28:12820–12824. Stevenson, R. A., and T. W. James. 2009. Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage 44:1210. Stine, E. A., A. Wingfield, and S. D. Myers. 1990. Age differences in processing information from television news: The effects of bisensory augmentation. J Gerontol 45:1–8. Strupp, M., V. Arbusow, C. Borges Pereira, M., Dieterich, and T. Brandt. 1999. Subjective straight-ahead during neck muscle vibration: Effects of ageing. Neuroreport 10:3191–3194. Talsma, D., and M. G. Woldorff. 2005. Selective attention and multisensory integration: Multiple phases of effects on the evoked brain activity. J Cogn Neurosci 17:1098–1114. Talsma, D., T. J. Doty, and M. Woldorff. 2007. Selective attention and audiovisual integration: Is attending to both modalities a prerequisite for early integration? Cereb Cortex 17:679–690. Townsend, J., M. Adamo, and F. Haist. 2006. Changing channels: An fMRI study of aging and cross-modal attention shifts. Neuroimage. Tye-Murray, N., M.S. Sommers, and B. Spehar. 2007. Audiovisual integration and lipreading abilities of older adults with normal and impaired hearing. Ear Hear 28:656–668. Verhaeghen, P., and L. De Meersman. 1998. Aging and the Stroop effect: A meta-analysis. Psychol Aging 13:120–126. Verhaeghen, P., and J. Cerella. 2002. Aging, executive control, and attention: A review of meta-analyses. Neurosci Biobehav Rev 26:849–857. Wallace, M. T., M. A. Meredith, and B. E. Stein. 1992. Integration of multiple sensory modalities in cat cortex. Exp Brain Res 91:484–488. Weissman, D. H., K. C. Roberts, K. M. Visscher, and M. G. Woldorff. 2006. The neural bases of momentary lapses in attention. Nat Neurosci 9:971–978. West, R., and C. Alain. 2000. 
Age-related decline in inhibitory control contributes to the increased Stroop effect observed in older adults. Psychophysiology 37:179. Yang, L., and L. Hasher. 2007. The enhanced effects of pictorial distraction in older adults. J Gerontol B Psychol Sci Soc Sci 62:230–233. Yordanova, J., V. Kolev, J. Hohnsbein, and M. Falkenstein. 2004. Sensorimotor slowing with ageing is mediated by a functional dysregulation of motor-generation processes: Evidence from high-resolution eventrelated potentials. Brain 127:351–362.
Section V Clinical Manifestations
21 Neurophysiological Mechanisms Underlying Plastic Changes and Rehabilitation following Sensory Loss in Blindness and Deafness
Ella Striem-Amit, Andreja Bubic, and Amir Amedi
CONTENTS
21.1 Introduction 395
21.2 Rehabilitation following Sensory Loss 397
21.2.1 Sensory Substitution Devices 397
21.2.2 Sensory Restoration Approaches 400
21.2.3 Functional Visual Rehabilitation 402
21.3 Neural and Cognitive Consequences of Sensory Loss 403
21.3.1 Evidence for Robust Plasticity Promoted by Sensory Loss 404
21.3.2 Principles Guiding Reorganization following Sensory Loss 406
21.3.3 Plasticity following Sensory Loss across the Lifespan 407
21.3.4 Neurophysiologic Mechanisms Underlying Plastic Changes in the Blind 409
21.4 Rehabilitation-Induced Plasticity 412
21.4.1 Plasticity after SSD Use and Its Theoretical Implications 412
21.5 Concluding Remarks and Future Directions 414
References 415
21.1 INTRODUCTION

We live in a society that is based on vision. Visual information is used for orienting in our environment, identifying objects in our surroundings, alerting us to important events that require our attention, engaging in social interactions, and many other functions necessary for efficient everyday life. Similarly, audition is used for communication and for guiding our attention to potentially important or even dangerous events (e.g., the sound of a nearing car). Thus, the loss of either of these modalities decreases the quality of life and represents a severe challenge to efficient functioning for tens of millions of individuals worldwide (World Health Organization, Fact Sheet no. 282, May 2009). Furthermore, it has a significant economic impact on society. It is therefore not surprising that numerous approaches and potential solutions designed to overcome these difficulties have been put forward to help the sensory-impaired. Although such compensation devices, for example, highly sensitive hearing aids, volume-enhancing devices for different technologies, and medical–technological solutions such as cochlear implants, are much more successful for the auditorily impaired, compensation and technological aids for the visually impaired,
the focus of this chapter, are currently much less effective. At this point, the most commonly used rehabilitation techniques for blindness are sensory aids such as the Braille reading system, mobility aids such as canes, or more contemporary devices such as obstacle detectors, laser canes, or ultrasonic echolocating devices. All of these devices derive from the premise that the blind are deprived of numerous important types of information typically acquired through vision, and they attempt to supply such information through other sensory systems. Typically, these attempts utilize the normal perceptual processing of the system they exploit for communicating the relevant information. In contrast to these approaches, the new generation of sensory aids goes one step further, as it aims to deliver pure visual information to the brains of the blind, either by surgically or medically restoring the missing functionality of the eyes and brain areas typically exploited for visual processing (as is already done in audition to some extent, mainly for successful perception of single-speaker auditory communication, using cochlear implants; Fallon et al. 2008; Spelman 2006; Geers 2006) or by "teaching" these regions to take over visual functions after introducing them to visual information transmitted through nonvisual modalities.

The first group of such techniques, neuroprosthetic medical solutions, is invasive, requiring surgical intervention, and is extremely expensive at the moment. These approaches have shown some promising results, but unfortunately only to a limited extent and only in very restricted populations of the blind. However, once the technological (and neuroscientific, i.e., the ability of the brain to make sense of the restored input; see below) obstacles are resolved, they may hold great promise for restoring natural vision to many blind individuals, similar to the enormous progress in the treatment of deafness that has been made since the development of cochlear implants. Similarly, novel medical approaches for replacing the damaged sensory receptor cells via stem cell transplantation (discussed briefly below) may be very promising in the more distant future, but are currently only at relatively preliminary research stages.

The second group of rehabilitation approaches includes sensory substitution devices (SSDs), which represent noninvasive, cheap, and relatively accessible techniques. These devices are specifically designed to deliver visual information to the blind using their remaining, fully functioning sensory modalities, in the hope that the brains of such individuals will learn to exploit this information, similar to the way the sighted use equivalent information transmitted through the visual pathway. Although this hope may appear counterintuitive or even unrealistic, the most recent SSDs are showing remarkable behavioral outcomes. Such efficiency, combined with their low cost and broad applicability across different types of, and ages at, sensory loss, makes them highly attractive sensory aids. This is especially important in blindness, given that 87% of the blind are located in developing countries and therefore need cheap and widely applicable solutions (World Health Organization, Fact Sheet no. 282, May 2009).
In order to capture the "magic" of these rehabilitation approaches and illustrate how surprisingly efficient they might be if proper training is applied, we will begin this chapter by presenting some of these exciting new solutions and briefly discussing the rehabilitation outcomes currently associated with them. To better understand the mechanisms mediating such outcomes and appreciate the remaining challenges that need to be overcome, in the second part of the chapter we provide a more theoretical illustration of the neuroplastic changes associated with the use of these devices. In particular, we show that these changes are neither "magic" nor in any way restricted to the use of the presented rehabilitation techniques. On the contrary, these techniques are designed to exploit and channel the brain's natural potential for change. This potential is present in all individuals, but may become somewhat more accentuated in the brains of the sensory-impaired, as the lack of one sensory modality leaves vast cortical regions free of their typical input and triggers a reorganization of such cortices and their integration into other brain networks. This reorganization is constrained and channeled by the individual's own activity, the information available from the environment, and the intrinsic properties of the neural system promoting or limiting such changes during different periods in life. Importantly, such restructuring is crucial for enabling the cognitive changes that also occur after sensory loss, allowing the sensory-impaired individuals to efficiently function in their environment. Specifically, successfully dealing with sensory impairment often results in collateral benefits,
which include better differentiation and higher efficiency of nonvisual sensory or other cognitive functions. Many of the neural and cognitive changes triggered by sensory loss will be reviewed in the second part of the chapter, illustrating how they rely on the same mechanisms as those underlying the successful outcomes of novel rehabilitation techniques, which will now be presented.
21.2 REHABILITATION FOLLOWING SENSORY LOSS

Sensory loss, and blindness in particular, decreases quality of life for millions of individuals (e.g., 314 million individuals are visually impaired worldwide; WHO Report 2009, Fact Sheet no. 282). Blindness hinders independent navigation in space, reading, recognizing people, and even communicating with them, as it restricts nonverbal communication via hand gesturing or facial expressions such as gaze direction or smiling. Numerous approaches and potential solutions aimed at overcoming these difficulties have been put forward (with various levels of success), offering hope and help to those suffering from sensory impairment. In the blind, these include reading and mobility aids, more advanced SSDs, and invasive sensory restoration and neuroprosthetic approaches. In this part of the chapter we present some of these techniques. The main focus is on SSDs, which are gaining popularity thanks to their noninvasiveness, low cost, and high potential for providing systematic rehabilitation solutions for all types of blindness. In addition, we briefly discuss the potential for medically enabled sensory restoration, which, although holding great promise, still needs to overcome numerous technical and other challenges before becoming truly useful for most of the blind.
21.2.1 Sensory Substitution Devices

Sensory substitution refers to the transformation of the characteristics of one sensory modality into stimuli of another modality. For example, it is possible to present visual information by touch or audition, auditory or vestibular information by touch, and so on. In the case of blindness, SSDs represent a noninvasive rehabilitation approach in which visual information is captured by an external device such as a video camera and communicated to the blind via a human–machine interface in the form of auditory or tactile input. Louis Braille (1809–1852), who developed Braille writing, pioneered the work that paved the way to modern SSDs by substituting a raised-dot code for visually read letters, as an adaptation of Charles Barbier's night writing code. Charles Barbier originally developed a tactile 12-raised-dot writing code for the French army, which was deemed too difficult to decipher and was abandoned by the army. After meeting Barbier and testing his invention, Louis Braille simplified the code to a six-raised-dot code, which made the symbols recognizable without moving the finger (the original 12-dot code required slow, effortful motion to recognize each letter), thus inventing the tactile Braille code widely used today (Sathian 2000; Sadato 2005). However, Braille can only work for material transformed offline from printed visual letters to Braille dots, and cannot be used for online reading of regular letters. In recent years, other reading substitutions have been developed for online reading, such as the Optacon (a print-to-tactual-image device devised for reading embossed letters; Goldish and Taylor 1974; Linvill and Bliss 1966) and various dedicated text-to-speech engines, from the Kurzweil reading machine (Kleiner and Kurzweil 1977) to current talking software such as JAWS (Freedom Scientific, Inc., St. Petersburg, Florida). In addition to these reading aids, a great deal of effort has been invested in developing devices aimed at improving the mobility of the blind. The long cane used to mechanically probe for obstacles represents the simplest, most commonly used device. The fact that both the Braille system and the cane were quickly adopted by blind users suggests that at times a simple, low-technology solution, rather than a demanding one, may be the most widely used. However, in recent years more advanced counterparts of the cane have become available, such as electronic travel aids designed to be used along with the long cane in order to extend the distance of environmental preview and thus increase the possible speed and efficiency of travel. The Sonic Pathfinder (Heyes 1984) and the Sonicguide
(Kay and Kay 1983) typically scan the environment acoustically (ultrasonically) or optically (with laser light), and transmit spatial information on obstacles and objects in the surroundings via vibrotactile or auditory signals. In contrast to devices that are typically designed for a limited purpose and are successful in substituting for only certain functional aspects of vision, more sophisticated techniques that replace vision through tactile or auditory information have been developed over the past few decades (see Figure 21.1a). The first targeted modality for substituting vision was touch, because of the simplicity and ease of transforming visual into tactile signals, as both are characterized by two-dimensional (2-D) spatial representations (the retina in vision and the skin surface in touch). Pioneering work in this field was done in the 1970s by Paul Bach-y-Rita, who devised a tactile display that mapped images from a video camera onto a vibrotactile device worn on the subject's back. This device (Bach-y-Rita 2004; Bach-y-Rita et al. 1969; Bach-y-Rita and Kercel 2003), dubbed the Tactile Vision Substitution System, provided tactile transformation of black-and-white images at a resolution of 20 × 20 pixels and enabled the blind to perform sufficiently well in some visual tasks. However, it was extremely large and immobile, which motivated the development of smaller, mobile tactile devices placed on the tongue and forehead (for a review, see Bach-y-Rita 2004) that are also characterized by better spatial somatosensory resolution. One of these, the Tongue Display Unit (TDU) (Bach-y-Rita et al. 1968, 1998), an electrotactile device composed of a 12 × 12 matrix of stimulators (measuring approximately 3 cm²) placed on the subject's tongue, provides blind individuals with an initial "visual" acuity (tested by the Snellen E chart) comparable to 20/860 (Sampaio et al. 2001; the numerator refers to the distance in feet from which a person can reliably distinguish a pair of objects, whereas the denominator is the distance from which a person with standard visual acuity would be able to distinguish them; in North America and most of Europe, legal blindness is defined as visual acuity of 20/200 or poorer), which might improve after training. Other studies investigating this device suggest that at least a subgroup of early-onset blind individuals may particularly benefit from its use (Chebat et al. 2007). Audition was the second candidate to substitute for vision. The development of auditory-based devices was triggered by certain limitations of tactile SSDs: their price and the fact that they are inherently limited by the spatial resolution of touch and by the relatively lower information content imposed by a cap on the number of electrodes. The first auditory SSD was The vOICe system (Meijer 1992), which currently uses a default resolution of 176 × 64 sampling points. This mobile and inexpensive device uses a video camera, which provides the visual input, a small computer running the conversion program, and stereo headphones that provide the resulting sound patterns to the user. Given that 87% of the world's visually impaired live in developing countries (WHO Report 2009, Fact Sheet no. 282), the importance of providing solutions that are not just high-resolution, but also cheap and accessible, cannot be overestimated. To some extent, visual-to-auditory SSDs fulfill all of these criteria.
However, these devices still pose great challenges both to the developers and the brains of blind individuals using them, as they rely on conversion algorithms that are much less intuitive than those employed by visual-to-tactile SSDs. For example, in the visual-to-auditory The vOICe SSD (Meijer 1992), the conversion program transforms visual into auditory information (‘soundscapes’) based on three simple rules: the vertical axis (i.e., elevation of the object) is represented by frequency, the horizontal axis by time and stereo panning, and the brightness of the image is encoded by loudness. Although these conversion rules appear relatively simple, explicit and quite extensive training is required to learn how to interpret even simple shapes. Similar but not identical transformations are implemented in two more recently developed auditory SSDs: the Prosthesis Substituting Vision with Audition (PSVA; Capelle et al. 1998) and SmartSight (Cronly-Dillon et al. 1999, 2000). PSVA uses different tones to provide horizontal location directly, whereas SmartSight presents the vertical location information in terms of musical notes. PSVA can break down the “visual sound” into components of vertically and horizontally oriented edges. Additionally, PSVA applies a magnification to the center of the image to simulate the better resolution (magnification factor) of the human fovea.
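To make these three mapping rules concrete, the sketch below implements a simplified image-to-soundscape conversion in the spirit of The vOICe: each column of a grayscale image is rendered in turn from left to right, the pixel row determines the frequency of a sinusoid, pixel brightness scales its amplitude, and stereo panning follows the horizontal scan. This is only an illustrative approximation of the published principle, not the actual vOICe implementation; the function name and the frequency range, scan duration, and sample rate chosen here are assumptions made for the example.

```python
import numpy as np

def image_to_soundscape(image, sample_rate=22050, scan_duration=1.0,
                        f_min=500.0, f_max=5000.0):
    """Render a grayscale image (2-D array, rows x cols, values in [0, 1]) as a
    stereo 'soundscape': row -> sinusoid frequency, column -> time and stereo
    panning, brightness -> loudness. Parameter values are illustrative only."""
    n_rows, n_cols = image.shape
    samples_per_col = int(sample_rate * scan_duration / n_cols)
    t = np.arange(samples_per_col) / sample_rate
    # Higher rows in the image (smaller row index) are mapped to higher frequencies.
    freqs = np.linspace(f_max, f_min, n_rows)
    left, right = [], []
    for col in range(n_cols):
        # One sinusoid per row, weighted by pixel brightness (the loudness rule).
        brightness = image[:, col][:, None]                  # shape (n_rows, 1)
        tones = np.sin(2 * np.pi * freqs[:, None] * t)       # shape (n_rows, samples)
        chunk = (brightness * tones).sum(axis=0)
        # Pan smoothly from the left channel (first column) to the right (last).
        pan = col / max(n_cols - 1, 1)
        left.append((1.0 - pan) * chunk)
        right.append(pan * chunk)
    stereo = np.stack([np.concatenate(left), np.concatenate(right)], axis=1)
    peak = np.abs(stereo).max()
    return stereo / peak if peak > 0 else stereo   # normalize to avoid clipping

# Example: a random 64 x 176 "image" (176 x 64 sampling points, as in the text).
waveform = image_to_soundscape(np.random.rand(64, 176))
```

Listening to the output of such a sketch for simple shapes (e.g., a bright diagonal line produces a rising or falling frequency sweep) gives some intuition for why interpreting soundscapes of real scenes nevertheless requires the extensive training described above.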
FIGURE 21.1 Sensory substitution devices: general concept of sensory substitution (SSD) and use of SSDs in studying brain plasticity, perception, and multisensory integration. (a) SSDs typically include a visual capturing device (e.g., camera glasses), a computational device transforming the visual input into either a tactile or an auditory display using a simple, known transformation algorithm, and an output device transmitting this information to the user. Right: example of an auditory SSD (e.g., The vOICe; Meijer 1992) transmitting sensory-transformed information using headphones. Left: example of a tactile device that can transmit tactile information via an electrode array targeting the tongue (e.g., TDU; Bach-y-Rita et al. 1998) or another skin surface, in this case placed on the neck. (With kind permission from Springer Science+Business Media: Multisensory Object Perception in the Primate Brain, part 4, 2010, 351–380, Bubic, A. et al., figure number 18.2.) (b) A conjunction analysis for shape perception across modalities and experimental conditions in a group of seven expert users of The vOICe SSD (five sighted, one late blind, and one congenitally blind), testing for common areas of activation between object recognition using soundscapes (i.e., using The vOICe SSD to extract shape information) and by touch, but not by typical sounds made by objects (which do not convey shape information) or by corresponding sensory controls. The contrast (random-effect GLM model, corrected for multiple comparisons) showed bilateral LO activation with weaker responses in the right hemisphere, signifying that the lateral occipital complex (LOC) region is a multimodal operator for shape. (Modified and adapted from Amedi, A. et al., Nat Neurosci, 10, 687–689, 2007.) (c) Object-related regions in visual and haptic modalities shown on an inflated right hemisphere (top: lateral view; bottom: ventral view). Visual object selectivity is relative to scrambled visual images; haptic object selectivity is relative to haptic textures. Visuo-haptic object selectivity in LOC is found within the lateral occipito-temporal sulcus (delineating LOtv), similar to the location of the multimodal object-related area shown in panel b. (Modified and adapted from Amedi, A. et al., Nat Neurosci, 4, 324–330, 2001; and Lacey, S. et al., Brain Topogr, 21, 269–274, 2009.)
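As a counterpart to the auditory sketch above, the following is a minimal illustration of the visual-to-tactile transformation used by devices such as the TDU described earlier: a camera frame is downsampled to the resolution of the electrode array and brightness is quantized into a small number of stimulation intensities. The 12 × 12 array size matches the TDU description above, but the function name, the pooling scheme, and the number of intensity levels are illustrative assumptions rather than the device's actual parameters.

```python
import numpy as np

def frame_to_tactile(frame, array_shape=(12, 12), levels=8):
    """Downsample a grayscale camera frame (2-D array, values in [0, 1]) to the
    resolution of a tactile electrode array and quantize brightness into discrete
    stimulation levels (0 = no stimulation). Assumes the frame is at least as
    large as the array; parameters are illustrative only."""
    rows, cols = array_shape
    h, w = frame.shape
    row_edges = np.linspace(0, h, rows + 1, dtype=int)
    col_edges = np.linspace(0, w, cols + 1, dtype=int)
    tactile = np.empty(array_shape)
    for i in range(rows):
        for j in range(cols):
            # Average-pool each block of the frame onto one electrode.
            block = frame[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            tactile[i, j] = block.mean()
    return np.round(tactile * (levels - 1)).astype(int)

# Example: a simulated 240 x 320 camera frame mapped onto a 12 x 12 array.
stimulation_pattern = frame_to_tactile(np.random.rand(240, 320))
```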
Although extremely different, both auditory and tactile SSDs can potentially be very useful for the blind. Recent tests show that blind and/or blindfolded sighted individuals can, especially after training or prolonged use of the device (Poirier et al. 2006b), learn to interpret the transmitted information and use it in simple visual discrimination and recognition (Arno et al. 1999, 2001; Sampaio et al. 2001; Poirier et al. 2006b), as well as in more complex tasks that require acquiring knowledge of the spatial locations of objects (Auvray et al. 2007; Proulx et al. 2008) or constructing mental images of more complex environments (Cronly-Dillon et al. 2000). More anecdotal reports, not yet explored in formal research, suggest that following extended use of such devices, behavioral abilities using SSDs may be even more promising, as they can be used to identify facial expressions and read simple words (Amedi, Striem-Amit, and Reich, unpublished observation; see, e.g., http://brain.huji.ac.il/press.asp) as well as to orient and navigate in everyday life (see, e.g., the reports of a late-onset blind individual on her experiences with The vOICe SSD at http://www.seeingwithsound.com/users.htm). Although the sensory-transformed information may partly occupy an available sensory channel or at least add to its attentional load (e.g., providing vOICe information in addition to naturally occurring environmental sounds), after training such percepts should not significantly interfere with normal sensory perception. However, it needs to be emphasized that training is crucial for obtaining optimal results in this regard, as initial usage of SSDs may be confusing and overwhelming for the sensory impaired. Because of the multisensory nature of perception, the human brain can be expected to successfully process these percepts in parallel, much as it processes several types of visual parameters in parallel, allocating attention to the most relevant visual feature at any given time, or perceives a conversation above other environmental noises. Naturally, however, if an individual uses the SSD at very high volume or if the environmental sounds are near perceptual thresholds, there might be a significant cost to using the SSD in the intact sensory channel. Future studies on dividing attention between SSD input and natural sensory input are needed to fully assess the possible interference in such cases. Overall, although there is still a great deal of work to be done in this area, initial experiences with SSDs show more than promising results. These devices truly offer new hope for the sensory-impaired in a somewhat nonintuitive, but "brain-friendly" manner, as they use normal neural resources and functioning modalities for transmitting previously unavailable information. Furthermore, in order to fully appreciate the value of sensory substitution, it might be useful to imagine how exciting it would be to have infrared vision or to hear ultrasound frequencies. Interestingly, future, second-generation SSDs might just make these "superhuman" abilities possible: just as visual information can be transmitted and used by the blind through their functioning auditory or tactile modality, so could infrared or ultrasound frequencies be perceived by anyone using functioning vision or audition.
For the blind, the efficient use of vision or visual information transferred via such SSDs represents exactly this type of ability or an even greater accomplishment, as they need to function in an environment optimally designed for the majority of the population, that is, the sighted.
21.2.2 Sensory Restoration Approaches

Restoration of sensory input to the visually impaired represents an alternative to SSDs. Although these approaches are perhaps more attractive than SSDs, as they provide a sense of "real" vision (as compared to providing only visual information converted into the other senses), and are at the cutting edge of medical, technological, and scientific advances, they are unfortunately only at the relatively preliminary stages of research and development. Conventional sight restoration includes surgical removal of cataracts and treatment or surgical solutions to vision loss caused by glaucoma. Although highly practical and yielding remarkable results, especially after extensive training following the surgery (Ostrovsky et al. 2006, 2009; Sinha 2003), these solutions were originally only applicable to specific causes and stages of vision loss. Sight restoration in blindness due to other
etiologies, such as congenital or late-onset degeneration of the retina or optic tract [e.g., age-related macular degeneration (ARMD)], is only now being addressed. Sight restoration approaches can be grossly divided into two major types: biological methods, such as cell transplantation therapy, which aim to replace the actual deteriorating retinal cells with healthy ones, and electrical sensory restoration solutions, which try to create brain–machine interfaces by, for example, replacing a damaged retina (using visual prostheses, similar to the cochlear prostheses used in cases of deafness). Although we do not go into detail on the biological methods and research directions, we recommend several good review articles on the matter (to name some from recent years, Cronin et al. 2007; MacLaren and Pearson 2007; Lamba et al. 2008, 2009; Bull and Martin 2009; Locker et al. 2009; West et al. 2009) for a more thorough examination of these approaches. In short, cell replacement strategies for retinal repair involve implanting external stem cells (cell transplantation therapy), from various origins (e.g., neural or retinal precursor cells) and differentiation/developmental stages (from multipotent progenitor cells to postmitotic photoreceptor progenitor cells), to compensate for retinal degeneration. These approaches receive a great deal of attention, as they are most likely to generate high-resolution natural vision in the future. However, they may still require many years of development until they are fully applicable, and even then, they might suffer the same disadvantages as the visual prosthesis methods detailed below. With regard to electrical sensory restoration efforts, the development of visual prostheses was motivated by early studies in which visual percepts (phosphenes, i.e., dots or patterns of light) were successfully generated by electrical stimulation of the visual cortex (Penfield and Rasmussen 1950). The idea of channeling these findings into clinical applications was suggested years ago by a handful of researchers (Haddock and Berlin 1950; Newman et al. 1987), but it is only now being pursued, as shown by the extensive development of visual prostheses. Today, different approaches in which visual information is recorded by external (or implanted) devices and transmitted to the sensory tract or to secondary processing cells in the retina, ganglion cells, thalamus, or visual cortex are being studied or tested in clinical trials (for several recent reviews of current technologies and the remaining challenges, see Dagnelie 2008; Dowling 2008; Merabet et al. 2005; Rizzo et al. 2007; Weiland et al. 2005). There are four main types of approaches to electrical sensory restoration, targeting the retina, the optic nerve, the lateral geniculate nucleus (LGN), and the visual cortex. The retinal approach is designed to stimulate secondary neurons in the inner retina by an electrode array placed on the inner retinal surface or inserted under the retina (for a description of the different groups and devices developed in recent years, see Dowling 2008). Such an approach is mainly useful in cases of retinitis pigmentosa and ARMD, which cause selective degeneration of the photoreceptor layer of the outer retina. In this case, the information sent to the visual cortex can still be transmitted over minimally damaged retinal ganglion cells. Optic nerve approaches (Brelen et al. 2005; Veraart et al. 2003; Delbeke et al.
2002) use two forms of stimulation: the simultaneous activation of many optic nerve fibers through cuff electrodes, and more focused stimulation of small groups of fibers with penetrating microelectrodes. Future thalamic prostheses (Pezaris and Reid 2005, 2009) will stimulate a later station in the visual pathways, the LGN, but they are currently only at the stage of preliminary methodological research in nonhuman primates. The cortical approach (Troyk et al. 2003; Fernandez et al. 2002; Schmidt et al. 1996) places electrodes over the central visual field projection in primary visual cortex. Typically, this is accomplished using surface (or penetrating) electrodes that may provide relatively good stability of tissue stimulation, but are difficult to position in the optimal location based on the known retinotopic mapping of V1. However, this approach can be applied in most cases of blindness (apart from cortical blindness, a relatively rare cause), including etiologies such as glaucoma and diabetic retinopathy, which affect the retina and would not benefit from a retinal prosthesis. Devices based on these approaches have so far shown some promising results, as experienced blind users can, to some extent, utilize the visual phosphenes generated by some of these devices in order to experience meaningful visual percepts, detect motion (Weiland
and Humayun 2008), or identify very simple patterns, shapes, and even letters (Brelen et al. 2005; Dobelle 2000; Weiland and Humayun 2008). However, there are still several major issues currently preventing these electrical and biological approaches from becoming true clinical solutions. First of all, their invasive nature makes them prone to risks related to surgical procedures in the brain, such as inflammation, hemorrhage, increased patient mortality, and focal seizures induced by direct cortical stimulation in the case of visual prostheses, and to risks of immune rejection of the implanted cells in the case of cell transplantation solutions. Moreover, retinal prostheses (and retinal molecular approaches such as cell transplantation therapy, detailed above), which currently appear more promising as future solutions for blindness, are not applicable to all populations of the blind, as they require the existence of residual functional retinal ganglion cells. Additionally, these techniques are expensive, making them unavailable to the majority of the blind, who reside in developing countries. In addition to these drawbacks, visual prostheses have severe technical limitations, including relatively low resolution, a narrow field of view, and the need for complicated image processing algorithms to compensate for the visual processing taking place in the retina itself. Functionally, these devices typically do not take advantage of eye movements (an exception is the system developed by Palanker et al. 2005) and require large and slow head movements to scan entire visual patterns (Brelen et al. 2005; Veraart et al. 2003; Chen et al. 2007). Therefore, visual prostheses (which are not yet available except in preliminary clinical trials) do not yet provide sight that resembles natural vision, and a key milestone in this field, namely generating truly useful and functional vision at affordable cost, has yet to be reached. Finally, just like cochlear implants (or even more so), visual prostheses require extensive training in order to achieve reasonable performance even for very simple stimuli. This will be discussed in the next section. If, however, visual prosthesis research, and even more so the biological methods replacing the actual retinal cells, can overcome these obstacles, these approaches could provide a real visual experience and not just the "visual" information or orientation provided by SSDs.
21.2.3 Functional Visual Rehabilitation

Although further developing and improving rehabilitation techniques is still an enormous technological challenge, sensory restoration efforts may require more than simply transmitting visual information (either via other modalities as in SSDs or by providing vision through the natural visual system) to the brain. In a sense, when first introduced to the brain of a congenitally blind individual, visual information is meaningless, as that individual lacks any previous experience against which such information could be interpreted. Furthermore, the brain of such individuals may lack a functioning visual system needed for interpreting the newly introduced information and giving it functional meaning. Even in the case of the noncongenitally blind, who have had some previous visual experience, one cannot expect that reintroducing visual information to their brains would automatically result in fully sophisticated visual perception, since their "visual" brain regions may now be integrated into other, nonvisual brain networks. Although this is somewhat counterintuitive, evidence for this claim can be found in the relatively successful rehabilitation of deaf and hearing-impaired individuals using cochlear implants (Spelman 2006). Cochlear implants work because patients learn to associate sounds with their sources and meanings by generating new associations, a process that requires explicit teaching. Moreover, such rehabilitation is accompanied and enabled by corresponding plasticity in the auditory cortex (Kral et al. 2006), which is now required to respond to the newly delivered input. Similarly, two case studies of surgical sight restoration after long-term visual deprivation (Gregory and Wallace 1963; Fine et al. 2003) suggest that pure restoration of the lost sensory input may also not suffice in the case of vision. The patients in both of these studies showed profound difficulty in recognizing objects, even after a long period of sight and visual training. This indicates that allowing visual information to enter the brain via a functional retina does not guarantee or enable full or natural visual perception. This can only be accomplished if the
surgical procedure is coupled with specific additional rehabilitation strategies that modulate brain processing, enabling it to extract relevant and functionally meaningful information from neuroprosthetic inputs that should gradually lead to restoration or development of visual functions. Thus, in contrast to the encouraging behavioral outcomes of some cochlear implant patients, it is illusory to expect that such successful sensory restoration can easily generalize to different subpopulations of sensory impaired, such as the visually impaired. More research and development of behavioral rehabilitation may be needed to achieve functional sensory ability in those who once suffered from sensory loss. To fulfill this goal, we will have to overcome more than just surgical or technical challenges that will enable safer medical procedures or more advanced sensory substitution algorithms. Although necessary, such advancements will have to be complemented by knowledge pertaining to brain mechanisms and cognitive functions we want to change or develop using the available rehabilitation techniques. Thus, achieving full sensory restoration will only be possible if we take into account the specificities of cognitive and neural functioning of the sensory impaired, a topic that will be presented in the next part of the chapter.
21.3 NEURAL AND COGNITIVE CONSEQUENCES OF SENSORY LOSS

When attempting to understand the minds and brains of individuals who have lost one or more sensory modalities, it is worth starting by considering the factors that shape the minds and brains of those whose development and capabilities are considered normal. Regardless of our individual abilities, we are all equipped with nervous systems whose development is constrained by our genetic dispositions, but can be channeled in different directions depending on environmental factors and specific individual experiences and activities. The interaction of all of these factors defines our neural and cognitive functioning. Thus, the substantial reorganization within the nervous system that follows the loss of a sensory function, regardless of when, why, and in which modality it occurs, is not merely a physiological or "brain" phenomenon that can be understood without taking into account the cognitive challenges and demands imposed by nonstandard sensory input. Rather, in order to achieve the same functional level in their everyday life, those who suffer from sensory loss need to develop strategies that enable them to extract the information relevant for achieving their goals from alternative sources typically ignored by the majority of the population. Such adjustments are mediated through sufficient restructuring of other sensory or higher-order cognitive functions. Thus, different cognitive demands lead to different individual experiences and activities, which in turn promote a certain pattern of plastic reorganization within the nervous system that is additionally constrained by genetic and purely physiological factors. Therefore, brain reorganization after sensory loss needs to be considered as a neurocognitive phenomenon that strongly reflects the brain's intrinsic potential for change as well as the altered cognitive demands aimed at compensating for the missing sensory information, both of which are crucial to rehabilitation efforts. In addition, it is important to keep in mind that the various subpopulations of individuals suffering from sensory loss, differing in its etiology or onset, also differ in their potential for plasticity as well as in the cognitive resources available for dealing with sensory loss. The early onset of sensory loss encountered in congenital conditions triggers the most dramatic cases of plasticity and enables drastic brain restructuring that compensates for the deficits, generating a remarkably different functional network than the one seen in healthy individuals or in individuals who sustained brain or peripheral injuries later in life. Congenital blindness and deafness affect large portions of the brain, especially when resulting from peripheral damage (i.e., to the retina, cochlea, or the sensory tracts), which does not injure the brain itself but instead deprives parts of the brain of their natural input, leaving them essentially unemployed. More than 20% of the cerebral cortex is devoted to analyzing visual information, and a similar portion is devoted to auditory information (but note that some of these cortical areas overlap to some extent, as some recent reports suggest: Beauchamp et al. 2004; Calvert 2001; Cappe and Barone 2005; Clavagnier et al. 2004; Schroeder and Foxe 2005; van Atteveldt et al. 2004). Despite the lack of visual or auditory input, the visual and auditory cortices
of the blind and deaf do not degenerate. Rather, they undergo extensive plasticity resulting in significantly changed neural responsiveness as well as functional involvement in nonvisual/nonauditory cognitive functions. Significant, although typically less extensive, plastic changes may also occur in populations suffering from noncongenital sensory loss. This neuroplasticity is evident both in atypical brain activation in the blind compared with that of the sighted and in behavioral manifestations, for example, sensory hyperacuity and specific cognitive skills.
21.3.1 Evidence for Robust Plasticity Promoted by Sensory Loss

The first set of evidence for the extensive reorganization undergone by the brains of the congenitally blind and deaf can be found in reports of enhanced sensory and cognitive abilities of such individuals that compensate for their sensory deficits. For example, blind individuals need to compensate for the lack of vision, a modality that normally allows one to "know what is where by looking" (Marr 1982) and is ideal for providing concurrent information about the relations of planes and surfaces to each other, drawing attention to relevant external cues and greatly facilitating spatial coding (Millar 1981). Although the blind cannot acquire information needed for object localization and recognition by looking, they still require this information in order to, for example, navigate through space or find and recognize the objects around them. Therefore, they have to acquire this information through alternative sensory or other strategies. For example, as early as the epoch of the Mishna (about 350 c.e.), it was known that blind individuals possess superior memory abilities compared to the sighted ("The traditions cited by Rabbi Sheshet are not subject to doubt as he is a blind man." Talmud Yerushalmi, tractate Shabbat 6b), which enable them to remember the exact location and identity of stationary objects and the sequence of steps required to complete paths (Raz et al. 2007; Noordzij et al. 2006; Vanlierde and Wanet-Defalque 2004). Such phenomenal memory of the blind has also been demonstrated in modern scientific studies (Tillman and Bashaw 1968; Smits and Mommers 1976; Pozar 1982; Pring 1988; Hull and Mason 1995; Röder et al. 2001; D'Angiulli and Waraich 2002; Amedi et al. 2003; Raz et al. 2007). Similarly, it has been shown that the blind have superior tactile and auditory perception abilities: for instance, they are able to discriminate fine tactile patterns or auditory spatial locations better than the sighted, and even to identify smells better (Murphy and Cain 1986; Röder et al. 1999; Grant et al. 2000; Goldreich and Kanics 2003, 2006; Hugdahl et al. 2004; Wakefield et al. 2004; Doucet et al. 2005; Smith et al. 2005; Collignon et al. 2006). Similarly, deaf individuals show improved visual abilities on certain tasks (Bavelier et al. 2006), which indicates that the remaining modalities compensate for the missing one, a phenomenon termed hypercompensation (Zwiers et al. 2001) or cross-modal compensatory plasticity (Rauschecker 2000). However, the blind (or the deaf) do not always perform better on such tasks (Zwiers et al. 2001), suggesting that optimal development of some aspects of sensory processing in the unaffected modalities may depend on, or at least benefit from, concurrent visual (auditory) input. Furthermore, when comparing different populations of the blind, it becomes clear that the identified benefits in some auditory and tactile tasks depend to a great extent on the age at sight loss. Specifically, these advantages are often, but not always, limited to the congenitally and early blind, whereas the performance of the late blind tends to resemble that of the sighted (Fine 2008), reflecting differences in the potential for neuroplastic reorganization and in the amount of visual experience between these populations. However, there is also evidence indicating that compensatory benefits occur in the late blind, in which case they may be mediated by different neurophysiological mechanisms (Fieger et al. 2006; Voss et al.
2004), as detailed in the next sections. Importantly, although prolonged experience with a reduced number of available sensory modalities leads to such benefits, these do not appear automatically. For example, it has been shown that blind children have significant difficulties with some tasks, especially those requiring reference to external cues or understanding directions and spatial relations between objects. Such tasks are challenging for the blind, as they have compromised spatial representations and rely mostly on self-reference and movement sequences (Millar 1981; Noordzij et al. 2006; Vanlierde and Wanet-Defalque 2004).
Consequently, the blind have problems recognizing potentially useful information needed to perform the mentioned tasks and lack the benefits that could arise from simultaneously available vision. For example, concurrent visual input could facilitate recognition and learning of helpful auditory or somatosensory features, given that the existence of redundant or overlapping information from more than one modality is generally associated with guiding attention and enhanced learning of amodal stimulus features (Lickliter and Bahrick 2004). Nevertheless, such recognition of useful cues or calibration of auditory and tactile space is eventually possible even in the absence of vision, as it may be achieved using different cues, for example, those stemming from self-motion (Ashmead et al. 1989, 1998). Importantly, although it may require relatively long training to reach a stage in which the missing sensory input is replaced and compensated for by equivalent information from other modalities, the spatial representations that are finally generated on the basis of the haptic and auditory input of the blind seem to be equivalent to the visually based ones in the sighted (Röder and Rösler 1998; Vanlierde and Wanet-Defalque 2004). Overall, the findings indicate that the blind, once they learn to deal with the available sensory modalities, can show comparable or superior performance in many tasks when compared to the sighted. This advantage can even be compromised by the presence of visual information, as indicated by the inferior performance of the partially blind (Lessard et al. 1998). Thus, the available evidence tends to counter the notion that sensory loss leads to general maladjustment and dysfunction in abilities outside the missing modality. Quite the contrary, this general-loss hypothesis should be abandoned in favor of the alternative, compensatory hypothesis suggesting that sensory loss leads to the superior development of the remaining senses (Pascual-Leone et al. 2005). In the past decades, the neural correlates of reported impairment-induced changes in cognitive functions and strategies have been thoroughly studied, providing a wealth of information regarding the brain's abilities to change. Studies investigating the neural processing of congenitally blind and deaf individuals, as well as more invasive animal models of these conditions, show that the brain is capable of robust plasticity reflected in profoundly modified functioning of entire brain networks. Important evidence pertaining to the altered cognitive processing and the functional status of the occipital cortex in the blind stems from electrophysiological studies that investigated nonvisual sensory functions of the blind. These yielded results showing shorter latencies of event-related potentials (ERPs) in auditory and somatosensory tasks in the blind in contrast to the sighted, suggesting more efficient processing in these tasks in this population (Niemeyer and Starlinger 1981; Röder et al. 2000). Furthermore, different topographies of the elicited ERP components in the sighted and the blind provided the first indications of reorganized processing in the blind, including the engagement of their occipital cortex in nonvisual tasks (Kujala et al. 1992; Leclerc et al. 2000; Rösler et al. 1993; Uhl et al. 1991).
Functional neuroimaging studies have corroborated and extended these findings by showing functional engagement of the occipital lobe (visual cortex) of congenitally blind individuals in perception in other modalities (i.e., audition and touch; Gougoux et al. 2005; Kujala et al. 2005; Sathian 2005; Stilla et al. 2008; for a recent review of these findings, see Noppeney 2007), tactile Braille reading (Büchel et al. 1998; Burton et al. 2002; Gizewski et al. 2003; Sadato et al. 1996, 1998), verbal processing (Burton et al. 2002, 2003; Ofan and Zohary 2006; Röder et al. 2002), and memory tasks (Amedi et al. 2003; Raz et al. 2005). Importantly, the reported activations reflect functionally relevant contributions to these tasks, as indicated by studies in which processing within the occipital cortex was transiently disrupted using transcranial magnetic stimulation (TMS) during auditory processing (Collignon et al. 2007), tactile processing including Braille reading (Cohen et al. 1997; Merabet et al. 2004), and linguistic functions (Amedi et al. 2004). Akin to the findings in the blind, it has been shown that the auditory cortex of the congenitally deaf is activated by visual stimuli (Finney et al. 2001), particularly varieties of visual movement (Campbell and MacSweeney 2004). It is important to realize that the involvement of unisensory brain regions in cross-modal perception is not limited only to individuals with sensory impairments, but can under certain circumstances also be identified in the majority of the population (Sathian et al. 1997; Zangaladze et al. 1999;
Amedi et al. 2001, 2005b; Merabet et al. 2004; Sathian 2005), consistent with reports in experimental animals of nonvisual inputs into visual cortex and nonauditory inputs into auditory cortex (Falchier et al. 2002; Rockland and Ojima 2003; Schroeder et al. 2003; Lakatos et al. 2007). In the blind and deaf this involvement is much stronger, because sensory areas deprived of their customary sensory input become functionally reintegrated into different circuits, which leads to profound changes in the affected modality and the system as a whole.
21.3.2 Principles Guiding Reorganization following Sensory Loss

Earlier in this section we listed experimental findings stemming from behavioral, electrophysiological, imaging, and TMS studies illustrating such changes. We will now try to offer a systematization of these changes, as it may be helpful for understanding the extent of, and the main principles guiding, reorganization following sensory loss (for similar attempts at systematization, see, e.g., Röder and Rösler 2004; Grafman 2000; Rauschecker 2008). Four types of change can be distinguished: intramodal changes, occurring within the cortices normally serving the unaffected modalities; multimodal changes, occurring in multisensory regions; cross-modal (intermodal) changes, pertaining to the involvement of typically visual areas in processing tactile and auditory information in the blind (or of typically auditory areas in processing visual information in the deaf); and supramodal changes, that is, global, whole-brain changes involving more than the unisensory and multisensory networks. Although somewhat autonomous, these different types of changes are in reality strongly interdependent and cannot be separated on the level of either cognitive or neural processing. Intramodal plasticity refers to the changes occurring within one sensory modality as a consequence of altered, either increased or decreased, use of that sensory modality. These changes are reflected, for example, in the superior performance of the blind on auditory or tactile tasks. Studies investigating the neural foundations of this phenomenon indicate a high degree of reorganization of sensory maps in different modalities following local peripheral damage, extensive training, or perceptual learning (Kaas 2000; Buonomano and Johnson 2009; Recanzone et al. 1992). This reorganization includes a coordinated shrinkage of maps representing the unused areas and expansion of those representing the modality/limb experiencing increased use (Rauschecker 2008), and it is determined by the amount of stimulation and the structure of the input pattern, within which competition between the inputs plays an important role (Buonomano and Johnson 2009). Multimodal or multisensory plasticity refers to the reorganization of multisensory areas after sensory loss that arises from the impairment of one and the compensatory hyperdevelopment of the remaining sensory modalities. This altered structure of sensory inputs leads to changes in multisensory areas, the development of which is shaped by the convergence of incoming input from unisensory systems (Wallace 2004a). For example, studies investigating the multisensory anterior ectosylvian cortex in congenitally blind cats indicate an expansion of auditory and somatosensory fields into the area usually housing visual neurons (Rauschecker and Korte 1993) as well as sharpened spatial filtering characteristics (Korte and Rauschecker 1993) following blindness. These changes underlie the improved spatial abilities of these animals and may also be crucially important for the development of cross-modal plasticity. Cross-modal (intermodal) plasticity refers to the reassignment of a particular sensory function to another sensory modality, as reflected in, for example, the engagement of the visual cortex in processing auditory information. Numerous invasive studies in animals have shown the vast potential for such reorganization, reflecting the fact that most aspects of the structure and function of a given brain area are determined by its inputs, not its geographic location.
For example, it has been shown that typically auditory areas can, after being exposed to visual input through rerouted thalamic fibers normally reaching primary visual areas, develop orientation-sensitive cells with the pattern of connectivity resembling one typically found in the normally developed visual cortex (Sharma et al. 2000) and fulfill the visual functionality of the rewired projections (von Melchner et al. 2000). Similarly, tissue transplanted from the visual into the somatosensory cortex acquires functional
properties of its "host" and does not hold on to its genetic predisposition (Schlaggar and O'Leary 1991). This implies that the cross-modal plasticity observed in the blind is most probably subserved by altered connectivity patterns, as will be further discussed in the next section. Supramodal plasticity refers to changes encompassing areas and brain functions that are typically considered nonsensory. Evidence for such plasticity has been revealed in studies showing involvement of the occipital cortex in memory or language (verb generation or semantic judgment) processing in the blind (Amedi et al. 2003, 2004; Burton et al. 2002, 2003; Ofan and Zohary 2006; Raz et al. 2005; Röder et al. 2000, 2002). This type of plasticity is comparable to cross-modal plasticity and is enabled by altered connectivity patterns between the visual cortex and other supramodal brain regions. When describing and systematizing the different types of plastic changes, we want to once again emphasize that these are not mutually independent. They often occur in synchrony, and it may occasionally be difficult to categorize a certain change as belonging to only one of the suggested types. Furthermore, all types of large-scale plasticity depend on or reflect anatomical and functional changes in neural networks and may therefore rely on similar neurophysiological mechanisms. Before describing these mechanisms in more detail and illustrating how they could underlie different types of plastic changes, we will present another important element that needs to be considered with respect to compensating for sensory impairments. Specifically, we will now focus on the fact that all of the mentioned changes in neural networks show large variability between individuals, resulting in corresponding variability in compensatory cognitive and behavioral skills. It is important to consider some of the main sources of this variability, not just so that we can better understand the reorganization following sensory loss in different populations of the blind, but also because this variability has important implications for the potential for successful rehabilitation.
21.3.3 Plasticity following Sensory Loss across the Lifespan

When discussing different types of neuroplastic changes and the possible mechanisms underlying them, it is important to emphasize that all of these vary significantly depending on the age at onset of blindness, as was briefly discussed in previous sections. These differences reflect several factors: the brain's potential to change at different periods of development, the amount of experience with visual or auditory processing before sensory loss, and the amount of practice with the remaining senses or with some special materials, for example, Braille letters. The most important of these factors reflects the fact that the general potential for any form of plastic change varies enormously across the lifespan. Although the brain retains some ability to change throughout life, it is generally believed and experimentally corroborated that the nervous system is most plastic during its normal development as well as following brain injury. The developing brain is a highly dynamic system that undergoes several distinct phases, from cell formation to the rapid growth and subsequent elimination of unused synapses, before finally entering a more stable phase following puberty (Chechik et al. 1998). The functional assignment of individual brain regions occurring during this time is crucially dependent on synaptic development, which includes drastic changes that often take place in spurts. In the visual cortex, during the first year after birth, the number of synapses grows tremendously and is subsequently scaled down to the adult level around the age of 11 through extensive decreases in synaptic and spine density, dendritic length, or even the number of neurons (Kolb 1995). This process is primarily determined by experience and neural activity: synapses that are used are strengthened, whereas those that are neither reinforced nor actively used are eliminated. Synaptic development is highly dependent on competition between incoming inputs, the lack of which can result in a decreased level of synaptic revision and the persistence of redundant connections in adulthood (De Volder et al. 1997). This process of synaptic pruning represents a fairly continuous and extended tuning of neural circuits and can be contrasted with other types of changes that occur at very short timescales. During such periods of intensified development (i.e., critical or, more broadly, sensitive periods; Knudsen 2004; Michel and Tyler 2005), the system is the most sensitive to abnormal
During such periods of intensified development (i.e., critical or, more broadly, sensitive periods; Knudsen 2004; Michel and Tyler 2005), the system is most sensitive to abnormal environmental inputs or injuries (Wiesel and Hubel 1963). Thus, injuries affecting different stages of development, even when they occur at roughly similar ages, may trigger distinct patterns of compensatory neuroplastic changes and lead to different levels of recovery. Specifically, early studies of recovery after visual loss (Wiesel and Hubel 1963, 1965) suggested that vision is particularly sensitive to receiving natural input during early development, and that visual deprivation, even for short durations but at an early developmental stage, may irreversibly damage the ability for normal visual perception at older ages. Conversely, evidence of sparse visual recovery after early-onset blindness (Gregory and Wallace 1963; Fine et al. 2003) demonstrates that this may not necessarily apply in all cases, and some (although not all) visual abilities may be regained later in life. The potential for neuroplasticity after puberty is considered to be either much lower than in childhood or even impossible except in cases of pathological states and neural overstimulation (Shaw and McEachern 2000). However, recovery following different types of pathological states occurring in adulthood (Brown 2006; Chen et al. 2002), changes in neuronal counts and compensatory increases in the number of synapses in aging (Kolb 1995), and the profound changes following short periods of blindfolding (Pitskel et al. 2007; Pascual-Leone et al. 2005; Amedi et al. 2006) suggest otherwise. In reconciling these seemingly contradictory conclusions, it is useful to take into account the multifaceted nature of plasticity, which includes different forms of changes occurring at different timescales and on different levels of neural functioning. For example, synaptic changes occurring in aging develop over an extended period and in synergy with the altered experiences and needs characteristic of later periods in life. The robust, short-term plasticity occurring after blindfolding may arise from the recruitment of already existing, but commonly unused, inhibited, or masked pathways that become available once the source or reason for such masking (e.g., the availability of visual input in those who have been blindfolded) is removed. Therefore, some forms of adult plasticity do not reflect “plasticity de novo,” which is characterized by the creation of new connectivity patterns (Burton 2003). In contrast, in pathological states, injuries, or late sensory loss, both of these types of changes can occur. Rapid changes reflecting the unmasking of existing connections in the first phase promote and enable subsequent slow but more permanent structural changes (Amedi et al. 2005a; Pascual-Leone et al. 2005). This suggests that potentially similar functional outcomes may be mediated by different neural mechanisms whose availability depends on the developmental stage in which they occur.

All of these general principles and differences in neuroplasticity across the lifespan can be applied to the more specific case of plasticity following sensory loss. Given that the most extensive plasticity is seen in the congenitally or early-onset blind, it has been suggested that processing advantages and large-scale cortical reorganization might be limited to the congenitally and early blind, with the performance of the late blind more closely resembling that of the sighted (Fine 2008). Similarly, Cohen et al. (1999) suggested that the critical period of susceptibility for significant cross-modal plasticity would end at puberty.
However, findings showing a high degree of modifiability of cortical maps even in adulthood (Kaas 1991), as well as those indicating significant reorganization in the occipital cortex of the late blind (Büchel et al. 1998; Burton et al. 2002; Voss et al. 2004), argue against this restriction. They are in line with the previous suggestion that significant potential for plastic changes exists throughout the lifespan, but may differ in extent and in the underlying neurophysiological mechanisms available at different periods of development. Specifically, the experience of vision, especially if still available after puberty, shapes both cognition and the brain, and this influence persists even after vision is lost. Although the late blind need to reorganize information processing in order to compensate for the lack of visual input, they can employ previously learned visual strategies, such as visual imagery, which remains available after visual loss (Büchel et al. 1998), to a much greater extent than the early blind. They also benefit greatly from fully developed multisensory systems, which may explain differences in multisensory plasticity encountered across the populations of the congenitally, early, and late blind. Equivalent benefits and cross-modal connections encountered in the late blind cannot be expected to occur in those who lack the experience of concurrent, often redundant or complementary input from different
sensory modalities. Although the potential for multisensory integration can primarily be seen as a phenomenon that develops through the integration of unisensory inputs (Wallace 2004b), it is important to emphasize that this does not imply a serial process in which fully developed individual modalities somehow merge in order to produce multisensory percepts. On the contrary, although some level of development of unisensory processing may be needed for the emergence of multisensory neurons, unisensory and multisensory perception start developing in a highly interdependent manner soon after this initial phase. Furthermore, although multisensory percepts may develop as a consequence of concurrent and correlated inputs from different modalities, they in turn also influence or channel the development and differentiation of single modalities (Lickliter and Bahrick 2004). Specifically, recent findings (Putzar et al. 2007) show that humans deprived of patterned visual input during the first months of life, who later had their patterned vision restored, exhibit reduced audiovisual interactions. This indicates that adequate multisensory input during early development is indeed necessary for the full development of cross-modal interactions. Similar findings have been reported for abnormal cross-modal integration in cochlear implant patients (Schorr et al. 2005).

Overall, findings indicate substantial differences in all types of plasticity across congenitally, early, and late blind individuals. These between-group differences are not necessarily the same across all types of plastic changes and the brain areas (networks) affected by them, because they depend to a great extent on the interaction between the onset of blindness and the exact stage of development at the time of blindness, which may differ across brain systems. For example, it is plausible to assume that the ventral and dorsal pathways within the visual system would be differently influenced by loss of vision at different developmental stages. Thus, systems dedicated to dynamically shifting relations between locations, objects, and events (including the dorsal visual pathway) may develop earlier and therefore be prone to a different pattern of developmental deficits (Neville and Bavelier 2000), comparable to specific findings showing that motion perception develops earlier than object perception (Fine et al. 2003). Finally, although some of the described, more “extreme” examples of plasticity may take years to develop, several studies suggest that withholding visual information for short periods, even a week, may have dramatic results: subjects who were blindfolded for only a week showed posterior occipital lobe activation during Braille reading (Amedi et al. 2006) and during tactile discrimination tasks (Merabet et al. 2008b). This activation was reduced when the blindfold was removed. Hence, not all cross-modal changes require long-term sensory deprivation or slowly developing altered connectivity patterns; some may result from the previously mentioned unmasking of existing connections between the visual and other cortices, which are dormant (or actively inhibited) under normal conditions. It is likely that at least some of the plastic changes require extended periods of sensory deprivation, possibly occurring in the critical or sensitive periods of development.
Such dependence may have important implications concerning the ability to restore sight and regain functional vision, as well as for understanding the neural mechanisms underlying the plastic changes evident in both early-onset and late-onset blindness.
21.3.4 Neurophysiologic Mechanisms Underlying Plastic Changes in the Blind

Although the exact etiology of the changes observed in blindness, regardless of its onset, is not yet fully understood, it has been suggested that all levels of connectivity, including connections within local circuits as well as long-range corticocortical and subcortical connections, are altered in the blind (Bavelier and Neville 2002). Corroborating this, recent evidence indicates that the visual tracts connecting the visual cortex with the eyes are degenerated in the early-onset blind (Noppeney et al. 2005; Pan et al. 2007; Shimony et al. 2006), while the functional connectivity between the occipital cortex and various other cortical sites, including the supplementary motor area, pre- and postcentral gyri, superior parietal lobule, and the left superior and middle temporal gyri (Yu et al. 2008), is decreased. Therefore, altered connectivity patterns within other cortical or subcortical systems not affected by blindness may underlie the robust plastic changes exhibited in the blind. Several models have been posited, emphasizing the relevance of different types of connectivity in mediating different types
of plasticity (i.e., cross-modal or multisensory plasticity), the plasticity of different areas, or, as previously described, plastic changes that occur at different onsets of vision loss. Evidence has been provided in support of each such model, suggesting that the individual models may capture different phenomena of relevance and that their combination may offer a full specification of the changes encountered after sensory loss. We will now briefly present models that emphasize subcortical and cortical connectivity changes to different extents, and review theories that aim to explain the general trends in long-range plasticity triggered by sensory loss.

Subcortical models of connectivity are based mostly on findings in animal models of plasticity following sensory loss. Such studies in mammals, similar to the studies of rewiring sensory input (Sharma et al. 2000; von Melchner et al. 2000), suggest that the visual cortex may receive sensory input from subcortical sensory stations, which may cause a reorganization of the visual cortex, enabling it to process stimuli from other modalities. Specifically, several studies have shown that congenital blindness (caused by early enucleation) causes a rewiring of tactile and auditory inputs from the thalamic and other brainstem stations in the sensory pathways to the visual cortex (Chabot et al. 2007; Izraeli et al. 2002; Karlen et al. 2006; Laemle et al. 2006; Piche et al. 2007). This rewiring is evident both in the neural connectivity (indicated by the use of anterograde and retrograde tracers) and in the functional properties of the “visual” cortex (examined by electrophysiological recordings), which now starts to exhibit auditory or tactile responses. This type of model may explain the engagement of the visual cortex of blind humans in “low-level” sensory tasks (Kujala et al. 2005; Gougoux et al. 2005) seen in many studies, which constitutes cross-modal, or intermodal, plasticity. However, despite the evidence for the spontaneous occurrence of such connectivity in mammals, no definite support for such a model has yet been established in humans.

Corticocortical models of connectivity are currently better grounded to account for the large-scale plasticity observed in the blind. Although it was previously assumed that there are no direct connections between sensory modalities, recent anatomical studies in primates indicate the existence of projections from the auditory to the visual cortex and multisensory feedback connections to primary visual areas (Falchier et al. 2002; Rockland and Ojima 2003). Supporting this connectivity in humans, increased functional connectivity between the primary somatosensory cortex and primary visual cortex was found in the early-onset blind (Wittenberg et al. 2004). Although direct connectivity from the auditory or somatosensory cortex to primary visual cortex may explain some of the perceptual properties of the reorganized visual cortex, it may not account for all of them. In addition, such a model may not explain the “high cognitive” component of the compensatory plasticity, as reflected in, for example, the involvement of the visual cortex in verbal memory and language. In order to account for these findings, models of corticocortical connectivity have to be further refined. Specifically, these cannot remain limited only to illustrating the presence or absence of white matter fibers between different regions, but need to address the dynamics of information transfer. In one such model, the so-called inverted hierarchy model (Amedi et al.
2003; Büchel 2003), feedback connectivity is considered to play a crucial role in cross-modal (and supramodal) plasticity. Specifically, connections stemming from the temporal, parietal, and frontal lobes may, in the absence of visual input and of competition from visual pathway connectivity, be responsible for providing nonvisual input to the occipital lobe, enabling its engagement in nonvisual processing. This is particularly true for areas involved in multisensory processing even in the sighted, such as regions within the lateral occipital complex (LOC) that are naturally active during both tactile and visual object recognition (Amedi et al. 2001, 2002). Such areas retain some of their original sensory input following the loss of one modality and may consequently preserve their original functions (i.e., tactile shape recognition, including Braille reading), corresponding to multimodal or multisensory plasticity. The feedback connectivity from these regions to earlier stations in the visual pathways, such as the primary visual cortex, may further expand this network. Since these stations are now even further away from direct sensory input (at a distance from the sensory receptors, measured by the number of synapses, similar to that of the frontal cortex), the model posits that they may begin to engage in even higher cognitive functions, similar to the frontal cortex. In support of this hypothesis, it was demonstrated that the functional connectivity of the
visual cortex with frontal language regions is increased in the blind (Liu et al. 2007). Such changes in connectivity could account for the altered pattern of inputs reaching the occipital cortex, which may in the end determine the morphological and physiological features of this area and enable its functional reassignment to nonsensory tasks. It is still too early to speculate about all the implications of the inverted hierarchy approach, particularly in relation to those areas that might be at the top of the postulated hierarchy. On a less speculative front, recent studies have provided evidence supporting some claims of this hypothesis, in particular increased feedback corticocortical information transfer following sensory loss. For example, it has been shown that the area involved in (visual and auditory) motion processing in the sighted is involved in auditory (Poirier et al. 2006a) as well as tactile (Ricciardi et al. 2007) motion processing in the blind. Similar conclusions can be drawn from findings showing the engagement of the ventral visual pathway, typically involved in processing information related to the identification of objects and faces (Ungerleider and Mishkin 1982), in auditorily mediated object recognition, but only if detailed shape information is provided and efficiently extracted (Poirier et al. 2006c; Amedi et al. 2007). All of these results are congruent with the more general notion that cross-modal plasticity occurs in situations where the information originally processed within a certain area is similar, regardless of the input being rerouted into it (Grafman 2000). This implies that each cortical area may operate in a metamodal fashion (Pascual-Leone and Hamilton 2001), being specialized in a particular type of computation rather than being tied to a specific input modality. However, this type of broad generalization should be treated with caution, as it is still not clear how such metamodal computations would develop, especially when inputs are significantly altered during development, as in congenital blindness.

On one hand, the metamodal theory suggests that, in blindness, visual deafferentation may lead to a strengthening of the corresponding input signal from other modalities to the “visual” areas, which will maintain the original cortical operation. This hypothesis predicts that the classical hierarchy (i.e., low-level basic feature analysis in early visual areas, high-level object recognition in LOC) is maintained in the blind, who now utilize the tactile (and auditory) modalities. By contrast, the inverted hierarchy theory suggests that, because of the dysfunctional main bottom-up geniculostriate pathway in the blind, the retinotopic areas (especially V1) will be much farther (in terms of the number of synapses) from the remaining functional sense organs (in the tactile or auditory modalities). This, in turn, would lead to V1 resembling the prefrontal cortex (which is similarly remote from any direct sensory input) more than a primary sensory area in the blind. These two theories may, however, be reconciled by considering the connectivity of the reorganized visual cortex of the blind and the onset of visual loss. The development of computations characteristic of a certain region is strongly dependent on the input that it originally receives. Therefore, in cases of congenital sensory loss, primary sensory areas are less likely to develop computations similar to the ones performed in the typical brain.
These differences in connectivity may lead to more extensive developmental changes, causing cortical regions to assume computations very different from their natural roles. The visual cortex of the congenitally blind may correspond to early stations of sensory processing, owing to auditory or tactile subcortical (or even cortical; see Wittenberg et al. 2004) connectivity as seen in animal models of blindness, or to higher stations in the hierarchy (as predicted by the inverted hierarchy model) if most of the connectivity is indeed from high-order (multisensory) cortical regions. Currently, evidence for both types of plasticity can be found, as the visual cortex of the congenitally blind is activated both by simple perceptual tasks and by mnemonic and semantic tasks, but there appear to be differences in the preference for perceptual vs. high-level cognitive functions in different areas of the occipital cortex. Specifically, there is growing evidence that as one moves anteriorly in the ventral visual stream, the balance of activation between the two types of task shifts toward the perceptual tasks, whereas posteriorly, in and around the calcarine sulcus (V1), there is a clear preference for the higher-order verbal memory and language tasks (Raz et al. 2005). However, this issue is not yet resolved and it will greatly benefit from future anatomical connectivity findings in humans. In the case of late-onset blindness, the connectivity of the visual cortex and its development are more typical of the sighted brain (as previously described), and
reorganization is more likely to be of the high-order corticocortical type, along with some unmasking of subcortical connectivity, which is also apparent in blindfolded normally sighted individuals (Pitskel et al. 2007; Pascual-Leone et al. 2005; Amedi et al. 2006).
21.4 REHABILITATION-INDUCED PLASTICITY

At the start of this chapter, we mentioned the most promising rehabilitation techniques available to restore functional vision in the blind. Next, we illustrated how these approaches are used on individuals whose cognition and brains have, due to sensory loss, already undergone drastic changes that depend crucially on the onset of such loss. In this case, plasticity can be viewed as a double-edged sword. On one hand, it represents a critical property allowing functional adaptation to the lost sensory input. On the other, these same changes need to be modified or partly reversed once the lost sensory input is restored. Importantly, this remodeling does not have to result in a functional organization identical to that found in the majority of the population. On the contrary, considerable interindividual variability is to be expected in adapting to the implant (e.g., in cochlear implant patients, speech recognition performance ranges from very poor to near perfect; Geers 2006). This is particularly true with regard to the variability in the onset of blindness, as was discussed in relation to neural mechanisms of plasticity at different stages of life. The late-onset blind may particularly benefit from reacquiring visual information, as their visual cortex has developed in a manner that would allow it to process such information. Therefore, sensory loss triggers less pronounced reorganization of their visual cortex when compared to that of the early-onset or congenitally blind, whose brains undergo more extensive remodeling and thus encounter more difficulties in adapting to visual information. Furthermore, sensory implants (and, although in a different manner, SSDs) influence the brain as a system, not just one modality. For example, it was shown that visual information can disrupt the processing of auditory information in newly implanted cochlear implant patients, most probably due to cross-modal visual processing in the auditory cortex (Champoux et al. 2009), and that “poorly performing” cochlear implant patients may have more extended visual responses in their auditory cortex (Doucet et al. 2006). Similarly, cross-modal plasticity in the auditory cortex before the surgery can constrain cochlear implant efficacy (Lee et al. 2001). Therefore, decreasing the amount of nonrelevant visual information and increasing responses to input arriving from the cochlear implant might be a useful part of rehabilitation. Similar interference may occur in visual restoration, in all the tasks that are functionally dependent on the occipital lobe of the blind (particularly tasks that can be disrupted by occipital TMS, as described in previous sections). One solution to at least some of these problems might be the integration of SSDs and prostheses, as they provide fairly distinct advantages. Specifically, although prostheses may allow entry of visual signals to the brain, SSDs can be very useful in teaching the brain how to interpret the input from the new modality, and perhaps prepare the brain and mind of the blind, in advance, for new sensory information.
21.4.1 Plasticity after SSD Use and Its Theoretical Implications

Observing the outcomes of sensory restoration and substitution is not only of practical relevance; it also provides a unique opportunity to address numerous theoretical questions about perception and the nature of qualia, and can teach valuable lessons about brain plasticity. Several studies investigating SSD use have examined the way blind and sighted brains process sensory-transformed information provided by such devices. Functional properties of multisensory regions can readily be studied using SSDs as a methodology of choice, as SSD input is naturally processed in a multisensory or trans-modal manner. For example, several recent studies used SSDs to test the metamodal processing theory (Pascual-Leone and Hamilton 2001), which states that each brain region implements a particular type of computation regardless of its modality of input. These studies showed that object shape information drives the activation of the LOC in the visual cortex,
regardless of whether it is transmitted in the visual, tactile, or auditory modality (Amedi et al. 2007; Poirier et al. 2006c), in sighted as well as blind individuals (see Figure 21.1b, c). Interestingly, applying TMS to this region can disrupt shape identification using an auditory SSD (Merabet et al. 2008a). In the same way, studies conducted using the PSVA in the sighted show that auditorily mediated face perception can activate the visual fusiform face area (Plaza et al. 2009), whereas depth perception activates occipito-parietal and occipito-temporal regions (Renier et al. 2005).

Studying the use of SSDs in a longitudinal fashion also provides a good opportunity to monitor in real time how newly acquired information is learned, and to investigate the accompanying cognitive and neural changes. For example, several studies have looked into differential activation before and after learning how to use a specific SSD. One study showed that shape discrimination using the TDU SSD generated activation of the occipital cortex following short training only in early-onset blind individuals (but not in the sighted; Ptito et al. 2005), and that TDU training enables TMS to induce spatially organized tactile sensations on the tongue (Kupers et al. 2006). These studies suggest that the occipital lobe of the blind may be more prone to plasticity or to cross-modal processing, even in adulthood, when compared to that of the sighted. Cross-modal activation of the visual cortex of sighted subjects was also demonstrated following training on the PSVA SSD (Poirier et al. 2007a). Although such behavioral and imaging findings have been reported for early- (Arno et al. 2001) and late-onset blind (Cronly-Dillon et al. 1999) as well as sighted individuals (Poirier et al. 2007a), it has recently been claimed that the recruitment of occipital areas during the use of SSDs could be mediated by different processes or mechanisms in different populations. Specifically, although the early blind might exhibit real bottom-up activation of the occipital cortex for tactile or auditory perception, in the late blind and the sighted this activation might reflect top-down visual imagery mechanisms (Poirier et al. 2007b). This suggestion is not surprising, given that we made a similar claim earlier with regard to the mechanisms underlying plastic changes following sensory loss itself. Importantly, recent evidence of multisensory integration for object recognition, demonstrated using a novel cross-modal adaptation paradigm (Tal and Amedi 2009), may imply that the sighted could share some bottom-up mechanisms of tactile and visual integration in the visual cortex. Nevertheless, in addition to relying on different neurophysiological mechanisms, the behavioral potential of SSDs may also vary between subpopulations of the blind: the late-onset blind can better associate the cross-modal input with the properties of vision as they knew it (e.g., they have better knowledge of the 2-D representation of visual pictures, which is useful in most current 2-D SSDs), whereas early blind individuals lack such an understanding of the visual world, but may have more highly developed auditory and tactile cross-modal networks and plasticity. This difference in utilizing visual rehabilitation between the two blind groups may be even more pronounced in the case of sensory restoration.
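To make the idea of a 2-D image-to-sound mapping more concrete, the sketch below implements the general principle behind visual-to-auditory SSDs such as the vOICe: the image is scanned column by column over time, vertical position is mapped to pitch, and brightness is mapped to loudness (Meijer 1992). The frequency range, scan duration, and sampling rate used here are illustrative assumptions, and the sketch omits features of the real devices (e.g., stereo panning), so it should be read as a didactic approximation rather than a reimplementation of any particular SSD.

```python
import numpy as np

def image_to_soundscape(img, sr=22050, scan_time=1.0, f_lo=500.0, f_hi=5000.0):
    """Convert a grayscale image (rows x cols, values in 0..1) into a mono waveform.

    Columns are scanned left to right over scan_time seconds; each row is assigned
    a sine oscillator (top row = highest pitch) whose amplitude follows pixel brightness.
    All parameter defaults are illustrative assumptions, not the settings of any device.
    """
    n_rows, n_cols = img.shape
    freqs = np.geomspace(f_hi, f_lo, n_rows)          # log-spaced pitches, top row = high
    samples_per_col = int(sr * scan_time / n_cols)
    t = np.arange(samples_per_col) / sr
    phase = np.zeros(n_rows)
    chunks = []
    for c in range(n_cols):
        brightness = img[:, c][:, None]               # (n_rows, 1) amplitudes for this column
        chunk = (brightness * np.sin(2 * np.pi * freqs[:, None] * t + phase[:, None])).sum(axis=0)
        phase = (phase + 2 * np.pi * freqs * samples_per_col / sr) % (2 * np.pi)  # keep oscillators continuous
        chunks.append(chunk)
    wave = np.concatenate(chunks)
    return wave / (np.abs(wave).max() + 1e-9)         # normalize to [-1, 1]

# Example: a bright diagonal line is heard as a pitch sweep across the one-second scan.
img = np.zeros((64, 64))
np.fill_diagonal(img, 1.0)
soundscape = image_to_soundscape(img)
```

Writing `soundscape` to a WAV file at the assumed sampling rate (e.g., with scipy.io.wavfile) would let one listen to the resulting “soundscape”; the point of the sketch is simply to show how a 2-D spatial layout is re-encoded as a structured 1-D auditory stream, which is the kind of representation the late- and early-onset blind may exploit differently, as discussed above.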
Importantly, this differentiation between the early- and late-onset blind in SSD use also highlights the potential of introducing such devices as early as possible in development, while the brain is still in its prime with respect to plasticity. Similar to the improved outcomes of cochlear implantation in early childhood (Harrison et al. 2005), it may be of particular interest to attempt to teach young blind children to utilize such devices. Several early attempts to teach blind infants to use the Sonicguide (Kay and Kay 1983) showed some promise, as younger subjects developed sensitivity to the spatial information provided by the device more rapidly (Aitken and Bower 1982, 1983), although with highly variable results (for a discussion, see Warren 1994). However, to our knowledge, only a few preliminary later attempts (Segond et al. 2007; Amedi et al., unpublished observations) have been made to adapt the use of SSDs to children. The training of infants on SSD use may also lead to a more “natural” perception of the sensorily transformed information, perhaps even to the level of synesthesia (Proulx and Stoerig 2006), a condition in which one type of sensory stimulation evokes the sensation of another, commonly in another modality or submodality: for instance, color is associated with letters or numbers, or sounds with vision, among other sensory combinations. This type of synesthesia may create visual experiences or even visual qualia with regard to the SSD percept. In a recent study (Ward and Meijer 2009) describing the phenomenology of two blind users of the vOICe SSD, some evidence for its feasibility can be seen in the reports of a late-
blind user, who describes synesthetic percepts of vision while using the device, and even synesthetic percepts of color, which is not conveyed by the device but is “filled in” by her mind’s eye. Some of her descriptions of her subjective experience illustrate the synesthetic nature of the SSD percept: “the soundscapes seem to trigger a sense of vision for me. . . . It does not matter to me that my ears are causing the sight to occur in my mind” (see more on the vOICe website, http://www.seeingwithsound.com/users.htm).

In summary, observing the outcomes of sensory restoration and substitution offers a unique opportunity to address and potentially answer numerous theoretical questions about the fundamental principles of brain organization, neuroplasticity, unisensory processing, and multisensory integration, in addition to their obvious clinical use. Research in these fields may also provide useful insights that can be applied in clinical settings, such as the suggested use of SSDs and sensory recovery at an early developmental stage.
21.5 CONCLUDING REMARKS AND FUTURE DIRECTIONS

The findings reviewed in this chapter clearly show the extent of changes that accompany sensory loss. Cognitive functioning as well as brain organization on all spatial scales undergoes profound reorganization that depends on multiple factors, primarily the onset of the sensory impairment. Functionally, many (although not necessarily all) of these changes are beneficial and allow individuals to better adapt, use the available resources to compensate for the loss, and function more efficiently in their surroundings. Sensory loss triggers a wide range of such changes, underpinned by different neurophysiological mechanisms about which we still know very little. Therefore, an important goal for the future will be to elucidate these individual phenomena at various levels and timescales. One critical issue that has so far not received enough attention is the connectivity changes that enable different types of restructuring, as well as the factors determining such changes, for instance, the onset of blindness. The major theoretical mission for the future will, however, be to bring all of these changes and suggested mechanisms together. Although difficult, this mission can be successful, as all of these seemingly unrelated types of changes occur in the same dynamic system, are mutually interdependent, and shape one another.

Investigating the consequences of sensory loss can also lead to significant advances in our understanding of the fundamental principles of the formation and development of the nervous system. For example, investigating the brains of visually or auditorily impaired individuals can shed light on issues such as specifying criteria for defining and delineating individual cortical areas, or determining the extent and interplay of genetic and environmental determinants of brain organization. On a less theoretical note, studying the outcomes of sensory loss is crucially important for developing rehabilitation techniques, the benefits of which cannot be emphasized enough. Optimization of all of these approaches is motivated and enabled by theoretical progress and emerging knowledge about plastic changes following sensory impairment. On the other hand, these approaches should not be viewed as pure applications constrained by basic science. They are developed in synergy with, and are sometimes even ahead of, their theoretical counterparts. For example, the first SSDs were developed at a time when their outcomes would never have been predicted by mainstream theories. Even today, we are fascinated and often surprised by the outcomes of such devices, which strongly inform our theoretical knowledge. Therefore, one major mission for the future will be to bring theory and practice even closer together, as each benefits from the questions posed and answers provided by the other. On a purely practical level, the main directions for the future will include improving current rehabilitation techniques, developing new ones, and combining these techniques, as they may often complement each other.

In conclusion, the present chapter reviewed the main findings and illustrated the theoretical and practical importance of studying the consequences of sensory loss. All of the described plastic changes that occur in this context indicate that no region or network in the brain is an island and that all types of lesions or atypical developmental patterns are bound to influence the system as a whole. The
current challenge is to understand the principles that guide, the mechanisms that underlie, and the factors that influence such changes, so that this knowledge can be channeled toward practical rehabilitation.
REFERENCES Aitken, S., and T. G. Bower. 1982. Intersensory substitution in the blind. J Exp Child Psychol 33: 309–323. Aitken, S., and T. G. Bower. 1983. Developmental aspects of sensory substitution. Int J Neurosci 19: 13–91. Amedi, A., J. Camprodon, L. Merabet et al. 2006. Highly transient activation of primary visual cortex (V1) for tactile object recognition in sighted following 5 days of blindfolding. Paper presented at the 7th Annual Meeting of the International Multisensory Research Forum, University of Dublin. Amedi, A., A. Floel, S. Knecht, E. Zohary, and L. G. Cohen. 2004. Transcranial magnetic stimulation of the occipital pole interferes with verbal processing in blind subjects. Nat Neurosci 7: 1266–1270. Amedi, A., G. Jacobson, T. Hendler, R. Malach, and E. Zohary. 2002. Convergence of visual and tactile shape processing in the human lateral occipital complex. Cereb Cortex 12: 1202–1212. Amedi, A., R. Malach, R. Hendler, S. Peled, and E. Zohary. 2001. Visuo-haptic object-related activation in the ventral visual pathway. Nat Neurosci 4: 324–330. Amedi, A., L. B. Merabet, F. Bermpohl, and A. Pascual-Leone. 2005a. The Occipital Cortex in the Blind. Lessons About Plasticity and Vision. Curr Dir Psychol Sci, 14: 306–311. Amedi, A., N. Raz, P. Pianka, R. Malach, and E. Zohary. 2003. Early ‘visual’ cortex activation correlates with superior verbal memory performance in the blind. Nat Neurosci 6: 758–766. Amedi, A., W. M. Stern, J. A. Camprodon et al. 2007. Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex. Nat Neurosci 10: 687–689. Amedi, A., K. Von Kriegstein, N. M. Van Atteveldt, M. S. Beauchamp, and M. J. Naumer. 2005b. Functional imaging of human crossmodal identification and object recognition. Exp Brain Res 166: 559–571. Arno, P., C. Capelle, M. C. Wanet-Defalque, M. Catalan-Ahumada, and C. Veraart. 1999. Auditory coding of visual patterns for the blind. Perception 28: 1013–1029. Arno, P., A. G. De Volder, A. Vanlierde et al. 2001. Occipital activation by pattern recognition in the early blind using auditory substitution for vision. Neuroimage 13: 632–645. Ashmead, D. H., E. W. Hill, and C. R. Talor. 1989. Obstacle perception by congenitally blind children. Percept Psychophys 46: 425–433. Ashmead, D. H., R. S. Wall, K. A. Ebinger, S. B. Eaton, M. M. Snook-Hill, and X. Yang. 1998. Spatial hearing in children with visual disabilities. Perception 27: 105–122. Auvray, M., S. Hanneton, and J. K. O’Regan. 2007. Learning to perceive with a visuo-auditory substitution system: Localisation and object recognition with ‘The vOICe’. Perception 36: 416–430. Bach-Y-Rita, P. 2004. Tactile sensory substitution studies. Ann N Y Acad Sci 1013: 83–91. Bach-Y-Rita, P., C. C. Collins, F. A. Saunders, B. White, and L. Scadden. 1969. Vision substitution by tactile image projection. Nature 221: 963–964. Bach-Y-Rita, P., K. A. Kaczmarek, M. E. Tyler, and J. Garcia-Lara. 1998. Form perception with a 49-point electrotactile stimulus array on the tongue: A technical note. J Rehabil Res Dev 35: 427–430. Bach-y-Rita, P., and S. W. Kercel. 2003. Sensory substitution and the human–machine interface. Trends Cogn Sci 7: 541–546. Bavelier, D., M. W. Dye, and P. C. Hauser. 2006. Do deaf individuals see better? Trends Cogn Sci 10: 512–518. Bavelier, D., and H. J. Neville. 2002. Cross-modal plasticity: Where and how? Nat Rev Neurosci 3: 443–452. Beauchamp, M. S., B. D. Argall, J. Bodurka, J. H. Duyn, and A. Martin. 2004. 
Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nat Neurosci 7: 1190–1192. Brelen, M. E., F. Duret, B. Gerard, J. Delbeke, and C. Veraart. 2005. Creating a meaningful visual perception in blind volunteers by optic nerve stimulation. J Neural Eng 2: S22–S28. Brown, J. A. 2006. Recovery of motor function after stroke. Prog Brain Res 157: 223–228. Bubic, A., E. Striem-Amit, and A. Amedi. 2010. Large-scale brain plasticity following blindness and the use of sensory substitution devices. In Multisensory Object Perception in the Primate Brain, ed. J. Kaiser and M. Naumer, part 4, 351–380. Büchel, C. 2003. Cortical hierarchy turned on its head. Nat Neurosci 6: 657–658. Büchel, C., C. Price, R. S. Frackowiak, and K. Friston. 1998. Different activation patterns in the visual cortex of late and congenitally blind subjects. Brain 121(Pt 3): 409–419. Bull, N. D., and K. R. Martin. 2009. Using stem cells to mend the retina in ocular disease. Regen Med 4: 855–864.
Buonomano, D. V., and H. A. Johnson. 2009. Cortical plasticity and learning: Mechanisms and models. In Encyclopedia of neuroscience, ed. L. R. Squire. London: Academic Press. Burton, H., A. Z. Snyder, J. B. Diamond, and M. E. Raichle. 2002. Adaptive changes in early and late blind: A FMRI study of verb generation to heard nouns. J Neurophysiol 88: 3359–3371. Burton, H. 2003. Visual cortex activity in early and late blind people. J Neurosci 23: 4005–4011. Burton, H., J. B. Diamond, and K. B. McDermott. 2003. Dissociating cortical regions activated by semantic and phonological tasks: A FMRI study in blind and sighted people. J Neurophysiol 90: 1965–1982. Calvert, G. A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies. Cereb Cortex 11: 1110–1123. Campbell, R., and M. MacSweeney. 2004. Neuroimaging studies of cross-modal plasticity and language processing in deaf people. In The handbook of multisensory processes, ed. G. Calvert, C. Spence, and B. E. Stein. Cambridge, MA: MIT Press. Capelle, C., C. Trullemans, P. Arno, and C. Veraart. 1998. A real-time experimental prototype for enhancement of vision rehabilitation using auditory substitution. IEEE Trans Biomed Eng 45: 1279–1293. Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. Eur J Neurosci 22: 2886–2902. Chabot, N., S. Robert, R. Tremblay, D. Miceli, D. Boire, and G. Bronchti. 2007. Audition differently activates the visual system in neonatally enucleated mice compared with anophthalmic mutants. Eur J Neurosci 26: 2334–2348. Champoux, F., F. Lepore, J. P. Gagne, and H. Theoret. 2009. Visual stimuli can impair auditory processing in cochlear implant users. Neuropsychologia 47: 17–22. Chechik, G., I. Meilijson, and E. Ruppin. 1999. Neuronal regulation: A mechanism for synaptic pruning during brain maturation. Neural Comput 11: 2061–2080. Chebat, D. R., C. Rainville, R. Kupers, and M. Ptito. 2007. Tactile–‘visual’ acuity of the tongue in early blind individuals. Neuroreport 18: 1901–1904. Chen, R., L. G. Cohen, and M. Hallett. 2002. Nervous system reorganization following injury. Neuroscience 111: 761–773. Chen, S. C., L. E. Hallum, G. J. Suaning, and N. H. Lovell. 2007. A quantitative analysis of head movement behaviour during visual acuity assessment under prosthetic vision simulation. J Neural Eng 4: S108. Clavagnier, S., A. Falchier, and H. Kennedy. 2004. Long-distance feedback projections to area V1: Implications for multisensory integration, spatial awareness, and visual consciousness. Cogn Affect Behav Neurosci 4: 117–126. Cohen, L. G., P. Celnik, A. Pascual-Leone, B. Corwell, L. Falz, J. Dambrosia, M. Honda, N. Sadato, C. Gerloff, M. D. Catala, and M. Hallett. 1997. Functional relevance of cross-modal plasticity in blind humans. Nature 389: 180–183. Cohen, L. G., R. A. Weeks, N. Sadato, P. Celnik, K. Ishii, and M. Hallett. 1999. Period of susceptibility for cross-modal plasticity in the blind. Ann Neurol 45: 451–460. Collignon, O., L. Renier, R. Bruyer, D. Tranduy, and C. Veraart. 2006. Improved selective and divided spatial attention in early blind subjects. Brain Res 1075: 175–182. Collignon, O., M. Lassonde, F. Lepore, D. Bastien, and C. Veraart. 2007. Functional cerebral reorganization for auditory spatial processing and auditory substitution of vision in early blind subjects. Cereb Cortex 17: 457–465. Cronin, T., T. Leveillard, and J. A. Sahel. 2007. 
Retinal degenerations: From cell signaling to cell therapy; preclinical and clinical issues. Curr Gene Ther 7: 121–129. Cronly-Dillon, J., K. Persaud, and R. P. Gregory. 1999. The perception of visual images encoded in musical form: A study in cross-modality information transfer. Proc Biol Sci 266: 2427–2433. Cronly-Dillon, J., K. C. Persaud, and R. Blore. 2000. Blind subjects construct conscious mental images of visual scenes encoded in musical form. Proc Biol Sci 267: 2231–2238. D’angiulli, A., and P. Waraich. 2002. Enhanced tactile encoding and memory recognition in congenital blindness. Int J Rehabil Res 25: 143–145. Dagnelie, G. 2008. Psychophysical evaluation for visual prosthesis. Annu Rev Biomed Eng 10: 339–368. Delbeke, J., M. C. Wanet-Defalque, B. Gerard, M. Troosters, G. Michaux, and C. Veraart. 2002. The microsystems based visual prosthesis for optic nerve stimulation. Artif Organs 26: 232–234. De Volder, A. G., A. Bol, J. Blin, A. Robert, P. Arno, C. Grandin, C. Michel, and C. Veraart. 1997. Brain energy metabolism in early blind subjects: Neural activity in the visual cortex. Brain Res 750: 235–244. Dobelle, W. H. 2000. Artificial vision for the blind by connecting a television camera to the visual cortex. ASAIO J 46: 3–9.
Doucet, M. E., F. Bergeron, M. Lassonde, P. Ferron, and F. Lepore. 2006. Cross-modal reorganization and speech perception in cochlear implant users. Brain 129: 3376–3383. Doucet, M. E., J. P. Guillemot, M. Lassonde, J. P. Gagne, C. Leclerc, and F. Lepore. 2005. Blind subjects process auditory spectral cues more efficiently than sighted individuals. Exp Brain Res 160: 194–202. Dowling, J. 2008. Current and future prospects for optoelectronic retinal prostheses. Eye 23:1999–2005. Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration in primate striate cortex. J Neurosci 22: 5749–5759. Fallon, J. B., D. R. Irvine, and R. K. Shepherd. 2008. Cochlear implants and brain plasticity. Hearing Res 238: 110–117. Fernandez, E., P. Ahnelt, P. Rabischong, C. Botella, and F. Garcia-De Quiros. 2002. Towards a cortical visual neuroprosthesis for the blind. IFMBE Proc 3(2): 1690–1691. Fieger, A., B. Röder, W. Teder-Salejarvi, S. A. Hillyard, and H. J. Neville. 2006. Auditory spatial tuning in lateonset blindness in humans. J Cogn Neurosci 18: 149–157. Fine, I. 2008. The behavioral and neurophysiological effects of sensory deprivation. In Blindness and brain plasticity in navigation and object perception, ed. J. J. Rieser, D. H. Ashmead, F. F. Ebner, and A. L. Corn. New York: Taylor and Francis. Fine, I., A. R. Wade, A. A. Brewer et al. 2003. Long-term deprivation affects visual perception and cortex. Nat Neurosci 6: 915–916. Finney, E. M., I. Fine, and K. R. Dobkins. 2001. Visual stimuli activate auditory cortex in the deaf. Nat Neurosci 4: 1171–1173. Geers, A. E. 2006. Factors influencing spoken language outcomes in children following early cochlear implantation. Adv Otorhinolaryngol 64: 50–65. Gizewski, E. R., T. Gasser, A. de Greiff, A. Boehm, and M. Forsting. 2003. Cross-modal plasticity for sensory and motor activation patterns in blind subjects. Neuroimage 19: 968–975. Goldish, L. H., and H. E. Taylor. 1974. The Optacon: A valuable device for blind persons. New Outlook Blind 68: 49–56. Goldreich, D., and I. M. Kanics. 2003. Tactile acuity is enhanced in blindness. J Neurosci 23: 3439–3445. Goldreich, D., and I. M. Kanics. 2006. Performance of blind and sighted humans on a tactile grating detection task. Percept Psychophys 68: 1363–1371. Gougoux, F., R. J. Zatorre, M. Lassonde, P. Voss, and F. Lepore. 2005. A functional neuroimaging study of sound localization: Visual cortex activity predicts performance in early-blind individuals. PLoS Biol 3: e27. Grafman, J. 2000. Conceptualizing functional neuroplasticity. J Commun Disord 33: 345–355; quiz 355-6. Grant, A. C., M. C. Thiagarajah, and K. Sathian. 2000. Tactile perception in blind Braille readers: A psychophysical study of acuity and hyperacuity using gratings and dot patterns. Percept Psychophys 62: 301–312. Gregory, R. L., and J. G. Wallace. 1963. Recovery from early blindness: A case study. In Experimental Psychology Society, Monograph Supplement. 2nd ed. Cambridge, MA: Heffers. Haddock, J. N., and L. Berlin. 1950. Transsynaptic degeneration in the visual system; report of a case. Arch Neurol Psychiatry 64: 66–73. Harrison, R. V., K. A. Gordon, and R. J. Mount. 2005. Is there a critical period for cochlear implantation in congenitally deaf children? Analyses of hearing and speech perception performance after implantation. Dev Psychobiol 46: 252–261. Heyes, A. D. 1984. The Sonic Pathfinder: A new electronic travel aid. J Vis Impair Blind 78: 200–202. Hugdahl, K., M. Ek, F. Takio et al. 2004. 
Blind individuals show enhanced perceptual and attentional sensitivity for identification of speech sounds. Brain Res Cogn Brain Res 19: 28–32. Hull, T., and H. Mason. 1995. Performance of blind-children on digit-span tests. J Vis Impair Blind 89: 166–169. Izraeli, R., G. Koay, M. Lamish, A. J. Heicklen-Klein, H. E. Heffner, R. S. Heffner, and Z. Wollberg. 2002. Cross-modal neuroplasticity in neonatally enucleated hamsters: Structure, electrophysiology and behaviour. Eur J Neurosci 15: 693–712. Kaas, J. H. 1991. Plasticity of sensory and motor maps in adult mammals. Annu Rev Neurosci 14: 137–167. Kaas, J. H. 2000. The reorganization of somatosensory and motor cortex after peripheral nerve or spinal cord injury in primates. Prog Brain Res 128: 173–179. Karlen, S. J., D. M. Kahn, and L. Krubitzer. 2006. Early blindness results in abnormal corticocortical and thalamocortical connections. Neuroscience 142: 843–858. Kay, L., and N. Kay. 1983. An ultrasonic spatial sensor’s role as a developmental aid for blind children. Trans Ophthalmol Soc N Z 35: 38–42.
Kleiner, A., and R. C. Kurzweil. 1977. A description of the Kurzweil reading machine and a status report on its testing and dissemination. Bull Prosthet Res 10: 72–81. Knudsen, E. I. 2004. Sensitive periods in the development of the brain and behavior. J Cogn Neurosci 16: 1412–1425. Kolb, B. 1995. Brain plasticity and behavior. Mahwah: Lawrence Erlbaum Associates, Inc. Korte, M., and J. P. Rauschecker. 1993. Auditory spatial tuning of cortical neurons is sharpened in cats with early blindness. J Neurophysiol 70: 1717–1721. Kral, A., J. Tillein, S. Heid, R. Klinke, and R. Hartmann. 2006. Cochlear implants: Cortical plasticity in congenital deprivation. Prog Brain Res 157: 283–313. Kujala, T., K. Alho, P. Paavilainen, H. Summala, and R. Naatanen. 1992. Neural plasticity in processing of sound location by the early blind: An event-related potential study. Electroencephalogr Clin Neurophysiol 84: 469–472. Kujala, T., M. J. Palva, O. Salonen et al. 2005. The role of blind humans’ visual cortex in auditory change detection. Neurosci Lett 379: 127–131. Kupers, R., A. Fumal, A. M. De Noordhout, A. Gjedde, J. Schoenen, and M. Ptito. 2006. Transcranial magnetic stimulation of the visual cortex induces somatotopically organized qualia in blind subjects. Proc Natl Acad Sci U S A 103: 13256–13260. Lacey, S., N. Tal, A. Amedi, and K. Sathian. 2009. A putative model of multisensory object representation. Brain Topogr 21: 269–274. Laemle, L. K., N. L. Strominger, and D. O. Carpenter. 2006. Cross-modal innervation of primary visual cortex by auditory fibers in congenitally anophthalmic mice. Neurosci Lett 396: 108–112. Leclerc, C., D. Saint-Amour, M. E. Lavoie, M. Lassonde, and F. Lepore. 2000. Brain functional reorganization in early blind humans revealed by auditory event-related potentials. Neuroreport 11: 545–550. Lakatos, P., C. M. Chen, M. N. O’Connell, A. Mills, and C. E. Schroeder. 2007. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53: 279–292. Lamba, D., M. Karl, and T. Reh. 2008. Neural regeneration and cell replacement: A view from the eye. Cell Stem Cell 2: 538–549. Lamba, D. A., M. O. Karl, and T. A. Reh. 2009. Strategies for retinal repair: Cell replacement and regeneration. Prog Brain Res 175: 23–31. Lee, D. S., J. S. Lee, S. H. Oh, S. K. Kim, J. W. Kim, J. K. Chung, M. C. Lee, and C. S. Kim. 2001. Cross-modal plasticity and cochlear implants. Nature 409: 149–150. Lessard, N., M. Pare, F. Lepore, and M. Lassonde. 1998. Early-blind human subjects localize sound sources better than sighted subjects. Nature 395: 278–280. Lickliter, R., and L. E. Bahrick. 2004. Perceptual development and the origins of multisensory responsiveness. In The handbook of multisensory processes, ed. G. Calvert, C. Spence, and B. E. Stein. Cambridge, MA: MIT Press. Linvill, J. G., and J. C. Bliss. 1966. A direct translation reading aid for the blind. Proc IEEE 54: 40–51. Liu, Y., C. Yu, M. Liang et al. 2007. Whole brain functional connectivity in the early blind. Brain 130: 2085–2096. Locker, M., C. Borday, and M. Perron. 2009. Stemness or not stemness? Current status and perspectives of adult retinal stem cells. Curr Stem Cell Res Ther 4: 118–130. MacLaren, R. E., and R. A. Pearson. 2007. Stem cell therapy and the retina. Eye (London) 21: 1352–1359. Marr, D. 1982. Vision. San Francisco: W. H. Freeman. Meijer, P. B. 1992. An experimental system for auditory image representations. IEEE Trans Biomed Eng 39: 112–121. Merabet, L., G. Thut, B. Murray, J. Andrews, S. Hsiao, and A. 
Pascual-Leone. 2004. Feeling by sight or seeing by touch? Neuron 42: 173–179. Merabet, L. B., J. F. Rizzo, A. Amedi, D. C. Somers, and A. Pascual-Leone. 2005. What blindness can tell us about seeing again: Merging neuroplasticity and neuroprostheses. Nat Rev Neurosci 6: 71–77. Merabet, L. B., L. Battelli, S. Obretenova, S. Maguire, P. Meijer, and A. Pascual-Leone. 2008a. Functional recruitment of visual cortex for sound encoded object identification in the blind. Neuroreport 20: 132–138. Merabet, L. B., R. Hamilton, G. Schlaug et al. 2008b. Rapid and reversible recruitment of early visual cortex for touch. PLoS ONE 3: e3046. Michel, G. F., and A. N. Tyler,. 2005. Critical period: A history of the transition from questions of when, to what, to how. Dev Psychobiol 46: 156–162. Millar, S. 1981. Cross-modal and intersensory perception and the blind. In Intersensory perception and sensory integration, ed. R. D. Walk and H. L. J. Pick. New York: Plenum Press. Murphy, C., and W. S. Cain. 1986. Odor identification: The blind are better. Physiol Behav 37: 177–180.
22 Visual Abilities in Individuals with Profound Deafness: A Critical Review
Francesco Pavani and Davide Bottari
CONTENTS
22.1 Visual Abilities in Profound Deafness: An Open Challenge for Cross-Modal Plasticity Research ... 423
  22.1.1 Multiple Operational Definitions ... 425
  22.1.2 Making Sense of Heterogeneity ... 426
22.2 A Task-Oriented Review of Empirical Evidence ... 427
  22.2.1 Perceptual Thresholds Tasks ... 427
  22.2.2 Simple Detection and Lateralization Tasks ... 430
  22.2.3 Visual Search Tasks ... 432
  22.2.4 Visual Discrimination and Identification Tasks ... 434
    22.2.4.1 Visual Discrimination with Flanker Interference ... 436
  22.2.5 Visual Tasks of Higher Complexity ... 438
22.3 A Transversal View on Literature ... 440
  22.3.1 Enhanced Reactivity Rather than Enhanced Perceptual Processing ... 440
  22.3.2 Role of Deaf Sample Characteristics and Visual Stimulus Characteristics Are Relevant but Not Critical ... 441
  22.3.3 Role of Target Eccentricity and Selective Visual Attention Is Critical but Underspecified ... 441
22.4 Conclusions and Future Directions ... 443
Acknowledgments ... 444
References ... 444
22.1 VISUAL ABILITIES IN PROFOUND DEAFNESS: AN OPEN CHALLENGE FOR CROSS-MODAL PLASTICITY RESEARCH

The world is inherently multisensory, and our ability to interact with it largely depends on the capability of our cognitive system to coherently use and integrate such a variety of sensory inputs. Consider, for instance, the way in which we monitor the environment. In humans, vision plays a crucial role in informing the cognitive system about the spatial layout of the scene, and in recognizing objects and events. However, during steady fixation of gaze in one direction, the visual field typically extends 100° laterally on either side, 60° upward, and 75° downward (Harrington 1971). This leaves a large portion of the surrounding environment inaccessible to vision, unless constant eye, head, and trunk movements are performed. Other distal senses, such as hearing or smell, can overcome this visual field limitation, providing inputs about regions of the environment beyond the boundaries of current visual perception. These additional sensory modalities can inform our cognitive system about stimuli that occur behind our body, are hidden by visual obstacles, or occur
very far in space. In particular, hearing can provide a good estimate of the most likely location in space of the nonvisible stimulus (see Heffner and Heffner 1992 for a cross-species evaluation of the relationship between the ability to localize a sound and the width of the field of best vision). In addition, hearing constantly models the acoustic regularity in the environment and reacts to violations of such regularity, regardless of the current behavioral goal of the individual (Näätänen 1992). Thus, audition provides fundamental guidance for reorienting our exploratory behavior. Efficient integration of sensory inputs from audition and vision is therefore essential for successful exploration of the surrounding environment.

The way our cognitive system perceives the multisensory environment in which we live leads to a fundamental question that has long been debated among scientists and philosophers: What are the consequences of the absence of one sensory modality for cognition and multisensory perception? For instance, what are the consequences of long-term auditory deprivation due to profound deafness for the remaining sensory modalities, mainly vision and touch? Interest in this issue can be traced back at least to the seventeenth century (for historical reviews, see Hartmann 1933; Jordan 1961), and two opposing hypotheses have traditionally been put forward to account for the impact of sensory deprivation (i.e., deafness or blindness) on the remaining senses. The first hypothesis is that a substantial deficit in one sensory modality could affect the development and organization of the other sensory systems. We will refer to this first perspective as the perceptual deficit hypothesis. When applied to the case of profound deafness, the perceptual deficit hypothesis predicts poorer visual and tactile perceptual performance in deaf individuals, as compared to age-matched hearing controls (e.g., Myklebust 1964). This hypothesis was based on the assumption that auditory deficiency can have a direct impact on the development of the other senses. In addition, it assumed that any language impairments resulting from profound deafness would limit hearing-impaired children in their interaction with the world, and result in a cognitive development lag in perceptual and cognitive tasks (Furth 1966). The second hypothesis is that a deficit in one sensory system would make the other modalities more sensitive, vicariously compensating for the loss of one sensory channel (e.g., Gibson 1969). We will refer to this second perspective as the sensory compensation hypothesis. When applied to the case of profound deafness, the sensory compensation hypothesis predicts that the visual and tactile modalities will show enhanced sensitivity. The latter prediction is often stated both in terms of the behavioral consequences of deafness and in terms of its neural outcomes. Specifically, the neural implications of the sensory compensation hypothesis are that the brain areas serving the impaired sensory modality may develop the ability to process perceptual inputs from one or more of the intact sensory systems (functional reallocation account), or alternatively that brain areas of the remaining senses may acquire enhanced functional and processing capabilities (remaining senses hypertrophy account).
After more than 30 years of systematic research conducted mainly on the visual abilities of profoundly deaf individuals, it is apparent that the long-standing debate as to whether perceptual and cognitive functions of deaf individuals are deficient or supranormal is far from being settled. Several reviews of this literature (e.g., Parasnis 1983; Bavelier et al. 2006; Mitchell and Maslin 2007) clearly indicate that deaf and hearing individuals perform comparably on a number of perceptual tasks. As we shall see later (see Section 22.2.1), this conclusion is strongly supported by tasks involving basic perceptual thresholds. Instead, other studies have revealed a differential performance in the two groups, either in the direction of deficient abilities in deaf than hearing participants (e.g., Quittner et al. 2004; Parasnis et al. 2003), or in the direction of supranormal abilities for the deaf population (e.g., Bottari et al. 2010; Loke and Song 1991; Neville and Lawson 1987). In this context, it should perhaps be emphasized that in the absence of clear behavioral differences between deaf and hearing participants, even the most striking differences between the two groups observed at the neural level cannot disentangle between the perceptual deficit hypothesis and the sensory compensation hypotheses. For instance, much of the renewed interest in the study of visual abilities in deaf individuals has been motivated by the seminal work of Neville et al. (1983). In that study, visual evoked potentials (VEPs) recorded from the scalp of eight congenitally deaf adults were significantly larger
over both auditory and visual cortices, with respect to those of eight hearing controls, specifically for visual stimuli occurring in the periphery of the visual field (8.3°). Although this pioneering work implies that the lack of auditory experience from an early age can influence the organization of the human brain for visual processing [a finding that was later confirmed and extended by many other studies using different methodologies for the recording of brain responses; e.g., electroencephalography (EEG): Neville and Lawson 1987; magnetoencephalography: Finney et al. 2003; functional magnetic resonance imaging: Bavelier et al. 2000, 2001], in the absence of a behavioral difference between the two groups it remains potentially ambiguous whether modifications at the neural level are an index of deficiency or compensation. In other words, even if one assumes that larger visual evoked components (e.g., Neville et al. 1983; Neville and Lawson 1987) or stronger BOLD responses (e.g., Bavelier et al. 2000, 2001) indicate enhanced processing of the incoming input, if this is not accompanied by behavioral enhancement it is difficult to conclude that it really serves some adaptive functional role. Unfortunately, the current evidence in the literature lacks this explanatory power. With the sole exception of the work by Neville and Lawson (1987), all other neuroimaging studies focused on measures of brain response alone, instead of combined measures of brain response and behavior. Furthermore, conclusive evidence that cortical reorganization serves a functional role can only originate from the observation that interfering with the reorganized brain response [e.g., using transcranial magnetic stimulation (TMS)] impairs the supranormal behavioral performance of the sensory-deprived participants (e.g., see Cohen et al. 1997 for an example of abolished supranormal tactile discrimination in the blind, following disruption of occipital lobe function using TMS).
22.1.1 Multiple Operational Definitions

Resolving the controversy between deficient and compensatory behavioral outcomes of profound deafness should first of all rely on a clear operational definition of the concept of “enhanced visual abilities in deaf individuals.” On one hand, the question “Do deaf individuals see better?” (e.g., Rettenbach et al. 1999; Bavelier et al. 2006) is provocatively broad and calls for a specification of the domains of visual perception in which the sensory compensation hypothesis is to be tested for the case of deafness. On the other hand, a definition centered on the sole concept of enhanced sensitivity (e.g., Bross 1979a) is perhaps too limited, as it implies that the compensation hypothesis can only be true whenever the discrimination sensitivity of deaf individuals is better than that measured in age-matched hearing controls. The concept of sensitivity refers to the ability of a perceptual system to discriminate a signal (e.g., a target) from noise (e.g., background events), and it is best described within the theoretical framework of signal detection theory (SDT; Green and Swets 1966). In particular, SDT allows distinguishing sensitivity (expressed by the d′ index) from the observer’s response criterion (expressed by the c or β indices). Although SDT is largely considered a standard framework for the study of perception, it is worth noting that studies on visual abilities in deaf individuals have very rarely used the SDT approach to describe performance (see Bross 1979a, 1979b; Neville and Lawson 1987; Bosworth and Dobkins 1999, 2002a, 2002b; Bottari et al., in preparation).

The first aim of this review is to provide a detailed description of the empirical evidence on visual abilities in profound deafness, structured as a function of the visual tasks that have been adopted by the different investigators and the dependent variable considered in the analyses. We start by describing the studies that investigated perceptual thresholds in the visual and tactile modalities, which gave an operational definition of enhanced visual ability in terms of better low-level sensitivity to the stimulus. Second, we describe studies that centered on simple detection or lateralization (left/right) responses, which gave an operational definition of enhanced visual ability in terms of faster response to a target onset. Third, we review studies that adopted visual search tasks, which gave an operational definition in terms of efficiency in searching for a target feature in the visual scene. Fourth, we review reports that centered on discrimination and identification of suprathreshold stimuli, which gave an operational definition of enhanced ability in terms of better recognition
of perceptual events. Finally, we conclude with a section on visual tasks of higher complexity that extended the operational definition to include the contribution of visual working memory and dual task performance.
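The SDT indices mentioned above are straightforward to compute from raw response counts. The following is a minimal sketch (in Python, with hypothetical trial counts and a log-linear correction chosen only for illustration), not a reconstruction of any of the studies reviewed here:

```python
from math import exp
from scipy.stats import norm

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return d' (sensitivity), c (criterion), and beta from raw trial counts."""
    # Log-linear correction keeps z-scores finite when a rate would be 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa            # sensitivity, independent of response bias
    c = -0.5 * (z_hit + z_fa)         # criterion: 0 = unbiased, > 0 = conservative
    beta = exp(c * d_prime)           # likelihood-ratio criterion, ln(beta) = c * d'
    return d_prime, c, beta

# Hypothetical observer: 45 hits, 5 misses, 15 false alarms, 35 correct rejections.
print(sdt_measures(45, 5, 15, 35))
```

Separating d′ from c (or β) in this way is what allows a genuine sensitivity difference between groups to be distinguished from a mere criterion shift, for instance a criterion that becomes more liberal as target probability increases.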
22.1.2 Making Sense of Heterogeneity

In addition to the controversy between “deficit” and “compensation” accounts, another critical issue in this research domain concerns the understanding of which aspect may be transversal to the different behavioral tasks, and may possibly explain the heterogeneity of the empirical results. The first transversal aspect that may account for the heterogeneity of the results is the diversity in the deaf sample characteristics. As originally pointed out by Hoemann (1978), in choosing deaf participants several studies have not controlled for differences in the amount of hearing loss, etiology of deafness, time from deafness onset at testing, and language(s) or mode(s) of communication used by deaf participants (see also Parasnis 1983). Recently, Bavelier and colleagues (2006) suggested that these differences in the deaf population sample can largely account for the heterogeneity in the literature. Specifically, they argued that studies reporting deficient visual functions in deaf than hearing individuals typically included deaf participants with heterogeneous backgrounds, whereas studies that have documented enhanced visual functions only included “deaf native signers” (i.e., individuals with no associated central nervous system damage and born profoundly deaf to deaf parents; Bavelier et al. 2006, p. 512). This specific deaf group achieves language development milestones at the same rate and time as hearing individuals, thus giving the opportunity to investigate the effects of auditory deprivation net of other confounding factors, such as language deprivation or atypical cognitive development due to communication deficiencies. As we shall see later (see Section 22.3.2), although a selection of deaf participants on the basis of the criteria proposed by Bavelier et al. (2006) has great methodological benefits, it appears unlikely that the heterogeneity in the empirical evidence can be reduced to this aspect alone. Furthermore, restricting the analysis only to “deaf native signers” would greatly limit generalization of the results, as this subgroup represents only 5% of the total deaf population (at least in the United States; see Mitchell and Karchmer 2002).

The second important aspect that has often been emphasized as a potential source of heterogeneity in the empirical evidence is the visual characteristics of the target stimulus. Several authors (e.g., Armstrong et al. 2002; Bavelier et al. 2006; Neville and Bavelier 2002) have proposed that enhanced visual abilities in deaf individuals may emerge selectively for the analysis of visual features that are preferentially processed within the visual-for-action pathway (also termed “motion pathway”), associated with the dorsal visual stream (Milner and Goodale 1995). For instance, an event-related potential (ERP) study by Armstrong and colleagues (2002) revealed enhanced cortical responses (larger N1 components) in deaf than in hearing adults in response to task-irrelevant motion stimuli at peripheral locations. Importantly, when cortical activity was compared between groups for stimuli varying along the color dimension (a visual feature preferentially processed by the ventral visual stream), enhanced cortical responses for deaf than hearing participants were no longer evident. Motion stimuli have also been shown to activate the MT+ complex more strongly in deaf than in hearing individuals using functional neuroimaging (Bavelier et al.
2000, 2001), and to activate the right auditory cortex in the deaf participants (Fine et al. 2005; Finney et al. 2001). The third aspect that has systematically been described as critical for enhanced visual abilities in deaf people is the eccentricity of the visual stimulus. The main working hypothesis for several investigations in this field has been that any visual enhancement in deaf individuals should emerge particularly for visual stimuli appearing toward the periphery of the visual field (e.g., Parasnis 1983; Neville and Lawson 1987). This prediction stems from the observation that, under normal conditions, the auditory system provides important information about the events that occur outside the field of view. Therefore, in the absence of audition, visual processing might recalibrate to favor visual events outside the fovea, in the attempt to monitor the environment through peripheral vision (e.g., Loke and Song 1991; Parasnis and Samar 1985). As shall be shown, a number of independent
studies have provided general support to the hypothesis that peripheral regions of the visual field have a different status for deaf individuals with respect to hearing controls. However, the actual visual eccentricities associated with the terms “central,” “perifoveal,” and “peripheral” considerably varied across the different studies. Researchers have referred to stimulus location as “central” both when the stimulus was presented directly at fixation (e.g., Poizner and Tallal 1987) and when it was perifoveal (e.g., Neville and Lawson 1987). More critically, the term “peripheral” has been applied to locations in the visual field ranging from 3° of eccentricity (e.g., Chen et al. 2006) to 20° or more (e.g., Colmenero et al. 2004; Loke and Song 1991; Stevens and Neville 2006). As pointed out by Reynolds (1993), this ambiguity in the adopted terminology originate from the fact that the boundaries of the foveal region (up to 1.5° from fixation) are well defined by anatomical structures, whereas the distinction between perifoveal and peripheral visual field is not. Finally, most researchers have suggested that spatial selective attention plays a key role in modulating visual responses in deaf individuals (e.g., Bavelier et al. 2006; Dye et al. 2008; Loke and Song 1991; Neville and Lawson 1987; Parasnis and Samar 1985; Sladen et al. 2005). This suggestion originated from the studies that examined attention orienting in deaf and hearing participants (e.g., Colmenero et al. 2004; Parasnis and Samar 1985) and found that deaf individuals pay less of a cost when detecting a target occurring at invalidly cued locations. Furthermore, a potential difference in selective attention has been proposed by those studies that examined the interference of flankers on target discrimination (Proksch and Bavelier 2002; Sladen et al. 2005) and found that deaf individuals were more susceptible to peripheral flankers than hearing controls. Finally, the suggestion that employment of selective attention resources is the key requisite for revealing differences between deaf and hearing participants has emerged from the empirical observation that differences between deaf individuals and hearing controls have sometimes emerged specifically when attention was endogenously directed to the target (e.g., Bavelier et al. 2000; Neville and Lawson 1987; but see Bottari et al. 2008). However, whether all aspects of visual enhancement in deaf individuals are necessarily linked to allocation of selective attention in space is still a matter of debate. Furthermore, it is well acknowledged that selective spatial attention is not a unitary mechanism, and at least two functionally and anatomically distinct mechanisms of spatial attention have been identified (Corbetta and Shulman 2002; Jonides 1981; Mayer et al. 2004; Posner 1980). Visual attention can be oriented to an object or a location in a bottom-up fashion, because an abrupt change in visual luminance at the retinal level has occurred in a specific region of the visual field. This type of attention orienting is entirely automatic and has typically been referred to as exogenous orienting. Alternatively, visual attention can be summoned to an object or a location because of its relevance for the behavioral goal of the individual. This type of top-down attention orienting is voluntary and strategic, and has typically been referred to as endogenous orienting. 
Whether one or both of the components of selective attention are changed as a consequence of deafness remains an open question. Thus, whenever the claim that “early deafness results in a redistribution of attentional resources to the periphery” is made (e.g., Dye et al. 2008, p. 75), one should also ask which aspect of selective attention (endogenous, exogenous, or both) is changed by profound deafness. In sum, four distinct transversal aspects may contribute to explain the heterogeneity of the empirical results in the different behavioral tasks: diversity in the deaf sample characteristics, visual characteristics of the target stimulus, target eccentricity, and role of selective spatial attention. The second aim of the present review is to reevaluate the empirical evidence in support of these four different (but possibly interrelated) aspects in modulating visual abilities in deaf individuals.
22.2 A TASK-ORIENTED REVIEW OF EMPIRICAL EVIDENCE

22.2.1 Perceptual Thresholds Tasks

One of the first studies to investigate perceptual thresholds in deaf individuals was conducted by Bross (1979a), who tested brightness discrimination sensitivity in six deaf and six hearing children
(11 years old on average) for two circular patches of white light presented at 4.8° of eccentricity, on opposite sides with respect to the participant’s body midline. Initially, the just noticeable difference (JND) between the two patches was measured for each participant. Then, brightness for one of the two stimuli (variable) was set to 0.75 JND units above or equal to the other (standard), and participants were instructed to indicate whether the variable stimulus was brighter or equal in apparent brightness with respect to the standard. In the latter task, the probability that the variable stimulus was brighter than the standard changed between blocks, from less likely (0.25), to equal (0.50), to more likely (0.75). Deaf and hearing participants showed comparable JNDs for brightness discrimination. However, the sensitivity of deaf participants in the forced-choice task, as measured by d′, was better than that of hearing controls. Intriguingly, deaf performance was entirely unaffected by the probability manipulation (i.e., deaf participants maintained a stable criterion, as measured by β), unlike hearing controls, who became more liberal in their criterion as stimulus probability increased. However, the same two groups of participants showed comparable sensitivity (d′) when retested in a second study with largely comparable methods (Bross 1979b). In addition, in one further study adapting the same paradigm for visual-flicker thresholds, no difference between deaf and hearing controls emerged in terms of d′ or β (Bross and Sauerwein 1980). This led Bross and colleagues (Bross 1979a, 1979b; Bross and Sauerwein 1980) to conclude that no enhanced sensory sensitivity is observed in deaf children, in disagreement with the sensory compensation hypothesis.

Finney and Dobkins (2001) reached a similar conclusion when measuring contrast sensitivity to moving stimuli in 13 congenital or early deaf adult participants (all signers), 14 hearing subjects with no signing experience, and 7 hearing subjects who signed from birth [Hearing Offspring of Deaf parents (HOD)]. Stimuli were black and white moving sinusoidal gratings presented for 300 ms to the left or to the right of one visual marker, and the participant’s task was to report whether the stimulus appeared to the left or to the right of the marker. Five markers were visible throughout the task (the central fixation cross and four dots located at 15° of eccentricity with respect to fixation). The stimulus could appear next to any of the five markers, thus forcing participants to distribute their visual attention across several visual locations. The luminance contrast required to yield 75% correct performance was measured for each participant across a range of 15 different combinations of spatial and temporal frequency of the stimulus. Regardless of all these manipulations, deaf, hearing, and HOD participants performed comparably on both central and peripheral stimuli, leading to the conclusion that neither deafness nor sign-language use leads to overall increases or decreases in absolute contrast sensitivity (Finney and Dobkins 2001, p. 175). Stevens and Neville (2006) expanded this finding by showing that contrast sensitivity was comparable in 17 congenitally deaf and 17 hearing individuals, even for stimuli delivered in the macula of the participant, at 2° around visual fixation (see also Bavelier et al. 2000, 2001, for further evidence of comparable luminance change detection in deaf and hearing individuals).
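The 75%-correct thresholds reported in these studies are typically obtained by fitting a psychometric function to proportion-correct data and reading off the stimulus level at the criterion performance. The sketch below shows one common approach for a two-alternative task (a Weibull function with a 50% guessing floor); the contrast levels and accuracies are hypothetical and are not taken from any of the studies discussed here:

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull_2afc(contrast, alpha, beta):
    # Two-alternative psychometric function: 50% chance floor, saturating at 100%.
    return 0.5 + 0.5 * (1.0 - np.exp(-(contrast / alpha) ** beta))

# Hypothetical proportion-correct data at six contrast levels.
contrasts = np.array([0.01, 0.02, 0.04, 0.08, 0.16, 0.32])
p_correct = np.array([0.52, 0.55, 0.68, 0.83, 0.94, 0.99])

(alpha, beta), _ = curve_fit(weibull_2afc, contrasts, p_correct, p0=[0.05, 2.0])
# Solving 0.75 = 0.5 + 0.5 * (1 - exp(-(c/alpha)**beta)) gives c = alpha * ln(2)**(1/beta).
threshold_75 = alpha * np.log(2.0) ** (1.0 / beta)
print(f"75%-correct contrast threshold: {threshold_75:.3f}")
```

Group comparisons such as those of Finney and Dobkins (2001) are then carried out on the fitted thresholds rather than on raw accuracy.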
Interestingly, a between-group difference was instead documented when the task was changed to unspeeded detection of a small (1 mm) white light, moving from the periphery to the center of the visual field. In this kinetic perimetry task, deaf participants showed an enlarged field of view (about 196 cm2) with respect to hearing controls (180 cm2), regardless of stimulus brightness. The latter finding suggests that perceptual thresholds may differ for deaf and hearing individuals when motion stimuli are employed. However, three further investigations (Bosworth and Dobkins 1999, 2002a; Brozinsky and Bavelier 2004) that examined the performance of deaf and hearing participants in motion discrimination tasks indicate that this is not always the case. Bosworth and Dobkins (1999) tested 9 congenital or early deaf (all signers) and 15 hearing (nonsigner) adults in a motion direction–discrimination task. The stimulus consisted of a field of white dots presented within a circular aperture, in which a proportion of dots (i.e., signal dots) moved in a coherent direction (either left or right), whereas the remaining dots (i.e., noise dots) moved in a random fashion. Similar to the study of Finney and Dobkins (2001), stimuli were either presented at central fixation, or 15° to the left or to the right of fixation. Participants were instructed to report the direction of motion with a key press, and the proportion of coherent motion signal yielding 75% correct
performance was measured for each participant. Mean thresholds did not differ between deaf and hearing controls, regardless of stimulus eccentricity (central or peripheral), stimulus duration (250, 400, or 600 ms), and vertical location of the lateralized stimuli (upper or lower visual field). The only between-group difference concerned the performance across the two visual hemifields. Deaf participants exhibited a right visual field (RVF) advantage, whereas hearing controls exhibited a slight left visual field (LVF) advantage. The latter finding, however, reflected the signing experience rather than auditory deprivation, and resulted from the temporal coincidence between visual and linguistic input in the left hemisphere of experienced signers, as subsequently shown by the same authors (Bosworth and Dobkins 2002b). A convergent pattern of results emerged from the study by Bosworth and Dobkins (2002a), in which 16 deaf signers (12 congenital), 10 hearing signers, and 15 hearing controls were asked to detect, within a circular aperture, the direction of motion of a proportion of dots moving coherently (leftward or rightward), whereas the remaining dots moved in a random fashion. The proportion of dots moving coherently varied across trials, to obtain a threshold for the number of coherently moving dots necessary to yield 75% correct discriminations. The results showed that all groups of participants performed comparably in terms of thresholds, suggesting that deafness does not modulate motion thresholds. Convergent findings also emerged from a study by Brozinsky and Bavelier (2004), in which 13 congenitally deaf (signers) and 13 hearing (nonsigner) adults were asked to detect velocity increases in a ring of radially moving dots. On each trial, dots accelerated in one quadrant and participants indicated the location of this velocity change in a four-alternative forced choice. Across experiments, the field of dots extended between 0.5° and 8°, or between 0.4° and 2° (central field), or between 12° and 15° (peripheral field). The temporal duration of the velocity change yielding 79% correct was measured for each participant. Regardless of whether the dots moved centrally or peripherally, velocity thresholds were equivalent for deaf and hearing individuals. Similar to the study by Bosworth and Dobkins (1999), deaf signers displayed better performance in the RVF than the LVF, again as a possible result of their fluency in sign language.

Equivalent performance in deaf and hearing individuals has also been documented when assessing temporal perceptual thresholds (e.g., Bross and Sauerwein 1980; Poizner and Tallal 1987; Nava et al. 2008; but see Heming and Brown 2005). Poizner and Tallal (1987) conducted a series of experiments to test temporal processing abilities in 10 congenitally deaf and 12 hearing adults. Two experiments examined flicker fusion thresholds for a single circle flickering on and off at different frequencies, or for two circles presented in sequence with variable interstimulus interval (ISI) (Poizner and Tallal 1987; Experiments 1 and 2). One additional experiment tested temporal order judgment abilities for pairs or triplets of visual targets presented in sequence (Poizner and Tallal 1987; Experiment 3). All visual targets appeared at the same central spatial location on the computer screen, and participants were asked to report the correct order of target appearance. No difference between deaf and hearing participants emerged across these tasks.
More recently, Nava et al. (2008) tested 10 congenital or early deaf adults (all signers), 10 hearing controls auditory-deprived during testing, and 12 hearing controls who were not subjected to any deprivation procedure, in a temporal order judgment for pairs of visual stimuli presented at perifoveal (3°) or peripheral (8°) visual eccentricities. Regardless of stimulus eccentricity, temporal order thresholds (i.e., JNDs) and points of subjective simultaneity did not differ between groups. Notably, however, faster discrimination responses were systematically observed in deaf than hearing participants, especially when the first of the two stimuli appeared at peripheral locations (Nava et al. 2008). Finally, one study testing perceptual threshold for frequency discrimination in the tactile modality also confirmed the conclusion of comparable perceptual thresholds in deaf and hearing individuals (Levanen and Hamdorf 2001). Six congenitally deaf (all signers) and six hearing (nonsigners) adults were asked to decide whether the frequency difference between a reference stimulus (at 200 Hz) and a test stimulus (changing in interval between 160 and 250 Hz) was “rising” or “falling.” The frequency difference between the two stimuli that yielded 75% correct responses was measured
for each participant. Although the frequency difference threshold was numerically smaller for deaf than hearing participants, no statistically significant difference emerged. In sum, the studies that have adopted perceptual thresholds to investigate the consequences of deafness on vision and touch (i.e., used an operational definition of better performance in terms of better low-level sensitivity to the stimulus) overall documented an entirely comparable performance between deaf and hearing individuals. Importantly, these findings emerged regardless of whether hearing-impaired participants were congenitally deaf born from deaf parents or early deaf. One clear example of this is the comparison between the study by Poizner and Tallal (1987) and Nava et al. (2008), which tested genetically versus early deaf on a comparable temporal order judgment task, and converged to the same conclusion. The absence of a difference at the perceptual level also emerged regardless of stimulus feature and eccentricity, i.e., regardless of whether target stimuli were static (e.g., Bross 1979a, 1979b) or moving (e.g., Bosworth and Dobkins 1999; Brozinsky and Bavelier 2004), and regardless of whether they appeared at central (e.g., Bosworth and Dobkins 1999; Brozinsky and Bavelier 2004; Poizner and Tallal 1987; Stevens and Neville 2006) or peripheral locations (e.g., Bosworth and Dobkins 1999; Brozinsky and Bavelier 2004; Nava et al. 2008). Finally, making the stimulus location entirely predictable (Bross 1979a; Poizner and Tallal 1987) or entirely unpredictable (e.g., Bosworth and Dobkins 1999; Brozinsky and Bavelier 2004) also had no effect, indicating that comparable performance of deaf and hearing participants was not modulated by the direction of selective visual attention in the scene. The only notable discrepancy with respect to this very consistent pattern of results is the observation of Stevens and Neville (2006) that deaf individuals possess a larger field of view with respect to hearing controls in the kinetic perimetry task. It would be interesting to examine whether this finding can also be replicated with stationary target at the extreme visual periphery.
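By way of illustration, the temporal-order measures used in these studies (the point of subjective simultaneity and the JND) are usually read off a cumulative Gaussian fitted to the proportion of one response as a function of SOA. The sketch below uses invented SOAs and response proportions purely for the example; it does not reproduce the data of Nava et al. (2008):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def cum_gauss(soa, pss, sigma):
    # Proportion of "right stimulus first" responses as a function of SOA (ms).
    return norm.cdf(soa, loc=pss, scale=sigma)

# Hypothetical data: negative SOA = left stimulus first, positive = right first.
soas = np.array([-120, -60, -30, 0, 30, 60, 120])
p_right_first = np.array([0.05, 0.20, 0.35, 0.55, 0.70, 0.85, 0.97])

(pss, sigma), _ = curve_fit(cum_gauss, soas, p_right_first, p0=[0.0, 50.0])
jnd = sigma * norm.ppf(0.75)   # SOA separating the 50% and 75% points of the curve
print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
```

Comparable JNDs accompanied by faster overall responses, as in Nava et al. (2008), would then point to a difference in reactivity rather than in temporal sensitivity.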
22.2.2 Simple Detection and Lateralization Tasks

Another approach to the study of visual abilities in profound deafness has been the direct assessment of the reactivity of deaf individuals in response to simple visual events or the assessment of their lateralization abilities (left vs. right response). One important aspect to note concerning these seemingly simple tasks is that any advantage measured using these procedures could reflect faster processing of the perceptual events, faster response preparation or release, or a combination of the two. Many of the early studies on visual abilities in deaf individuals that aimed to test visual speed (e.g., the classic article by Doehring and Rosenstein 1969, entitled “Speed of visual perception in deaf children”; see also Olson 1967; Hartung 1970) actually examined unspeeded discriminations and visual memory abilities for stimuli presented tachistoscopically. Thus, they are not directly informative about the speed of visual processing and the speed of response in deaf people. Loke and Song (1991) were among the first to compare 20 congenital or early-deaf high school students and 19 hearing controls, in a task requiring simple detection of an asterisk briefly appearing on the computer screen. The asterisk was presented either at fixation (0.5°), or in the visual periphery (25°), and the task was always performed in monocular vision. The results documented faster responses for deaf than hearing controls (85 ms on average), selectively for targets appearing at peripheral locations. Interestingly, a similar between-group difference was also numerically evident for central locations (38 ms), and perhaps fell short of significance because of the very limited number of trials in each experimental condition (20 trials overall, 10 for each target location). Two years later, Reynolds (1993) also examined a group of 16 adult participants with early deafness (before 3 years of age, all signers) and 16 hearing controls, in two speeded detection tasks with visual stimuli presented using a tachistoscope. In one task (baseline measure; Reynolds 1993, p. 531), simple detection response times (RTs) were recorded in response to a black circular target, presented for 70 ms directly at fixation, in the absence of any peripheral stimulus. In the other task, participants were required to make a speeded bilateral key press to indicate the side of a perifoveal target (4°), by pressing a button located to the left or to the right of the starting position
of the responding finger (the purpose of the simultaneous bilateral response was to balance hemispheric motoric activity in the task). Perifoveal targets consisted of six simple shapes (e.g., circle, square, triangle, diamond) that could be presented alone or simultaneously with task-irrelevant shapes of increasing complexity (from basic shapes to human faces or letters) delivered at fixation. Immediately after stimulus detection, participants were also required to identify the shape of the peripheral stimulus. Two results are noteworthy: first, simple detection of the foveal circle (baseline task) was faster for deaf than hearing participants (70 ms on average); second, simple detection and subsequent discrimination of the peripheral shapes also confirmed faster RTs for deaf than hearing participants (56 ms), but failed to show any between-group difference in identification accuracy (see Section 22.2.4 for further discussion of this study). More recently, Bottari et al. (in preparation) asked 11 congenital or early deaf (all signers) and 11 hearing adults (non signers) to press the space bar of the computer keyboard to the appearance of a small black circle, delivered for 48 ms on the computer screen at 3° or 8° of eccentricity. The results showed that deaf were faster than hearing controls (56 ms on average) at detecting the onset of the visual target, regardless of whether it appeared at 3° or 8°. Similarly, Bottari et al. (2010) asked a different group of 11 congenital or early deaf (all signers) and 11 hearing controls (non signers) to detect a circle open on the left or right side, presented for 48 ms at the 3° or 8° from central fixation. Stimuli were now corrected in size as a function of their eccentricity, and trials per condition were increased from 24 to 96 to increase statistical power. The results of this second study entirely supported those of Bottari et al. (in preparation), and showed a response time advantage in deaf than hearing participants (44 ms on average) that again was not spatially selective, i.e., it emerged regardless of target location instead of appearing only for peripheral targets (Loke and Song 1991). One further finding of the study by Bottari and colleagues (2010) was that the overall RT advantage for deaf participants emerged together with a differential response time ratios in the two groups as a function of target location. Hearing controls paid a significant RT cost when responding to peripheral than central target, whereas deaf individuals performed comparably across the two target locations. This suggests that advantages in reactivity and advantages in peripheral processing may be two dissociable aspects of enhanced visual processing in deaf individuals (see Section 22.3.3 for further discussion of this point). Other studies measuring speeded simple detection or speeded target lateralization in deaf people also manipulated the direction of attention before target onset, typically adapting the cue–target paradigm developed by Posner (1980). The first study to adopt this manipulation was conducted by Parasnis and Samar in 1985. They tested 20 hearing and 20 congenitally deaf college students (all signers and born from deaf parents) in a task requiring a speeded bimanual response (see Reynolds 1993) to indicate the side of a black unfilled circle, presented for 100 ms at 2.2° from central fixation. 
The stimulus was preceded by an arrow indicating the correct target side 80% of the time, or by a neutral cross signaling equal probability of the target on either side. In addition, across blocks, the peripheral target was presented with concurrent stimulation at fixation (five black crosses; i.e., foveal load condition) or alone (no load condition). Unlike the simple detection studies described above, the results of this experiment showed no overall RT advantage for deaf than hearing participants (in fact, there was even a trend for slower RTs in deaf than hearing participants overall). Furthermore, all participants showed RT benefits and costs, with respect to the neutral trials, when the target appeared at the cued or the uncued location, respectively. However, deaf participants paid less cost than hearing controls when responding to targets at the uncued locations under the foveal load condition. Parasnis and Samar (1985) interpreted this finding as evidence of more efficient “redirecting of attention from one part of the visual field to another in the presence of interfering foveal stimulation,” and concluded that “developmental experience involving a visual–spatial language and/or a predominantly visual (as contrasted with visual plus auditory) perception of the world leads to selective and ecologically useful alterations in attentional control of perceptual processes” (Parasnis and Samar 1985, p. 321). The results and conclusions of the classic study by Parasnis and Samar (1985) created the basis for the widespread notion that attention reorienting is more efficient in deaf than hearing individuals.
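The attentional benefits and costs referred to here are simple contrasts against the neutral-cue condition. A minimal sketch of the computation, with hypothetical mean RTs rather than values from Parasnis and Samar (1985), is given below:

```python
import numpy as np

def cue_effects(rt_valid, rt_neutral, rt_invalid):
    """Attentional benefit and cost (in ms) relative to neutral trials."""
    benefit = np.mean(rt_neutral) - np.mean(rt_valid)    # speed-up from a valid cue
    cost = np.mean(rt_invalid) - np.mean(rt_neutral)     # slow-down from an invalid cue
    return benefit, cost

# Hypothetical condition means (ms) for a single participant.
benefit, cost = cue_effects(rt_valid=[410, 398, 404],
                            rt_neutral=[431, 425, 440],
                            rt_invalid=[472, 466, 480])
print(f"benefit = {benefit:.0f} ms, cost = {cost:.0f} ms")
```

Smaller costs in deaf than in hearing participants, computed in this way, are the pattern that has been taken as evidence of more efficient reorienting.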
However, two further contributions that also examined simple detection of visual stimuli in the presence of attentional cues suggest a more complex framework. Colmenero et al. (2004) asked 17 deaf (all signers with prelingual deafness) and 27 hearing adults to press a key whenever an “O” appeared on the computer screen. The target appeared for 150 ms, at 20° of eccentricity to the left or the right of central fixation, and was preceded by a vertical mark delivered at the exact target location (valid condition, 53% of the trials), on the opposite side with respect to the target (invalid condition, 13% of the trials), or on both sides (neutral condition, 33% of the trials). Stimulus onset asynchrony (SOA) between cue and target ranged between 125 and 250 ms. Note that the use of peripheral informative cues in this paradigm inevitably mixed exogenous and endogenous cueing of attention within the same task. Deaf participants were faster than hearing controls at detecting the target (43 ms on average). Furthermore, the analysis of RT costs and benefits, for invalid and valid cues, respectively, revealed that both attentional effects were larger in hearing than deaf participants. In a second experiment, Colmenero and colleagues (2004) examined whether performance in the two groups differed when the SOA between the lateralized cue and the target was extended to 350 or 850 ms. With such long SOAs, hearing individuals typically show a cost at detecting targets occurring at the cued location, which is interpreted as inhibition against reexploring locations where attention has previously been oriented [i.e., inhibition of return (IOR); Klein 2000]. The results of this second experiment revealed less enduring IOR in deaf than in hearing participants, again suggesting a different role of attention orienting in the hearing-deprived population. Chen et al. (2006) asked 16 congenitally deaf and 22 hearing adults to detect the occasional appearance of a dot, presented at perifoveal locations (3°; see also Section 22.2.4 for a full description of the design of this study). The dot appeared with equal probability to the right or to the left of fixation and was preceded by a valid or invalid exogenous cue. As in the study of Colmenero et al. (2004), the SOA between the lateralized cue and the target was in the typical range for IOR (i.e., 900 ms). Although IOR effects were again observed, these did not differ between the two groups. However, the results revealed that detection of perifoveal targets was systematically faster in deaf than in hearing participants (59 ms on average) regardless of the attention condition (i.e., valid or invalid; Chen et al. 2006, Experiment 1).

In sum, two relevant aspects emerge from the studies that adopted an operational definition of better visual performance in deaf individuals in terms of enhanced reactivity to the stimulus. First, all reports (with the sole exception of the speeded lateralization study by Parasnis and Samar 1985) documented a response speed advantage in deaf than hearing individuals. Figure 22.1 summarizes this result graphically, by plotting the percentage difference in RTs between hearing and deaf participants with respect to the mean RT of the hearing group, in the different studies and as a function of stimulus eccentricity.
With the sole exception of point [3] corresponding to the study by Parasnis and Samar (1985), all data points are above zero, indicating that deaf participants were faster than the hearing controls (on average, 13% faster with respect to the hearing group; see legend to Figure 22.1 for exact RT differences in milliseconds). Importantly, this response advantage in deaf participants emerged regardless of whether the target appeared directly at fixation or at locations further toward the periphery. This supranormal performance of deaf individuals in terms of response speed was also uninfluenced by the preceding attention cueing condition (e.g., Colmenero et al. 2004; Chen et al. 2006). The second relevant aspect concerns the effect of attentional instructions on the performance of deaf people. Deaf participants can benefit from valid cueing of spatial selective attention (Parasnis and Samar 1985), but at the same time there is evidence that their performance may be less susceptible to invalid attention orienting (e.g., Parasnis and Samar 1985; Colmenero et al. 2004) or IOR (Colmenero et al. 2004; but see Chen et al. 2006) than hearing controls.
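The metric plotted in Figure 22.1 can be made explicit with a short sketch. The millisecond differences below come from the figure legend, but the hearing-group mean RTs are hypothetical placeholders (they are not all reported here), so the code only illustrates how the percentage is derived.

    # Percentage RT difference = (mean RT hearing - mean RT deaf) / mean RT hearing * 100.
    # RT differences (ms) follow the legend of Figure 22.1; the hearing-group
    # means are hypothetical and serve only to demonstrate the formula.
    studies = {
        "Loke and Song (1991), 0.5 deg": (38.0, 400.0),
        "Chen et al. (2006), 3 deg": (59.0, 450.0),
        "Parasnis and Samar (1985), 2.2 deg": (-58.0, 550.0),
    }

    for label, (diff_ms, hearing_mean_ms) in studies.items():
        pct = 100.0 * diff_ms / hearing_mean_ms
        group = "deaf faster" if pct > 0 else "hearing faster"
        print(f"{label}: {pct:+.1f}% ({group})")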
22.2.3 Visual Search Tasks

One further operational definition of better visual ability in deaf individuals has been in terms of faster search times when a prespecified target has to be found among distractors.
[Figure 22.1: scatter plot of the percentage difference with respect to the mean RT of the hearing group (Y-axis, −50% to 50%; positive values, deaf are faster; negative values, hearing are faster) as a function of visual eccentricity in degrees (X-axis, 0–30), for simple detection or localization tasks; numbered points [1]–[7] correspond to the studies listed in the caption below.]
FIGURE 22.1 Difference in RT between hearing and deaf individuals (expressed as a percentage of the mean RT of the hearing group) across different studies, as a function of target eccentricity (in degrees). Multiple data points from the same study (e.g., see point [2]) refer to targets at different eccentricities. Positive values on the Y-axis indicate faster response times in deaf individuals than in hearing controls. Foveal (up to 1.5°), perifoveal (from 1.5° to 5°), and peripheral eccentricities (beyond 5°) are indicated in the plot by different shades of gray. However, note that only the boundaries of the foveal visual field are clearly specified by anatomical landmarks; the distinction between perifoveal and peripheral regions is instead conventional (we adopted here the distinction proposed by Reynolds 1993; see Section 22.1.2). Actual RT differences are as follows: [1] Reynolds (1993): 70 ms at 0°, 56 ms at 4°; [2] Loke and Song (1991): 38 ms at 0.5°, 85 ms at 25°; [3] Parasnis and Samar (1985): −58 ms at 2.2°; [4] Chen et al. (2006): 59 ms at 3°; [5] Colmenero et al. (2004): 43 ms at 20°; [6] Bottari et al. (in preparation): 52 ms at 3°, 59 ms at 8°; [7] Bottari et al. (2010): 54 ms at 3°, 59 ms at 8°.
In the visual perception literature, visual search tasks have classically been employed to distinguish perceptual processes requiring attention from perceptual processes occurring preattentively. When response time is unaffected by the number of distractors in the array, the search is typically described as preattentive (i.e., it does not require an attention shift to the target in order to produce the response). By contrast, when response time increases as a function of the number of distractors in the array, the search is assumed to require serial attention shifts to the various items (Treisman 1982).

Henderson and Henderson (1973) were the first to compare the abilities of deaf and hearing children (12.5 to 16.5 years old) in a visual search task that required searching for a target letter in a letter array containing capital and lowercase letters. Although they found that the two groups did not differ in the visual search task, it should be noted that the high similarity between target and distractors inevitably forced a serial search in both groups. Several years later, Stivalet and colleagues (1998) also adopted a visual search task to examine visual processing in congenitally deaf and hearing adults. Unlike Henderson and Henderson (1973), they manipulated the complexity of the search by asking participants to detect the presence or absence of a Q among O's (easier search, because the target contains a single identifying feature) or of an O among Q's (harder search, because the target lacks a feature with respect to the distractors). Moreover, to obtain a measure of visual processing time that could be separated from the time required for motor program retrieval and response initiation/execution, all stimuli were masked after a variable interval, and the dependent variable was the stimulus-to-mask interval sufficient to reach 90% correct. Notably, all stimuli were presented within the perifoveal region, at an eccentricity ranging between 4.1° and 4.9°. When searching for a Q among O's (easier search), both groups performed a parallel search that was unaffected by the number of distractors (4, 10, or 16). By contrast, when searching for an O among Q's (harder search), deaf adults proved
substantially more efficient than hearing controls, with their visual search time (9 ms/letter) falling within the range of parallel processing (Enns and Rensink 1991), unlike hearing participants (22 ms/letter). Further evidence along the same lines came from a visual search study by Rettenbach and colleagues (1999). They tested eight deaf and eight hearing adults in a series of visual search tasks of different complexity. Unlike the study by Stivalet and colleagues (1998), the stimuli covered a wide visual angle, both vertically (20°) and horizontally (26°), thus spanning from central to peripheral locations. The results revealed more efficient visual search in deaf than in hearing adults. Interestingly, when the same study was repeated in children and adolescents, deaf participants systematically underperformed with respect to age-matched hearing controls (see also Marendaz et al. 1997), suggesting a specific developmental trajectory for visual search abilities in deaf individuals.

In sum, the studies that evaluated visual search abilities in deaf and hearing controls indicate that the range for parallel processing is wider in deaf individuals than in hearing controls (Stivalet et al. 1998; Rettenbach et al. 1999). Furthermore, this enhanced visual ability appears to be independent of the spatial location of the stimuli, as it emerged for perifoveal (Stivalet et al. 1998) as well as peripheral stimuli (Rettenbach et al. 1999). However, reconciling the visual search findings with the observation that deaf participants are less susceptible to invalid cueing or IOR (e.g., Parasnis and Samar 1985; Colmenero et al. 2004) is not straightforward. As we shall discuss later (see Section 22.3.3), assuming that both visual search and cueing effects can be accounted for by faster reorienting of attention implies a description of better visual search in deaf individuals in terms of faster and more efficient movement of the attention spotlight in space. This interpretation, however, is at odds with the description of better search as the result of preattentive processing.
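The parallel-versus-serial contrast above hinges on the slope of the function relating search time to set size (e.g., 9 vs. 22 ms/letter). As a minimal sketch with invented values (not data from Stivalet et al. 1998), the code below estimates such a slope with an ordinary least-squares fit.

    # Estimate a search slope (ms per item) from mean RTs at each set size.
    # Set sizes follow those used in the text (4, 10, 16); RTs are hypothetical.

    def search_slope(set_sizes, mean_rts):
        """Ordinary least-squares slope of RT (ms) as a function of set size."""
        n = len(set_sizes)
        mean_x = sum(set_sizes) / n
        mean_y = sum(mean_rts) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts))
        var = sum((x - mean_x) ** 2 for x in set_sizes)
        return cov / var

    set_sizes = [4, 10, 16]
    rt_parallel = [520.0, 530.0, 528.0]   # nearly flat: "parallel" search
    rt_serial = [540.0, 660.0, 790.0]     # steep increase: attention-demanding search

    print(f"flat search:  {search_slope(set_sizes, rt_parallel):.1f} ms/item")
    print(f"steep search: {search_slope(set_sizes, rt_serial):.1f} ms/item")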
22.2.4 Visual Discrimination and Identification Tasks

One aspect common to the simple detection tasks described in Section 22.2.2 and the easy visual search tasks described in Section 22.2.3 (e.g., easy search of a Q among O's) is that both can in principle be performed without attention shifts (i.e., under distributed attention; e.g., see Bravo and Nakayama 1992; Sagi and Julesz 1984). Instead, shifts of spatial attention are certainly required to perform complex visual search tasks or visual discrimination tasks. Discrimination or identification of a visual target requires binding of the multiple target features, and therefore inevitably relies on selective attention processing (e.g., Turatto et al. 2007). Furthermore, discriminating one stimulus from another implies some sort of perceptual matching with a template held in working memory. In this respect, adopting discrimination and identification tasks for the study of visual abilities in deaf individuals clearly implies taking a step forward in the examination of visual cognition in this auditory-deprived population.

Early studies on visual discrimination in deaf individuals assessed the ability of this population to discriminate colors or complex shapes. For instance, Heider and Heider (1940) tested prelingually deaf and hearing children in a color sorting task, in which participants had to select a range of hues that could match a given standard color. Performance in the two groups was comparable, and in fact deaf children selected a wider range of hues than hearing children. Similarly, Suchman (1966) compared the ability of deaf and hearing individuals in an oddity discrimination task, which required the identification of an odd stimulus among other items. When the odd stimulus differed in color (5% white increase or decrease in hue), deaf participants had higher accuracy scores than hearing controls. By contrast, when the odd stimulus differed in shape (4° of internal angle with respect to the other simple shapes), hearing controls discriminated better than deaf participants. Hartung (1970) used tachistoscopic presentation to show prelingually deaf and hearing children a series of English or Greek trigrams. The task was to determine whether a particular letter appeared in each trigram and to reproduce the English trigram. Although deaf children performed worse than hearing children with the English trigrams, no discrimination difference emerged with the
unfamiliar Greek trigrams, suggesting that any discrimination difference between groups reflected linguistic rather than perceptual difficulties.

A seminal work that adopted a visual discrimination task was conducted by Neville and Lawson in 1987. In that study, behavioral and ERP responses were recorded while 12 congenitally deaf adults (all signers, with at least one deaf parent) and 12 age-matched hearing controls performed a direction-of-motion discrimination for suprathreshold visual stimuli. Visual stimuli were white squares presented at central (just above fixation) or peripheral locations (18° to the right or to the left of central fixation), with an ISI from trial onset ranging randomly between 280 and 480 ms. On 80% of the trials (termed “standards”), a single square appeared at one of these predetermined locations for 33 ms. On the remaining 20% of the trials (termed “deviants”), the square jumped slightly to one of eight possible immediately adjacent locations after the first 33 ms. The participant's task consisted of discriminating the direction of this moving square in deviant trials. Importantly, although participants fixated centrally throughout the experimental session, they were also requested to orient their attention to one of the three possible target locations (center, left, or right) across blocks. In terms of behavioral performance, deaf individuals were faster than hearing controls (by 70 ms on average) at discriminating moving targets at the peripheral locations; by contrast, no between-group difference in RT emerged for targets occurring at central locations. The two groups instead performed comparably in terms of sensitivity (d′): whereas hearing individuals showed better discrimination ability in the right visual field (RVF) than in the left visual field (LVF), deaf participants showed the opposite pattern. In terms of EEG responses, three main findings were reported. First, the visual evoked component termed P1 (i.e., a positivity peaking at about 100 ms after stimulus presentation) was comparable between groups regardless of whether the stimulus was standard or deviant, and regardless of stimulus location and attention condition. Second, a larger amplitude of the N1 component emerged in deaf than in hearing controls when standard or deviant targets appeared at attended peripheral locations. These greater increases in cortical response due to attentional engagement in deaf than in hearing controls were recorded over the occipital electrodes and in the left parietal and temporal regions. Third, the overall amplitude of the N1 was larger over the right than the left hemisphere in hearing controls, but larger over the left than the right hemisphere in deaf individuals. VEPs in response to central standards and targets were instead comparable between groups. In summary, the results of the study by Neville and Lawson (1987) suggested that deaf individuals can outperform hearing individuals in terms of reactivity (but not sensitivity) when discriminating the direction of motion for targets presented at peripheral locations. In addition, because VEP differences emerged in response to both static and moving stimuli (i.e., standards and targets, respectively) specifically in the condition of attentional engagement to peripheral locations, Neville and Lawson (1987) concluded that deafness modulates the neural system that mediates spatial attention. However, later empirical evidence has shown that a similar N1 modulation can also be documented for targets monitored under distributed attention (Armstrong et al.
2002), thus challenging the conclusion that differences between deaf and hearing controls emerge selectively under conditions of focused attention.

Another study that evaluated discrimination performance in deaf and hearing participants using moving stimuli was conducted by Bosworth and Dobkins (2002a; see also Bosworth and Dobkins 2002b). These authors evaluated 16 profoundly deaf signers (12 congenital), 10 hearing signers, and 15 hearing nonsigners in a direction-of-motion discrimination task. Participants were required to discriminate the direction of motion of coherently moving dots presented among randomly moving dots, within a single display or multiple displays appearing in one or all quadrants of the monitor. The coherent motion threshold for each participant was the number of coherently moving dots that yielded 75% correct discriminations. In addition to the number of presented displays, two other conditions were manipulated: the presence or absence of endogenous cueing (a 100% predictive spatial cue, delivered before display presentation) and stimulus duration (200 or 600 ms). Results showed no overall advantage for deaf over hearing participants in discriminating the direction of motion. Intriguingly, deaf individuals tended to be faster yet less accurate than the other groups, suggesting a possible speed–accuracy trade-off in deaf but not hearing participants. The
analyses also revealed that direction-of-motion thresholds were less affected by cueing of attention in deaf individuals than in hearing controls (regardless of signing abilities). Furthermore, when the stimuli lasted for 600 ms, performance for the deaf group paradoxically improved with multiple rather than single displays, unlike hearing participants. Both findings may indicate better capture of attention by a discontinuity in a complex visual scene in deaf than in hearing participants, given enough time for perceptual analysis.

Finally, in a recent study conducted in our laboratory (Bottari et al. 2010), we asked 11 congenitally or early deaf adults and 11 hearing controls to perform a speeded shape discrimination for visual targets presented at one of eight possible locations (at 3° or 8° from central fixation). Targets were open circles lasting for 48 ms, and participants were required to discriminate whether the circle was open on the left or on the right side. The results of this study showed comparable performance between deaf and hearing individuals in terms of RTs, even though deaf participants were numerically faster. Interestingly, deaf individuals performed worse than hearing controls in terms of accuracy, suggesting a different speed–accuracy trade-off in the deaf group (see also Bosworth and Dobkins 2002a).

In sum, the tasks requiring perceptual discrimination of suprathreshold stimuli did not provide consistent evidence in support of the notion of enhanced abilities in deaf relative to hearing controls. With static stimuli, better accuracy in deaf individuals compared to hearing controls has been documented only for discrimination of color changes (Suchman 1966), whereas the studies that required shape discrimination for static visual events failed to show any enhanced ability in deaf individuals (Hartung 1970; Bottari et al. 2010). With moving stimuli, faster RTs in deaf than in hearing participants have been documented only by Neville and Lawson (1987), selectively for events at peripheral locations, whereas Bosworth and Dobkins (2002a) showed overall comparable performance between deaf and hearing controls when discriminating motion coherence.

22.2.4.1 Visual Discrimination with Flanker Interference

A series of experiments adopting discrimination or identification tasks also evaluated the effect of entirely task-irrelevant competing stimuli on discrimination performance. The main rationale underlying this manipulation is that any bias for processing peripheral events more than central ones in the deaf population should emerge as larger interference effects of peripheral distracting information on central targets (or, conversely, as smaller interference effects of central distractors on peripheral targets). One of the first examples of this experimental paradigm is the study by Reynolds (1993). In addition to the speeded lateralization task already described in Section 22.2.2, deaf and hearing participants were required to identify figures that appeared 4° to the left or right of central fixation. Target figures were presented alone or together with concurrent stimuli delivered at fixation (simple shapes, outline drawings of familiar objects, or letters). Overall, no recognition accuracy advantage emerged for deaf over hearing participants (62% vs. 58% correct). The only difference between deaf and hearing controls emerged with respect to the hemifield of stimulus presentation.
Deaf participants showed an LVF advantage in identification accuracy when concurrent stimuli at fixation were absent or were simple shapes, and an RVF advantage when concurrent stimuli at fixation consisted of drawings or letter stimuli. The reverse pattern emerged in hearing controls.

One influential study that also examined identification with concurrent distractors at central and peripheral locations was conducted several years later by Proksch and Bavelier (2002). In three experiments, they tested deaf students (all congenitally deaf signers) and hearing controls (including a group of participants born to deaf parents, who learned sign language in infancy) in a speeded shape identification task. The target shape (square or diamond) was presented inside one of six circular frames, arranged around fixation in a ring of 2.1° radius. In each trial, a distracting shape was presented concurrently with the target, either in the center of the screen (0.5° to the left or right of fixation) or outside the target ring (4.2° to the left or right of fixation). The distractor was an item from the target set, either compatible (e.g., target: diamond; distractor: diamond) or incompatible
(e.g., target: diamond; distractor: square), or else a neutral shape. Finally, a variable number (0, 1, 3, or 5) of filler shapes was introduced in the empty circular frames of the target ring to manipulate perceptual load across trials. Participants were instructed to identify the target as quickly as possible, while ignoring all other distracting shapes. Overall, target identification was slower for deaf than for hearing participants (hearing: 765 ms vs. deaf: 824 ms in Experiment 1; hearing: 703 ms vs. deaf: 814 ms in Experiment 3). All experiments consistently revealed the interfering effect of perceptual load and lateralized distractors on RT performance. Critically, however, peripheral distractors proved more distracting for deaf individuals, whereas central ones were more distracting for hearing controls (regardless of whether they were signers). This led Proksch and Bavelier (2002) to conclude that “the spatial distribution of visual attention is biased toward the peripheral field after early auditory deprivation” (p. 699).

A related study was conducted by Sladen and colleagues (2005), using the classic flanker interference task developed by Eriksen and Eriksen (1974). Ten early deaf adults (onset before 2 years of age, all signers) and 10 hearing adults were asked to perform a speeded identification of a letter (H or N) presented either in isolation (baseline) or surrounded by four response-compatible letters (two on either side; e.g., HHHHH) or response-incompatible letters (e.g., NNHNN). Letters were presented 0.05°, 1°, or 3° apart from each other. The results showed that letter discrimination was faster in hearing than in deaf participants in each of the experimental conditions, including the baseline (e.g., between 50 and 81 ms difference for incompatible stimuli), but this was accompanied by more errors in the hearing group during incompatible trials. Interestingly, the two groups also differed in their performance with the 1° spacing between target and flankers: the incongruent flanker cost emerged for both groups, but was larger in deaf than in hearing participants. Again, this finding is compatible with the notion that deaf individuals may have learned to “focus their visual attention in front of them in addition to keeping visual resources allocated further out in the periphery” (Sladen et al. 2005, p. 1536).

The study by Chen et al. (2006), described in Section 22.2.2, also adopted a flanker interference paradigm. On each trial, participants were presented with a row of three horizontally aligned boxes, of which the central one contained the target and the side ones (arranged 3° on either side) contained the distractors. The task required a speeded discrimination among four different colors. Two colors were mapped onto one response button, whereas the other two colors were mapped onto a different response button. Simultaneously with target presentation, a flanker appeared in one of the lateral boxes. The flanker was either identical to the target (thus leading to no perceptual conflict and no motor response conflict), different in color from the target but mapped onto the same motor response (thus leading only to a perceptual conflict), or different in color from the target and mapped onto a different response than the target (thus leading to both perceptual and motor conflict). Finally, spatial attention to the flanker was modulated exogenously by changing the thickness and brightness of one of the lateral boxes at the beginning of each trial.
Because the time interval between this lateralized cue and the target was 900 ms, this attentional manipulation created an IOR effect (see also Colmenero et al. 2004). Overall, color discrimination was comparable between groups in terms of reaction times (see also Heider and Heider 1940). However, the interference determined by the flankers emerged at different levels (perceptual vs. motor response) in deaf and hearing participants, regardless of the cueing condition. Hearing participants displayed flanker interference effects for flankers conflicting at both the perceptual and the response level. In contrast, deaf participants showed flanker interference effects at the response level, but not at the perceptual level.

Finally, Dye et al. (2007) asked 17 congenitally deaf and 16 hearing adults to perform a speeded discrimination of the direction of a central arrow (pointing left or right) presented 1.5° above or below central fixation and flanked by peripheral distractors (other arrows with congruent or incongruent pointing directions, or neutral lines without arrowheads). A cue consisting of one or two asterisks, presented 400 ms before the onset of the arrows, oriented attention to central fixation, to the exact upcoming arrow location, or to both potential arrow locations (thus alerting for stimulus
appearance without indicating the exact target location). The findings showed comparable effects of orienting spatial cues in hearing and deaf individuals, as well as comparable alerting benefits. Interestingly, when the number of flanker arrows was reduced to two and their relative distance from the central arrow was increased to 1°, 2°, or 3° of visual angle, deaf participants displayed stronger flanker interference effects in RTs compared to hearing controls.

In sum, the studies that measured allocation of attentional resources in the visual scene using flanker interference tasks showed larger interference from distractors in deaf than in hearing participants (Proksch and Bavelier 2002; Sladen et al. 2005; Chen et al. 2006; Dye et al. 2007). However, whereas Proksch and Bavelier (2002) showed enhanced distractor processing in deaf relative to hearing adults at 4.2° from central fixation, Sladen et al. (2005) showed enhanced distractor processing at 1° from central fixation, but comparable distractor processing at 3°. Finally, Dye et al. (2007) showed increased flanker interference in deaf relative to hearing controls regardless of whether the two distracting items were located at 1°, 2°, or 3° from fixation. These mixed results suggest that some characteristics of the visual scene and task, other than just the peripheral location of the distractors, could play a role. These additional characteristics might include the degree of perceptual load, the amount of crowding, or the relative magnification of the stimuli.
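Across these flanker studies, the dependent measure is the interference effect, i.e., the RT difference between incompatible and compatible trials, often computed separately for each target–flanker separation. The sketch below shows this computation on hypothetical RTs; the values are not taken from any of the cited studies.

    # Hypothetical mean RTs (ms) by flanker compatibility and target-flanker
    # separation (degrees of visual angle); illustrative values only.
    mean_rt = {
        (1, "compatible"): 520.0, (1, "incompatible"): 585.0,
        (2, "compatible"): 515.0, (2, "incompatible"): 560.0,
        (3, "compatible"): 512.0, (3, "incompatible"): 540.0,
    }

    for separation in (1, 2, 3):
        interference = mean_rt[(separation, "incompatible")] - mean_rt[(separation, "compatible")]
        print(f"{separation} deg separation: flanker interference = {interference:.0f} ms")
    # Larger interference at a given separation in one group (e.g., deaf vs.
    # hearing participants) is taken as stronger processing of the peripheral flanker.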
22.2.5 Visual Tasks of Higher Complexity

Beyond visual discrimination or identification tasks, our attempt to relate single experimental paradigms to single operational definitions of better visual abilities in deaf individuals inevitably becomes more complex. For instance, the visual enumeration test and the Multiple Object Tracking test recently adopted by Hauser and colleagues (2007), the change detection task adopted by our group (Bottari et al. 2008, in preparation), and the studies on the speech-reading ability of deaf individuals (e.g., Bernstein et al. 2000; Mohammed et al. 2005) can hardly be reduced to single aspects of visual processing. Nonetheless, we report these studies in detail because they are informative about the selectivity of the performance enhancements observed in the deaf population.

Hauser and colleagues (2007) evaluated 11 congenitally deaf adults and 11 hearing controls in an enumeration task, asking participants to report on a keyboard the number of briefly flashed static targets in a display, as quickly and accurately as possible. The task was conducted either with a field of view restricted to 5° around fixation or with a wider field of view of 20° around fixation. In such enumeration tasks, participants typically display a bilinear performance function, with fast and accurate performance with few items (the subitizing range) and a substantially greater cost in terms of reaction times and accuracy as the number of items increases. The results of Hauser et al. (2007) showed comparable subitizing performance in deaf and hearing individuals, regardless of which portion of the visual field was evaluated. A second experiment, conducted on 14 congenitally deaf adults and 12 hearing controls, adapted the Multiple Object Tracking test (Pylyshyn 1989). In this task, participants are presented with a number of moving dots of which a subset is initially cued. When the cues disappear, participants are required to keep track of the dots that were initially cued until one of the dots in the set is highlighted. Participants have to indicate whether that dot was also cued at the beginning of the trial. Although this task was performed over a wide field of view, to maximize the possibility of revealing any difference between deaf and hearing participants, no sensitivity difference emerged. The authors concluded that “early deafness does not enhance the ability to deploy visual attention to several different objects at once, to dynamically update information in memory as these objects move through space, and to ignore irrelevant distractors during such tracking” (Hauser et al. 2007, p. 183).

Two studies from our group evaluated the ability of deaf and hearing individuals to discriminate between the presence and absence of a change in a visual scene (Bottari et al. 2008, in preparation). In these studies, two visual scenes were presented one after the other in each experimental trial, separated by an entirely blank display. Each visual scene comprised four or eight line-drawing images, half of which were arranged at 3° from central fixation and the other half were arranged
at 8°. On 50% of the trials, the second scene was entirely identical to the first (i.e., no change occurred), whereas on the other 50% of the trials one drawing in the first scene changed into a different one in the second scene. The participant's task was to detect whether a change was present or absent. When comparing two alternating visual scenes, any change is typically detected without effort because it constitutes a local transient that readily attracts exogenous attention to the location where the change has occurred (O'Regan et al. 1999, 2000; Turatto and Bridgeman 2005). However, if a blank image is interposed between the two alternating scenes (as in the adopted paradigm), every part of the new scene changes with respect to the preceding blank image, resulting in a global rather than local transient. The consequence of this manipulation is that attention is no longer exogenously captured by the location of change, and the change is noticed only through a strategic (endogenous) scan of the visual scene (the so-called “change blindness” effect; Rensink 2001). Thus, the peculiarity of this design was that all local transients related to target change or target onset were entirely removed. This produced an entirely endogenous experimental setting, which had never been adopted in previous visual tasks with deaf people (see Bottari et al. 2008 for further discussion of this point). The results of the two studies (Bottari et al. 2008, in preparation) revealed that sensitivity to the change was comparable in deaf and hearing adults, regardless of change location (center or periphery), suggesting that sensitivity to changes in an entirely transient-free context is not modulated by deafness. Furthermore, this conclusion was confirmed when the direction of endogenous attention was systematically manipulated between blocks, by asking participants either to focus attention on specific regions of the visual field (at 3° or 8°) or to distribute spatial attention across the whole visual scene (Bottari et al. 2008).

In sum, even visual tasks tapping multiple stages of nonlinguistic visual processing (and particularly visual working memory) do not reveal enhanced processing in deaf relative to hearing controls. Once again, the absence of supranormal performance was documented regardless of the eccentricity of the visual stimulation. Furthermore, the results of Bottari et al. (2008) indicate that focusing endogenous attention is not sufficient to determine a between-group difference. It remains to be ascertained whether the latter result (which is at odds with the behavioral observation of Neville and Lawson 1987 and with the neural observation of Bavelier et al. 2000) might be the consequence of having removed from the scene all target-related transients that could exogenously capture the participant's attention.

A different class of complex visual tasks in which deaf individuals were compared to hearing controls evaluated speech-reading abilities (also termed lip-reading). Initial studies on speech-reading suggested that this ability was considerably limited in hearing controls (30% of words or fewer correct in sentences, according to Rönnberg 1995) and that “the best totally deaf and hearing-impaired subject often perform only as well as the best subjects with normal hearing” (Summerfield 1991, p. 123; see also Rönnberg 1995). However, two later contributions challenged this view and clearly showed that deaf individuals can outperform hearing controls in speech-reading tasks.
Bernstein et al. (2001) asked 72 deaf individuals and 96 hearing controls to identify consonant–vowel nonsense syllables, isolated monosyllabic words, and sentences presented through silent video recordings of a speaker. The results showed that deaf individuals were more accurate than hearing controls, regardless of the type of verbal material. In agreement with this conclusion, Auer and Bernstein (2007) showed a similar pattern of results in a study that evaluated identification of visually presented sentences in even larger samples of deaf individuals and hearing controls (112 and 220, respectively). It is important to note that neither study included deaf individuals who used sign language as their preferential communication mode, thus relating these enhanced lip-reading skills to the extensive training that deaf individuals had throughout their lives. For the purpose of the present review, it is important to note that speechreading is a competence that links linguistic and nonlinguistic abilities. Mohammed and colleagues (2005) replicated the observation that deaf individuals outperform hearing controls in lip-reading skills. Furthermore, they showed that the lip-reading performance of deaf individuals (but not hearing controls) correlated with the performance obtained in a classical motion coherence test (see also Bosworth and Dobkins 1999; Finney and Dobkins 2001), even though the overall visual motion thresholds were
entirely comparable between the two groups (in agreement with what we reported in Section 22.2.1). In sum, lip-reading is a visual skill that has systematically been found to be enhanced in deaf individuals compared to hearing controls. Intriguingly, in deaf individuals this skill appears to be strongly interconnected with the ability to perceive motion in general, supporting the notion that visual motion perception has a special role in this sensory-deprived population.
22.3 A TRANSVERSAL VIEW ON THE LITERATURE

The first aim of the present review was to provide a detailed report on the empirical evidence about visual abilities in profound deafness, organized as a function of task. This served the purpose of highlighting the different operational definitions of “better visual ability” adopted in the literature and of examining the consistency of the findings across tasks. The second aim was to evaluate to what extent four distinct aspects, which are transversal to the different behavioral tasks, can contribute to the understanding of the heterogeneity of the empirical findings. In particular, the aspects considered were: (1) diversity in the characteristics of the deaf samples; (2) visual characteristics of the target stimulus; (3) target eccentricity; and (4) the role of selective spatial attention.
22.3.1 Enhanced Reactivity Rather than Enhanced Perceptual Processing

One aspect that clearly emerges from our task-based review of the literature is that operational definitions of better visual abilities in deaf individuals in terms of enhanced perceptual processing of the visual stimulus do not reveal systematic differences between deaf and hearing controls. This conclusion is particularly clear in all those studies that examined perceptual processing for stimuli at or near threshold (see Section 22.2.1), but it is also confirmed by studies that required discrimination or identification of stimuli above threshold (see Section 22.2.4) and by studies that also took the role of visual working memory into account (see Section 22.2.5). In the case of discrimination and identification tasks, only one report has shown a behavioral discrimination advantage for deaf over hearing controls (e.g., see the RT difference for stimuli at peripheral locations in the work of Neville and Lawson 1987), whereas in all the remaining studies a between-group difference emerged only in the way attention instructions or flankers impacted the performance of deaf and hearing participants, not in terms of an overall processing advantage for the deaf group. In striking contrast with this pattern of results, almost all studies adopting simple detection or lateralization tasks have shown a reactivity advantage (in a range between 38 and 85 ms) in deaf over hearing participants. Furthermore, when these studies are considered collectively, enhanced reactivity in deaf participants does not appear to be modulated by stimulus eccentricity in any obvious way (see Figure 22.1). Finally, although attentional manipulations did impact simple detection performance (e.g., Colmenero et al. 2004; Chen et al. 2006), the between-group difference did not emerge selectively as a function of the attentional condition.

The observation that better visual abilities in deaf individuals emerge mainly for tasks designed around speeded simple detection of the stimulus, rather than tasks designed around discrimination performance, suggests that profound deafness might not result in an enhanced perceptual representation of visual events. Instead, any modification of visual processing in deaf individuals may occur at the level of visual processing speed, at the level of response selection/generation, or at both these stages (for further discussion of this point, see Bottari et al. 2010). Prinzmetal and colleagues (2005, 2009) recently proposed that performance enhancement could reflect either perceptual channel enhancement or perceptual channel selection. Whereas channel enhancement would result in a better sensory representation of perceptual events, channel selection would only result in faster processing. We suggest that enhanced visual abilities in deaf individuals may reflect channel selection more than channel enhancement, and that enhanced reactivity may be the core aspect of the compensatory cross-modal plasticity occurring in this sensory-deprived population. In the context of the present review, it is also interesting to note that Prinzmetal and colleagues (2005, 2009) have
associated channel enhancement with endogenous attention selection, but channel selection with exogenous attention capture (see also Section 22.3.3).
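The channel enhancement versus channel selection contrast maps onto two different dependent measures: sensitivity (e.g., d′) and response speed. The sketch below, with hypothetical hit rates, false-alarm rates, and RTs (not data from the studies reviewed), shows how two groups can be equally sensitive yet differ in reactivity.

    # d' from hit and false-alarm rates, using the inverse of the standard
    # normal CDF (probit). Values below are hypothetical and for illustration only.
    from statistics import NormalDist

    def d_prime(hit_rate, fa_rate):
        z = NormalDist().inv_cdf
        return z(hit_rate) - z(fa_rate)

    groups = {
        # group: (hit rate, false-alarm rate, mean RT in ms)
        "hearing": (0.85, 0.15, 420.0),
        "deaf": (0.85, 0.15, 365.0),  # same sensitivity, faster responses
    }

    for name, (hits, fas, rt) in groups.items():
        print(f"{name}: d' = {d_prime(hits, fas):.2f}, mean RT = {rt:.0f} ms")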
22.3.2 Role of Deaf Sample Characteristics and Visual Stimulus Characteristics Are Relevant but Not Critical

Several investigators have suggested that the heterogeneity of the results observed in the literature on visual abilities in deaf individuals might reflect diversity in the characteristics of the deaf participants recruited across the different studies (e.g., Bavelier et al. 2006; Hoemann 1978). Although this perspective appears very likely, to the best of our knowledge systematic studies on the impact of critical variables, such as deafness onset (early or late) or preferred communication mode, on the visual skills of deaf individuals have not been conducted. Similarly, the exact role of the amount of hearing loss and the etiology of deafness remains to be ascertained. Our review indicates that the vast majority of investigations have tested congenitally or early deaf participants who primarily use sign language. However, our review also challenges the idea that sample characteristics alone can account for the variability in the results. Even those studies that restricted the population to “deaf native signers” (Bavelier et al. 2006) did not find systematically better abilities in deaf than in hearing controls. For instance, Hauser and colleagues (2007) pointed out that the comparable performance between deaf and hearing controls in their visual enumeration and visual working memory tasks emerged despite the fact that the population of deaf native signers tested in the study was identical to that recruited in previous studies from the same research group that instead documented enhanced performance with respect to hearing controls (Hauser et al. 2007, p. 184).

Specificity of the target stimulus characteristics is also unlikely to explain the heterogeneity of the findings. The hypothesis that moving stimuli are more effective than static ones in determining enhanced visual abilities in deaf individuals is, at the very least, controversial in light of the current review of the literature. Studies adopting perceptual threshold tasks consistently documented comparable performance between deaf and hearing participants regardless of whether the stimuli were static (as in Bross 1979a, 1979b) or moving (e.g., Bosworth and Dobkins 1999; Brozinsky and Bavelier 2004; but see Stevens and Neville 2006). Instead, in simple detection tasks, enhanced reactivity in deaf relative to hearing participants has been documented primarily with static stimuli. Finally, using complex visual tasks tapping working memory capacities, Hauser and colleagues (2007) showed comparable performance between deaf and hearing individuals regardless of whether stimuli were stationary (enumeration task) or moving (Multiple Object Tracking task). One piece of evidence that could support the notion that moving stimuli are more effective than static ones in eliciting differences between the two groups is the observation that discrimination of moving stimuli at the visual periphery (18°) is better in deaf than in hearing participants (Neville and Lawson 1987), whereas discrimination of static stimuli also appearing toward the periphery (8°) is not (Bottari et al. 2010). However, the evident discrepancy in stimulus location between the two studies prevents any definite conclusion, which could only be obtained by directly comparing deaf and hearing performance using stimuli differing in the moving/static dimension, while other variables are held constant.
22.3.3 Role of Target Eccentricity and Selective Visual Attention Is Critical but Underspecified

The present review supports the notion that the representation of the visual periphery in the profoundly deaf might indeed be special. Differences between the two groups clearly emerged more often for stimuli delivered at peripheral than at central locations (e.g., Loke and Song 1991; Bottari et al. 2010, in preparation; Neville and Lawson 1987). However, it is also clear that the central or peripheral location of the stimulus is not a definite predictor of whether deaf and hearing
participants will differ in their performance. Better performance in deaf than in hearing participants has been documented with both central and peripheral stimuli (e.g., see Section 22.2.2). Conversely, threshold tasks proved ineffective in revealing between-group differences, regardless of whether stimuli were delivered centrally or peripherally. Thus, the question of what exactly is special in the representation of peripheral stimuli in deaf individuals has not yet been resolved. One observation relevant to this problem may be the recent finding from our group that the differential processing of central and peripheral locations in deaf and hearing people emerges independently of the orienting of attention. Bottari et al. (2010) showed no RT cost for processing peripheral compared with central items in deaf participants, unlike hearing controls. Importantly, this occurred in a task (simple detection) that requires no selective allocation of attentional resources (Bravo and Nakayama 1992). This implies a functional enhancement for peripheral portions of the visual field that cannot be reduced to the differential allocation of attentional resources alone (see also Stevens and Neville 2006 for related evidence). Because the cost for peripheral relative to central processing in hearing controls is classically attributed to the smaller number of visual neurons devoted to the analysis of peripheral compared with central portions of the visual field (e.g., Marzi and Di Stefano 1981; Chelazzi et al. 1988), it can be hypothesized that profound deafness modifies the relative proportion of neurons devoted to peripheral processing or their baseline activity. Note that assuming a different neural representation of the peripheral field also has implications for the studies that examined the effects of peripheral flankers on central targets (e.g., Proksch and Bavelier 2002; Sladen et al. 2005): it suggests that the larger interference from peripheral flankers in deaf individuals could at least partially result from enhanced sensory processing of these stimuli, rather than from an attentional bias toward the periphery (similar to what would be obtained in hearing controls by simply changing the size or the saliency of the peripheral flanker).

The final important aspect to consider is the role of selective attention in the enhanced visual abilities of deaf individuals. Our review of the literature concurs with the general hypothesis that deafness somehow modulates selective visual attention (e.g., Parasnis 1983; Neville and Lawson 1987; Bavelier et al. 2006; Mitchell and Maslin 2007). However, it also indicates that any further development of this theoretical assumption requires a better definition of which aspects of selective attention are changed in this context of cross-modal plasticity. To date, even the basic distinction between exogenous and endogenous processes has largely been neglected. If this minimal distinction is applied, it appears that endogenous orienting alone does not necessarily lead to better behavioral performance in deaf than in hearing controls. This is, first of all, illustrated by the fact that endogenous cueing of spatial attention (e.g., using a central arrow, as Parasnis and Samar 1985 have done) can produce similar validity effects in deaf and hearing individuals. Furthermore, a recent study by Bottari et al.
(2008), which examined endogenous orienting of attention in the absence of the exogenous capture induced by target onset, revealed no difference whatsoever between deaf and hearing participants, regardless of whether attention was focused on the center, focused on the periphery, or distributed across the entire visual scene. By contrast, several lines of evidence suggest that the exogenous component of selective attention may be more prominent in deaf than in hearing people. First, studies that adopted the cue–target paradigm have shown more efficient detection in deaf than in hearing adults when the target occurs at a location of the visual field that has been made unattended (i.e., invalid; see Parasnis and Samar 1985; Colmenero et al. 2004, Experiment 1; Bosworth and Dobkins 2002a). Second, paradigms that adopted an SOA between cue and target that can lead to IOR also revealed that deaf participants are less susceptible to this attentional manipulation and respond more efficiently than controls to targets appearing at the supposedly inhibited location (e.g., Colmenero et al. 2004, Experiment 2). Finally, deaf participants appear to be more distracted than hearing controls by lateralized flankers that compete with a (relatively) more central target (Dye et al. 2008; Proksch and Bavelier 2002; Sladen et al. 2005), as if flanker onsets in the periphery of the visual field capture exogenous attention more easily. In the literature on visual attention in deaf individuals, the latter three findings have been interpreted within the spotlight metaphor of selective attention (Posner 1980), assuming faster shifts of
visual attention (i.e., faster reorienting) in deaf than in hearing participants. However, this is not the only way in which attention can be conceptualized. A well-known alternative to the spotlight metaphor of attention is the so-called gradient metaphor (Downing and Pinker 1985), which assumes a peak of processing resources at the selected location (as a result of bottom-up or top-down signals) as well as a gradual decrease of processing resources as the distance from the selected location increases. Within this alternative perspective, the different performance of deaf participants in the attention tasks (i.e., enhanced responses to targets at invalid locations, or more interference from lateralized flankers) could reflect a less steep gradient of processing resources in the profoundly deaf. Although it is premature to conclude in favor of one or the other metaphor of selective attention, we believe it is important to consider the implications of assuming one instead of the other. For instance, the gradient metaphor could provide a more neurally plausible model of selective attention. If one assumes that reciprocal patterns of facilitation and inhibition in the visual cortex can lead to the emergence of a saliency map that contributes to the early filtering of bottom-up inputs (e.g., Li 2002), the different distribution of exogenous selective attention in deaf individuals could represent a modulation occurring at the level of this early saliency map. Furthermore, assuming a gradient hypothesis may better reconcile the results obtained in the studies that adopted the cue–target and flanker paradigms in deaf individuals with the results showing a more efficient visual search pattern in this population. Within the gradient perspective, better visual search for simple features and faster detection of targets at invalidly cued locations could both relate to more resources for the preattentive detection of discontinuities in deaf individuals.
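To make the two metaphors concrete, the toy model below (hypothetical parameters, not fitted to any data) treats attentional resources as a Gaussian gradient centered on the selected location; a shallower gradient leaves relatively more resources at distant locations, which is one way to capture smaller invalid-cue costs and larger flanker interference in deaf participants.

    # Toy gradient model of attentional resources as a function of the distance
    # (in degrees) from the currently selected location. Parameters are hypothetical.
    import math

    def resources(distance_deg, peak=1.0, sigma_deg=4.0):
        """Gaussian falloff of processing resources around the attended location."""
        return peak * math.exp(-(distance_deg ** 2) / (2 * sigma_deg ** 2))

    # A steeper gradient (small sigma) vs. a shallower one (large sigma).
    for label, sigma in (("steep gradient", 3.0), ("shallow gradient", 6.0)):
        at_cued = resources(0.0, sigma_deg=sigma)
        at_uncued = resources(8.0, sigma_deg=sigma)
        print(f"{label}: resources at 0 deg = {at_cued:.2f}, at 8 deg = {at_uncued:.2f}")
    # Under the shallow gradient, more resources remain at 8 deg from the selected
    # location, i.e., less cost for invalidly cued targets and more flanker intrusion.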
22.4 CONCLUSIONS AND FUTURE DIRECTIONS

When taken collectively, the past 30 years of research on visual cognition in deaf individuals may, at first sight, appear heterogeneous. However, our systematic attempt to distinguish between the different operational definitions of “better visual abilities” in deaf individuals proved useful in revealing at least some of the existing regularities in this literature and in specifying under which contexts the compensatory hypothesis is consistently supported.

First, the remarkable convergence of findings in the studies that adopted simple detection tasks and the mixed findings of the studies that adopted discrimination paradigms (either for near-threshold or suprathreshold stimuli) suggest that enhanced visual abilities in deaf individuals might be best conceptualized as enhanced reactivity to visual events, rather than enhanced perceptual representations. In other words, deaf individuals “do not see better,” but react faster to the stimuli in the environment. If this conclusion is true, reactivity measures may prove more informative than accuracy reports when comparing deaf and hearing controls, even when discrimination tasks are adopted. This raises the issue of what the neural basis of enhanced reactivity in deaf individuals may be and at which processing stage it may emerge (i.e., perceptual processing, response preparation/execution, or both). In addition, it raises the question of what functional role enhanced reactivity may play in real life. In this respect, the multisensory perspective that we introduced at the beginning of this chapter may be of great use for understanding the ecological relevance of this phenomenon. If audition constitutes a fundamental guide for reorienting our exploratory behavior and is a dedicated system for detecting and reacting to discontinuities, one could hypothesize that faster reactivity to visual events in deaf individuals may primarily serve the purpose of triggering orienting responses. Because all the evidence we have reviewed in this chapter originated from paradigms in which overt orienting was completely prevented, this question remains open for future research.

The second consistency that emerged from our review concerns the modulation that profound deafness exerts on the representation of peripheral visual space and on visual attention. Although several lines of evidence in the literature converge in supporting this conclusion, the challenge for future research is the better specification of the operational description of both these concepts. Without such an effort, the concepts of enhanced peripheral processing and enhanced visual attention are
at risk of remaining tautological redefinitions of the empirical findings. As discussed above for the example of selective attention, even a minimal description of which aspects of selective attention may be changed by profound deafness, or a basic discussion of the theoretical assumptions underlying the notion of selective attention, can already contribute to the generation of novel predictions for empirical research.
ACKNOWLEDGMENTS

We thank two anonymous reviewers for helpful comments and suggestions on an earlier version of this manuscript. We are also grateful to Elena Nava for helpful comments and discussion. This work was supported by a PRIN 2006 grant (Prot. 2006118540_004) from MIUR (Italy), a grant from Comune di Rovereto (Italy), and a PAT-CRS grant from University of Trento (Italy).
REFERENCES

Armstrong, B., S. A. Hillyard, H. J. Neville, and T. V. Mitchell. 2002. Auditory deprivation affects processing of motion, but not colour. Brain Research Cognitive Brain Research 14: 422–434.
Auer, E. T., Jr., and L. E. Bernstein. 2007. Enhanced visual speech perception in individuals with early-onset hearing impairment. Journal of Speech, Language, and Hearing Research 50(5): 1157–1165.
Bavelier, D., C. Brozinsky, A. Tomann, T. Mitchell, H. Neville, and G. H. Liu. 2001. Impact of early deafness and early exposure to sign language on the cerebral organization for motion processing. Journal of Neuroscience 21: 8931–8942.
Bavelier, D., M. W. G. Dye, and P. C. Hauser. 2006. Do deaf individuals see better? Trends in Cognitive Sciences 10: 512–518.
Bavelier, D., A. Tomann, C. Hutton, T. V. Mitchell, D. P. Corina, G. Liu, and H. J. Neville. 2000. Visual attention to the periphery is enhanced in congenitally deaf individuals. Journal of Neuroscience 20: 1–6.
Bernstein, L. E., M. E. Demorest, and P. E. Tucker. 2000. Speech perception without hearing. Perception & Psychophysics 62: 233–252.
Bernstein, L. E., E. T. Auer Jr., and P. E. Tucker. 2001. Enhanced speechreading in deaf adults: Can short-term training/practice close the gap for hearing adults? Journal of Speech, Language, and Hearing Research 44: 5–18.
Bosworth, R. G., and K. R. Dobkins. 1999. Left-hemisphere dominance for motion processing in deaf signers. Psychological Science 10: 256–262.
Bosworth, R. G., and K. R. Dobkins. 2002a. The effect of spatial attention on motion processing in deaf signers, hearing signers, and hearing nonsigners. Brain and Cognition 4: 152–169.
Bosworth, R. G., and K. R. Dobkins. 2002b. Visual field asymmetries for motion processing in deaf and hearing signers. Brain and Cognition 4: 152–169.
Bottari, D., M. Turatto, F. Bonfioli, C. Abbadessa, S. Selmi, M. A. Beltrame, and F. Pavani. 2008. Change blindness in profoundly deaf individuals and cochlear implant recipients. Brain Research 1242: 209–218.
Bottari, D., E. Nava, P. Ley, and F. Pavani. 2010. Enhanced reactivity to visual stimuli in deaf individuals. Restorative Neurology and Neuroscience 28: 167–179.
Bottari, D., M. Turatto, and F. Pavani. In preparation. Visual change perception and speeded simple detection in profound deafness.
Bravo, M. Y., and K. Nakayama. 1992. The role of attention in different visual search tasks. Perception & Psychophysics 51: 465–472.
Bross, M. 1979a. Residual sensory capacities of the deaf: A signal detection analysis of a visual discrimination task. Perceptual and Motor Skills 1: 187–194.
Bross, M. 1979b. Response bias in deaf and hearing subjects as a function of motivational factors. Perceptual and Motor Skills 3: 779–782.
Bross, M., and H. Sauerwein. 1980. Signal detection analysis of visual flicker in deaf and hearing individuals. Perceptual and Motor Skills 51: 839–843.
Brozinsky, C. J., and D. Bavelier. 2004. Motion velocity thresholds in deaf signers: Changes in lateralization but not in overall sensitivity. Brain Research Cognitive Brain Research 21: 1–10.
Chelazzi, L., C. A. Marzi, G. Panozzo, N. Pasqualini, G. Tassinari, and L. Tomazzoli. 1988. Hemiretinal differences in speed of light detection in esotropic amblyopes. Vision Research 28(1): 95–104.
Visual Abilities in Individuals with Profound Deafness
445
Chen, Q., M. Zhang, and X. Zhou. 2006. Effects of spatial distribution of attention during inhibition of return IOR on flanker interference in hearing and congenitally deaf people. Brain Research 1109: 117–127. Cohen, L. G., P. Celnik, A. Pascual-Leone, B. Corwell, L. Falz, J. Dambrosia, J. et al. 1997. Functional relevance of cross-modal plasticity in blind humans. Nature 389: 180–183. Colmenero, J. M., A. Catena, L. J. Fuentes, and M. M. Ramos. 2004. Mechanisms of visuo-spatial orienting in deafness. European Journal Cognitive Psychology 16: 791–805. Corbetta, M., and G. L. Shulman. 2002. Control of goal-directed and stimulus-driven attention in the brain. Nature Review Neuroscience 3: 201–21 Doehring, D. G., and J. Rosenstein. 1969. Speed of visual perception in deaf children. Journal of Speech and Hearing Research 12:118–125. Downing, C. J., and S. Pinker. 1985. The spatial structure of visual attention. In Attention and Performance Posner, ed. M. I. Posner and O. S. M. Marin, 171–187. Hillsdale, NJ: Erlbaum. Dye, M. W., P. C. Hauser, and D. Bavelier. 2008. Visual skills and cross-modal plasticity in deaf readers: Possible implications for acquiring meaning from print. Annals of the New York Academy of Science 1145: 71–82. Dye, M. W. G., D. E. Baril, and D. Bavelier. 2007. Which aspects of visual attention are changed by deafness? The case of the Attentional Network Test. Neuropsychologia 45: 1801–1811. Enns, J. T., and R. A. Rensink. 1991. Preattentive recovery of three-dimensional orientation from line-drawings. Psychological Review 98: 335–351. Eriksen, B. A., and C. W. Eriksen. 1974. Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics 16: 143–149. Fine, I., E. M. Finney, G. M. Boynton, and K. R. Dobkins. 2005. Comparing the effects of auditory deprivation and sign language within the auditory and visual cortex. Journal of Cognitive Neuroscience 17: 1621–1637. Finney, E. M., and K. R. Dobkins. 2001. Visual contrast sensitivity in deaf versus hearing populations: exploring the perceptual consequences of auditory deprivation and experience with a visual language. Cognitive Brain Research 11(1): 171–183. Finney, E. M., I. Fine, and K. R. Dobkins. 2001. Visual stimuli activate auditory cortex in the deaf. Natural Neuroscience 4(12): 1171–1173. Finney, E. M., B. A. Clementz, G. Hickok, and K. R. Dobkins. 2003. Visual stimuli activate auditory cortex in deaf subjects: Evidence from MEG. Neuroreport 11: 1425–1427. Furth, H. 1966. Thinking without language: Psychological implications of deafness. New York: Free Press. Suchman, R. G. 1966. Color–form preference, discriminative accuracy and learning of deaf and hearing children. Child Development 37(2): 439–451. Gibson, E. 1969. Principles of perceptual learning and development. New York: Meredith. Green, D. M., and J. A. Swets. 1966. Signal detection theory and psychophysics. New York: Wiley. Harrington, D. O. 1971. The visual fields. St. Louis, MO: CV Mosby. Hartmann, G. W. 1933. Changes in visual acuity through simultaneous stimulation of other sense organs. Journal of Experimental Psychology 16:393–407. Hartung, J. E. 1970. Visual perceptual skill, reading ability, and the young deaf child. Exceptional Children 36(8): 603–638. Hauser, P. C., M. W. G. Dye, M. Boutla, C. S. Gree, and D. Bavelier. 2007. Deafness and visual enumeration: Not all aspects of attention are modified by deafness. Brain Research 1153: 178–187. Heider, F., and G. Heider. 1940. 
Studies in the psychology of the deaf. Psychological Monographs 52: 6–22. Heffner R. S., and H. E. Heffner. 1992. Visual factors in sound localization in mammals. Journal of Comparative Neurology. 317(3): 219–32. Heming, J. E., and L. N. Brown. 2005. Sensory temporal processing in adults with early hearing loss. Brain and Cognition 59: 173–82. Henderson, S. E., and L. Henderson. 1973. Levels of visual-information processing in deaf and hearing children. American Journal of Psychology 86(3): 507–521. Hoemann, H. 1978. Perception by the deaf. In Handbook of perception: Perceptual ecology, vol. 10, ed. E. Carterette and M. Friedman, 43–64. NewYork: Academic Press. Jonides, J. 1981. Voluntary versus automatic control over the mind’s eye’s movement. In Attention and performance, Vol. IX, ed. L. B. Long and A. D. Baddeley, 187–203. Hillsdale, NJ: Erlbaum. Jordan, T. E. 1961. Historical notes on early study of the deaf. Journal of Speech Hearing Disorders 26:118–121. Klein, R. M. 2000. Inhibition of return. Trends in Cognitive Science 4: 138–147.
446
The Neural Bases of Multisensory Processes
Levanen, S., and D. Hamdorf. 2001. Feeling vibrations: Enhanced tactile sensitivity in congenitally deaf humans. Neuroscience Letters 301: 75–77. Li, Z. 2002. A saliency map in primary visual cortex. Trends in Cognitive Science 1: 9–16. Loke, W. H., and S. Song. 1991. Central and peripheral visual processing in hearing and nonhearing individuals. Bulletin of the Psychonomic Society 29: 437–440. Marendaz, C., C. Robert, and F. Bonthoux. 1997. Deafness and attentional visual search: A developmental study. Perception A: 26. Marzi, C. A., and M. Di Stefano. 1981. Hemiretinal differences in visual perception. Documenta Ophthalmologica Proceedings Series 30: 273–278. Mayer, A. R., J. M. Dorflinger, S. M. Rao, and M. Seidenberg. 2004. Neural networks underlying endogenous and exogenous visual–spatial orienting. Neuroimage 2: 534–541 Milner, A. D., and M. A. Goodale. 1995. The visual brain in action. Oxford, UK: Oxford Univ. Press. Mitchell, R. E., and M. A. Karchmer. 2002. Demographics of deaf education: More students in more places. American Annals of the Deaf 151(2, issue 2006): 95–104. Washington, DC: Gallaudet Univ. Press. Mitchell, T., and M. T. Maslin. 2007. How vision matters for individuals with hearing loss. International Journal of Audiology 46(9): 500–511. Mohammed T., R. Campbell, M. MacSweeney, E. Milne, P. Hansen, and M. Coleman. 2005. Speechreading skill and visual movement sensitivity are related in deaf speechreaders. Perception 34: 205–216. Myklebust, H. 1964. The psychology of deafness. New York: Grune and Stratton. Näätänen, R. 1992. Attention and brain function. Hillsdale, NJ: Erlbaum. Nava, E., D. Bottari, M. Zampini, and F. Pavani. 2008. Visual temporal order judgment in profoundly deaf individuals. Experimental Brain research 190(2): 179–188. Neville, H. J., and D. S. Lawson. 1987. Attention to central and peripheral visual space in a movement detection task: an event related potential and behavioral study: II. Congenitally deaf adults. Brain Research 405: 268–283. Neville, H. J., and D. Bavelier. 2002. Human brain plasticity: Evidence from sensory deprivation and altered language experience. Progress in Brain Research 138: 177–188. Neville, H. J., A. Schmidt, and M. Kutas. 1983. Altered visual-evoked potentials in congenitally deaf adults. Brain Research 266(1): 127–132. O’Regan, J. K., H. Deubel, J. J. Clark, and R. A. Rensink. 2000. Picture changes during blinks: Looking without seeing and seeing without looking. Visual Cognition 7: 191–212. O’Regan, J. K., R. A. Rensink, and J. J. Clark. 1999. Change-blindness as a result of “mudsplashes.” Nature 398: 34. Olson, J. R. 1967. A factor analytic study of the relation between the speed of visual perception and the language abilities of deaf adolescents. Journal of Speech and Hearing Research 10(2): 354–360. Parasnis, I. 1983. Visual perceptual skills and deafness: A research review. Journal of the Academy of Rehabilitative Audiology 16: 148–160. Parasnis, I., and V. J. Samar. 1985. Parafoveal attention in congenitally deaf and hearing young adults. Brain and Cognition 4: 313–327. Parasnis, I., V. J. Samar, and G. P. Berent. 2003. Deaf adults without attention deficit hyperactivity disorder display reduced perceptual sensitivity and elevated impulsivity on the Test of Variables of Attention T.O.V.A. Journal of Speech Language and Hearing Research 5: 1166–1183. Poizner, H., and P. Tallal. 1987. Temporal processing in deaf signers. Brain and Language 30: 52–62. Posner, M. 1980. Orienting of attention. 
The Quarterly Journal of Experimental Psychology 32: 3–25. Prinzmetal, W., C. McCool, and S. Park. 2005. Attention: Reaction time and accuracy reveal different mechanisms. Journal of Experimental Psychology: General 134: 73–92. Prinzmetal, W., A. Zvinyatskovskiy, P. Gutierrez, and L. Dilem. 2009. Voluntary and involuntary attention have different consequences: The effect of perceptual difficulty. The Quarterly Journal of Experimental Psychology 2: 352–369. Proksch, J., and D. Bavelier. 2002. Changes in the spatial distribution of visual attention after early deafness. Journal of Cognitive Neuroscience 14: 687–701. Pylyshyn, Z.W. 1989. The role of location indexes in spatial perception: A sketch of the FINST spatial-index model. Cognition 32: 65–97. Quittner, A. L., P. Leibach, and K. Marciel. 2004. The impact of cochlear implants on young deaf children: New methods to assess cognitive and behavioral development. Archives of Otolaryngology Head Neck and Surgery 5: 547–554. Rensink, R. A. 2001. Change blindness: Implications for the nature of attention. In Vision and Attention, ed. M. R. Jenkin and L. R., 169–188. New York: Springer.
Visual Abilities in Individuals with Profound Deafness
447
Rettenbach, R., G. Diller, and R. Sireteanu. 1999. Do deaf people see better? Texture segmentation and visual search compensate in adult but not in juvenile subjects. Journal of Cognitive Neuroscience 5: 560–583. Reynolds, H. 1993. Effects of foveal stimulation on peripheral visual processing and laterality in deaf and hearing subjects. American Journal of Psychology 106(4): 523–540. Rönnberg, J. 1995. Perceptual compensation in the deaf and blind: Myth or reality? In Compensating for psychological deficits and declines, ed. R. A. Dixon and L. Backman, 251–274. Mahwah, NJ: Erlbaum. Sagi, D., and B. Julesz. 1984. Detection versus discrimination of visual orientation. Perception 13(5): 619–628. Sladen, D., A. M. Tharpe, D. H. Ashmead, D. W. Grantham, and M. M. Chun. 2005. Visual attention in deaf and normal hearing adults: effects of stimulus compatibility. Journal of Speech Language and Hearing Research 48: 1–9. Stevens, C., and H. Neville. 2006. Neuroplasticity as a double-edged sword: Deaf enhancements and dyslexic deficits in motion processing. Journal of Cognitive Neuroscience 18: 701–714. Stivalet, P., Y. Moreno, J. Richard, P. A. Barraud, and C. Raphael. 1998. Differences in visual search tasks between congenitally deaf and normally hearing adults. Brain Research Cognitive Brain Research 6: 227–232. Suchman, R. G. 1966. Color–form preference, discriminative accuracy and learning of deaf and hearing children. Child Development 2: 439–451. Summerfield, Q. 1991. Visual perception of phonetic gestures. In Modularity and the motor theory of speech perception, ed. I. G. Mattingly and M. Studdert-Kennedy, 117–137. Hillsdale, NJ: Erlbaum. Treisman, A. 1982. Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology Human Perceptual Performance 2: 94–214. Turatto, M., and B. Brigeman. 2005. Change perception using visual transients: Object substitution and deletion. Experimental Brain Research 167: 595–608. Turatto, M., M. Valsecchi, L. Tamè, and E. Betta. 2007. Microsaccades distinguish between global and local visual processing. Neuroreport 18:1015–1018.
23 A Multisensory Interface for Peripersonal Space
Body–Object Interactions

Claudio Brozzoli, Tamar R. Makin, Lucilla Cardinali, Nicholas P. Holmes, and Alessandro Farnè

CONTENTS
23.1 Multisensory and Motor Representations of Peripersonal Space
    23.1.1 Multisensory Features of Peripersonal Space: Visuo-Tactile Interaction around the Body
        23.1.1.1 Premotor Visuo-Tactile Interactions
        23.1.1.2 Parietal Visuo-Tactile Interactions
        23.1.1.3 Subcortical Visuo-Tactile Interaction
        23.1.1.4 A Visuo-Tactile Network
        23.1.1.5 Dynamic Features of PpS Representation
    23.1.2 Motor Features of PpS: Visuo-Tactile Interaction around the Acting Body
    23.1.3 A Multisensory–Motor Network for Body–Object Interactions in PpS
23.2 Multisensory-Based PpS Representation in Humans
    23.2.1 PpS Representation in Humans
        23.2.1.1 PpS Representation in Neuropsychological Patients
        23.2.1.2 PpS Representation in Neurotypical Participants
    23.2.2 A Multisensory Interface for Body–Objects Interactions
23.3 Conclusion
Acknowledgments
References
23.1 MULTISENSORY AND MOTOR REPRESENTATIONS OF PERIPERSONAL SPACE
23.1.1 Multisensory Features of Peripersonal Space: Visuo-Tactile Interaction around the Body
The binding of visual information available outside the body with tactile information arising, by definition, on the body allows the representation of the space lying in between, which is often the theater of our interactions with objects. The representation of this intermediate space has become known as “peripersonal space” (Rizzolatti et al. 1981b, 1981c). The definition of peripersonal space (PpS hereafter) originates from single-unit electrophysiological studies in macaque monkeys, based on a class of multisensory, predominantly visual–tactile neurons. Over the years, such neurons have been identified in several regions of the monkey brain, including premotor area 6, parietal areas (Brodmann’s area 7b and the ventral intraparietal area, VIP), and the putamen (Fogassi et al. 1999; Graziano 2001; Rizzolatti et al. 1997). The most relevant characteristic of these neurons, for present
purposes, is that, in addition to responding both to visual and tactile stimulation (referred to here as visuo-tactile), their visually evoked responses are modulated by the distance between the visual object and the tactile receptive field (RF). This allows for the coding of visual information that is dependent, or centered, on the body part that contains the tactile RF. 23.1.1.1 Premotor Visuo-Tactile Interactions The most detailed series of studies on the properties of visuo-tactile neurons have been performed in the premotor cortex. Neurons in the F4 subregion of inferior area 6 in ventral premotor cortex (Matelli et al. 1985) are strongly responsive to tactile stimulation. They are characterized by relatively large tactile RFs located primarily on the monkey’s face, neck, arm, hand, or both hands and face (e.g., in the peribuccal region; Gentilucci et al. 1988; Rizzolatti et al. 1981a). A large proportion (85%) of the tactile neurons in this area also discharges in response to visual stimuli. According to the depth of the visual RFs extending out from the body, these bimodal neurons were originally subdivided into pericutaneous (54%) and distant peripersonal neurons (46%). The pericutaneous neurons responded best to stimuli presented a few centimeters from the skin (10 cm or less; Rizzolatti et al. 1981b), whereas the distant peripersonal neurons responded to stimuli within reach of the monkey’s arms. We will refer to both as “peripersonal” visuo-tactile neurons throughout the text. Therefore, an important property of these neurons (and neurons in other PpS-related areas; see below) is that their visual RFs are limited in depth from the tactile RFs (in most cases from ~5 to ~50 cm). The visual RFs are generally independent of gaze direction (Fogassi et al. 1992; Gentilucci et al. 1983), being spatially related instead to the body parts on which the tactile RFs are located. Moreover, when the arm is moved under the monkey’s view, the visual RF follows the body part, being “anchored” to the tactile RF thus keeping a rough spatial match between the locations of the visual RF and the arm with every displacement (Graziano et al. 1994, 1997; Figure 23.1). Although less numerous, visuo-tactile neurons are also present in the rostral subregion F5 of area 6, and have smaller tactile RFs than F4 neurons. The tactile RFs are frequently located on the face, the hand, or both. However, the visual properties of these neurons were shown to be quite different: even though stimuli presented close to the body resulted in stronger responses, the size of the stimuli appeared to be a more critical factor in driving the activity of F5 neurons (Rizzolatti et al. 1988; Rizzolatti and Gentilucci 1988). 23.1.1.2 Parietal Visuo-Tactile Interactions The posterior parietal lobe of the macaque brain contains two subregions with visuo-tactile properties: area 7b of the inferior posterior parietal lobe and the ventral section of the intraparietal sulcus (VIP). As in the premotor cortex, electrophysiological studies in awake monkeys revealed that visuo-tactile integration in these areas arises at the single unit level (Hyvärinen and Poranen 1974; Hyvärinen 1981; Leinonen et al. 1979; Leinonen and Nyman 1979; Mountcastle et al. 1975; Robinson et al. 
1978; Robinson and Burton 1980a, 1980b).* Within area 7b, most neurons were responsive to tactile stimuli, and presented a gross somatotopic organization, with separate face, arm, and hand representations (Hyvärinen and Shelepin 1979; Hyvärinen 1981; Robinson and Burton 1980a). Within the face and arm regions of this map, visuo-tactile cells (33%) have been reported (Hyvärinen and Poranen 1974; Hyvärinen and Shelepin 1979; Hyvärinen 1981; Leinonen et al. 1979; Leinonen and Nyman 1979). What is the function of these responses? Researchers initially interpreted these visual responses as an “anticipatory activation” that appeared before the neuron’s tactile RF was touched (Hyvärinen and Poranen 1974, p. 675). Importantly, a close correspondence between the tactile and visual RFs has been documented, especially for tactile RFs on the arm * A possibly earlier report can be attributed to Sakata and colleagues (1973, p. 100). In this study about the functional organization of area 5, the authors stated: “Even the relatively rare neurons which we could activate visually were more powerfully driven by somatosensory stimuli.” However, no further detail or discussion was offered concerning the limitation in depth of the visual RF.
FIGURE 23.1 Representation of visual stimuli in hand-based coordinates. Visual responses of a typical premotor neuron with a tactile RF (hatched) on the forearm and hand, and a visual RF within 10 cm of the tactile RF. On each trial, the arm contralateral to the neuron was fixed in one of two positions: (a) on the right (light gray symbols and lines) or (b) on the left (dark gray symbols and lines), and the visual stimulus was advanced along one of four trajectories (numbered 1–4). (c) Responses of the neuron to the four stimulus trajectories, recorded for both arm positions while the arm was visible to the monkey. When the arm was fixed on the right, the response was maximal for trajectory 3, which approached the neuron’s tactile RF. When the arm was fixed on the left, the maximal response shifted with the hand to trajectory 2, which was now approaching the tactile RF. This example shows that neurons in the monkey’s premotor cortex represent visual information with respect to the tactile RF. (Modified from Graziano, M. S. A. In Proceedings of the National Academy of Sciences of the United States of America, 1999.)
(Leinonen et al. 1979). That is, these neurons’ activation was shown to be dependent on the distance of the effective visual stimulus from the body part. Most of these neurons responded to visual stimuli moving toward the monkey, within about 10 cm of the tactile RF (although in some cases, stimulation presented further away, but still within a reachable distance, was also effective). Multisensory neurons have also been found in the monkey area VIP, in the fundus of the intraparietal sulcus (Avillac et al. 2005; Colby and Duhamel 1991; Colby et al. 1993; Duhamel et al. 1998). VIP neurons respond to tactile and visual stimulation presented within a few centimeters of the tactile RF. Unlike area 7b neurons, tactile RFs in VIP are primarily located on the face and head, and visual RFs are anchored to a region of space around the face (Colby et al. 1993). 23.1.1.3 Subcortical Visuo-Tactile Interaction Pools of multisensory neurons have also been found in subcortical structures of the macaque brain. The multisensory encoding of events has been well established in the superior colliculus (Stein and Meredith 1993; Wallace and Stein 2007). Such collicular activity, however, seems not to be devoted primarily to representing the space near the body (for a full discussion of the properties and functional roles of multisensory neurons in the superior colliculus, see Chapter 11 and Chapter 15). The putamen, on the other hand, seems to be a relevant region for the visuo-tactile processing of events in the space around the body (Graziano and Gross 1993, 1994, 1995). Visuo-tactile neurons in the putamen with tactile RFs on the arm, hand, and face are somatotopically organized. Just as for the cortical visuo-tactile neurons, the visual and tactile RFs in the putamen show a rough spatial correspondence, with the visual RFs being anchored to the tactile ones. Most of the neurons also
are
responsive to visual stimuli, as long as they are presented close to the tactile RF. A large portion (82%) of face neurons responds best to visual stimuli presented in a region of space within 10–20 cm from the tactile RF. Neurons with tactile RFs on the arm and hand present even more shallow visual RFs around the hand (up to 5 cm; Graziano and Gross 1993). 23.1.1.4 A Visuo-Tactile Network The neurophysiological findings described in the previous sections define a set of at least four distinctive areas with similar visuo-tactile responses: premotor inferior area 6, parietal areas 7b and VIP, and the putamen. These areas are heavily interconnected, forming a tight network (Matelli and Luppino 2001; Rizzolatti et al. 1997, 1998). Neurons in this network share some common features: (1) The visual responses lie primarily within a head–face or arm–hand centered somatosensory representation of the body. (2) Visual stimuli moving near the monkey modulate the neurons’ responses stronger than farther stimuli. This suggests that these neurons allow for body part–centered coding of visual stimuli within sectors of space adjacent to the tactile surface. This network possesses all of the necessary properties to bind together external visual information around the body and tactile information on a specific body part (Fogassi et al. 1992; Graziano and Gross 1993; Rizzolatti et al. 1997). 23.1.1.5 Dynamic Features of PpS Representation An important characteristic of some visuo-tactile areas is the dynamic property of their visual RFs. Fogassi and colleagues (1996) found that the depth of the visual RFs of F4 visuo-tactile neurons can increase with increases in the velocity (20–80 cm/s) of a visual stimulus approaching the cutaneous RF. This property could be crucial for preparing and/or executing actions toward nearby objects. Iriki and colleagues (1996) revealed that, after training monkeys to use a rake as a tool to reach food pellets placed outside their reaching space, some neurons in the post-central gyrus (somewhat extending into the intraparietal sulcus) began to display visual responses. In addition, although concerns have been raised in this respect (Holmes and Spence 2004), such visual responses appeared to be modulated by active, but not by passive, tool use. The newly acquired visual RFs seemed to have expanded toward the tool tip. A few minutes after the active tool use, the visual RFs apparently shrank back to their original size. In other words, the dynamic aspects of the visual RF may depend on the execution of specific motor actions (Rizzolatti et al. 1998). An interesting recent finding showed that visuo-tactile neurons within area 7b and VIP also respond when another individual’s body part is approached by a visual stimulus (Ishida et al. 2009). Similarly to the visuo-tactile neurons described above, these “body-matching neurons” respond to visual stimuli presented near the tactile RF. Moreover, the neurons are responsive to a visual stimulus presented close to the corresponding body part of another individual (a human experimenter) being observed by the monkey. For instance, a neuron displaying a tactile RF on the arm not only responded to a visual stimulus presented close to the monkey’s own arm, but also to visual stimuli presented close to another individual’s arm. For some of these neurons, this matching property seems to be independent of the position of the observed individual with respect to the observing monkey (up to 35° of rotation).
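The core response properties reviewed in Sections 23.1.1.1 through 23.1.1.5 (body part–centered visual coding, visual RFs of limited depth around the tactile RF, and velocity-dependent expansion of that depth for approaching stimuli) can be summarized in a toy firing-rate model. The Python sketch below is illustrative only: the Gaussian distance tuning, the 10-cm nominal depth, the linear velocity gain, and the function name are our assumptions rather than parameters reported in the studies cited above.

```python
import numpy as np

def pps_visual_response(stimulus_xyz, hand_xyz, approach_speed_cm_s=0.0,
                        base_depth_cm=10.0, depth_gain=0.1, max_rate=60.0):
    """Toy firing-rate model of a visuo-tactile 'peripersonal' neuron.

    The visual drive is computed in hand-centered coordinates, so moving the
    hand moves the visual RF with it; the RF depth expands with the speed of
    an approaching stimulus. All parameters are illustrative.
    """
    # Hand-centered coding: only the stimulus position relative to the hand matters.
    distance_cm = np.linalg.norm(np.asarray(stimulus_xyz, dtype=float) -
                                 np.asarray(hand_xyz, dtype=float))
    # Velocity-dependent expansion of the RF depth (linear gain is an assumption).
    depth_cm = base_depth_cm + depth_gain * max(approach_speed_cm_s, 0.0)
    # Gaussian fall-off of the visual response with distance from the tactile RF.
    return max_rate * np.exp(-0.5 * (distance_cm / depth_cm) ** 2)

# Same visual stimulus, two arm positions: the response follows the hand (cf. Figure 23.1).
stimulus = (10.0, 30.0, 0.0)                                       # cm, arbitrary trunk-centered frame
print(pps_visual_response(stimulus, hand_xyz=(12.0, 28.0, 0.0)))   # stimulus near the hand: strong response
print(pps_visual_response(stimulus, hand_xyz=(-30.0, 28.0, 0.0)))  # hand moved away: weak response
print(pps_visual_response(stimulus, hand_xyz=(-30.0, 28.0, 0.0),
                          approach_speed_cm_s=80.0))               # fast approach: deeper RF, larger response
```

Because only the hand-relative position of the stimulus enters the computation, the model's visual RF is anchored to the tactile RF and is independent of gaze, which is the defining property of the neurons described in this section.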
23.1.2 Motor Features of PpS: Visuo-Tactile Interaction around the Acting Body Why should the brain maintain a representation of the space around the body separate from a representation of far extrapersonal space? One possibility is that this dichotomy stems purely from perceptual aims, giving a “greater” perceptual salience to visual events occurring in the vicinity of the body. Following this idea, the parieto-frontal network, together with the putamen, would code visual space with individual body parts as its reference. This is suggested by the sensory properties of this set of neurons, responding selectively for visual information close to the body. However, we believe that this interpretation does not fully describe the potential functional applications of this
system, since it does not correspond with some of the evidence described above. First, it may be difficult to interpret the complex tactile RFs of some of these neurons (e.g., single neurons in area F4 that represent both the hand and face, as reported by Rizzolatti et al. 1981a, 1981b). Second, it does not account for the dynamic changes in their visual RFs, as observed in cases of objects approaching the body (Fogassi et al. 1996). More critically, a purely perceptual account does not fit with the presence of such bimodal neurons in a predominantly “motor” area, such as the premotor cortex. Numerous visuo-tactile neurons in inferior area 6 (Gentilucci et al. 1988; Rizzolatti et al. 1981c, 1987, 1988, 1997; Rizzolatti and Gentilucci 1988), parietal areas 7b (Hyvärinen 1981; Hyvärinen and Poranen 1974; Hyvärinen and Shelepin 1979; Leinonen 1980; Leinonen et al. 1979; Leinonen and Nyman 1979; Robinson et al. 1978), and the putamen (Crutcher and DeLong 1984) respond not only to passive visual and tactile stimulation, but also during motor activity. These findings raise the more compelling possibility that the multisensory representation of PpS serves some motor function. Objects in the vicinity of the body are indeed more relevant by virtue of the possible interactions our body can establish with them (Graziano et al. 1993; Rizzolatti et al. 1997, 1998). Therefore, hand-centered representation of PpS provides us with extremely valuable information regarding the spatial position of objects with respect to our hands. Here follows a description of the motor aspects associated with PpS brain areas, as revealed by electrophysiological studies in macaque monkeys. The premotor cortex has both direct (Martino and Strick 1987) and indirect (Godschalk et al. 1984; Matsumura and Kubota 1979; Muakkassa and Strick 1979; Pandya and Vignolo 1971) access to the control of upper limb movements, via projections to the spinal cord and the primary motor cortex, respectively. The motor properties of neurons in the inferior premotor cortex support a role for this structure in a perception–action interface. In particular, the visual responses of some neurons within this area are enhanced when a reaching movement is performed toward an object (Godschalk et al. 1985), as well as during reaching and grasping movements of the arm and hand (Godschalk et al. 1981, 1985; Kurata et al. 1985; Kurata and Tanji 1986; Rizzolatti and Gentilucci 1988) and mouth (Rizzolatti et al. 1981c). Moreover, neurons in this area show a rather precise degree of motor representation. Proximal and distal movements are represented separately (in areas F4/F1 and area F5, respectively), with the proximal neurons mostly activated for arm and face movements. (Gentilucci et al. 1988; Kurata and Tanji 1986; Murata et al. 1997; Raos et al. 2006; Rizzolatti et al. 1987, 1988; Rizzolatti and Gentilucci 1988). Crucially, the passive RFs and the active movements appear to share related functional roles: neurons with visuo-tactile RFs on the face also discharged during arm reaching movements toward the upper part of space that corresponds to its visual RF. This suggests that the sensory and motor responses are expressed in a common reference frame for locating objects in the space close to the body and for guiding movements toward them. We believe that such a complex motor mechanism cannot subserve a purely perceptual function. Parietal area 7b also has motor properties. 
As in the premotor cortex, parietal motor functions seem to be related to approaching movements of a body part toward an object (Gardner et al. 2007; Lacquaniti and Caminiti 1998; Rizzolatti et al. 1997). Indeed, the posterior parietal cortex is part of the dorsal stream of action-oriented visual processing (Milner and Goodale 1995), and both inferior and superior parietal lobules are interconnected with the premotor cortex (see above). Ablation and reversible inactivation studies in monkeys have shown a direct relationship between the PpS network and motor responses. These studies tested for the behavioral consequences of a lesion within premotor and posterior parietal areas, where visuo-tactile neurons have been found. Interestingly, lesions to both the anterior or posterior parts of this network seem to produce very similar patterns of motor impairments, most of which affect, in particular, the execution of visually guided reaching actions (Battaglini et al. 2002; Deuel and Regan 1985; Ettlinger and Kalsbeck 1962; Faugier-Grimaud et al. 1978; Gallese et al. 1994; Halsban and Passingham 1982; Moll and Kuypers 1977; Rizzolatti et al. 1983). After premotor ablation, for instance, the monkeys were unable to reach when the movement required the monkey to avoid an obstacle with the contralesional arm. Arm movements were executed without correctly taking into account visual information within
PpS (Battaglini et al. 2002; Moll and Kuypers 1977). Similarly, removal of postarcuate regions in the premotor cortex where the mouth is represented (presumably in area F4), caused a severe impairment in grasping with the mouth (Rizzolatti et al. 1983). Attentional deficits have also been reported after selective damage to visuo-tactile parietal and premotor regions (Rizzolatti et al. 1983) in the form of spatial hemineglect and extinction. The monkeys appeared to be unaware of visual (or tactile) stimuli presented in the contralesional space. Crucially, this deficit was selective for the space around the body. Subregion F5 of the inferior area 6 is also characterized by the presence of “mirror” neurons, a special class of motor neurons with visual properties. These neurons are selective for the execution of a specific motor act, such as precision grasping. They also discharge when the monkey observes another monkey or a human executing the same action (di Pellegrino et al. 1992; Gallese et al. 1996; Rizzolatti et al. 1996).* Relevant for this chapter is a recent study that showed selectivity in certain mirror neurons for actions performed within the observer’s PpS rather than in its extrapersonal space (peripersonal mirror neurons, Caggiano et al. 2009). A different subpopulation of mirror neurons showed the opposite preference (i.e., selectivity for actions performed in extrapersonal space, rather than PpS). Moreover, peripersonal and extrapersonal space appeared to be defined according to a functional criterion: When accessibility to PpS was limited (e.g., by placing a screen in front of the monkey), the responses of several peripersonal mirror neurons were reduced during observation of actions performed in the inaccessible portion of the space. That is, when PpS was inaccessible for action, it has been represented as farther extrapersonal space. Indeed, in such circumstances, extrapersonal mirror neurons started to respond to observation of actions performed in the inaccessible PpS.
23.1.3 A Multisensory–Motor Network for Body–Object Interactions in PpS The above reviewed studies provide a large body of indirect evidence in favor of the proposal that this parieto-frontal network binds together visual and tactile information in order to generate an appropriate motor program toward objects in the world. We would like to suggest that the occurrence of multisensory and motor processing within the same area provides an interface between perception and action. What kind of body–object interactions can body-centered PpS representation subserve? PpS has traditionally been suggested to play a role in guiding hand actions toward objects within reaching distance (Bremmer 2005; Fogassi and Luppino 2005; Graziano 1999; Maravita et al. 2003; Maravita 2006; Rizzolatti et al. 1987). Indeed, the evidence described above seems to support the involvement of some PpS areas in reaching and grasping. Another intriguing possibility that has recently been investigated is the involvement of the PpS network in defensive (re)actions. By acting as an anticipatory sensory–motor interface, PpS may serve for the early detection of potential threats approaching the body (Fogassi et al. 1996) in order to drive involuntary defensive movements (Cooke and Graziano 2004; Graziano and Cooke 2006). The most direct evidence in favor of this hypothesis comes from cortical electrical stimulation studies (although concerns have been raised in this respect; see Strick 2002; Graziano et al. 2002). Electrical stimulation of the ventral premotor cortex and the VIP (Graziano and Cooke 2006) has been reported to elicit a pattern of movements that is compatible with defensive arm movements and the withdrawal of the arm or the head (Cooke and Graziano 2003). However, the same anticipatory features may also have evolved to serve voluntary object-oriented actions (Gardner et al. 2007; Rizzolatti et al. 1981a, 1981b, 1997). In support of this view are the results of the described electrophysiological recording studies, showing the motor properties of both parietal and periarcuate visuo-tactile neurons, whose discharges are * A first report of neurons responding while the monkey was watching an action performed by another individual is already present in an early electrophysiological study over the parietal area 7b (Leinonen 1980, p. 305) : “[…] two cells discharged when the monkey grasped an object […] or when the monkey saw an investigator grasp an object.”
mostly correlated with reaching and grasping movements (see Section 23.1.2). The two hypotheses (involuntary and voluntary object-oriented actions) are not mutually exclusive and one could speculate that a fine-grained and sophisticated function could have developed from a more primordial defensive machinery, using the same visuo-tactile spatial coding of the PpS (see the “neuronal recycling hypothesis,” as proposed by Dehaene 2005). This hypothetical evolutionary advancement could lead to the involvement of the PpS mechanisms in the control of the execution of voluntary actions toward objects. Some comparative data showed, for instance, that prosimian sensory areas corresponding to the monkeys’ parietal areas already present some approximate motor activity. The most represented movements are very stereotyped limb retractions that are associated with avoidance movements (Fogassi et al. 1994).
23.2 MULTISENSORY-BASED PPS REPRESENTATION IN HUMANS Several studies support the existence of a similar body part–centered multisensory representation of the space around the body in the human brain. In this respect, the study of a neuropsychological condition called “extinction” (Bender 1952; Brozzoli et al. 2006) has provided considerable insight into the behavioral characteristics of multisensory spatial representation in the human brain (Ladavas 2002; Ladavas and Farnè 2004; Legrand et al. 2007). Evidence for visuo-tactile interactions is also available in healthy people, in the form of distance-modulated interference exerted by visual over tactile stimuli (Brozzoli et al. 2009a, 2009b; Spence et al. 2004a, 2008). The crucial point of these studies is the presence, both in the brain-damaged and healthy populations, of stronger visuo-tactile interactions when visual stimuli are presented in near, as compared to far space. These studies thus support the idea that the human brain also represents PpS through an integrated visuo-tactile system (Figure 23.2).
23.2.1 PpS Representation in Humans 23.2.1.1 PpS Representation in Neuropsychological Patients Extinction is a pathological sign following brain damage, whereby patients fail to perceive contralesional stimuli only under conditions of double simultaneous stimulation, thus revealing the competitive nature of this phenomenon (di Pellegrino and De Renzi 1995; Driver 1998; Ward et al.
FIGURE 23.2 Peripersonal space representation. Head- and hand-centered peripersonal space (dark gray areas) with respect to reaching space (light gray region). (Modified from Cardinali, L. et al., In Encyclopedia of Behavioral Neuroscience, 2009b.)
1994). A number of studies have shown that extinction can emerge when concurrent stimuli are presented in different sensory modalities: A visual stimulus presented near to the ipsilesional hand can extinguish a touch delivered on the contralesional hand (di Pellegrino et al. 1997; see also Costantini et al. 2007, for an example of cross-modal extinction within a hemispace). Crucially, such crossmodal visuo-tactile extinction appears to be stronger when visual stimuli are presented in near as compared to far space, thus providing neuropsychological support for the idea that the human brain represents PpS through an integrated visuo-tactile system. Moreover, in accordance with the findings from the electrophysiological studies described in the previous section, visual responses to stimuli presented near the patient’s hand remain anchored to the hand when it is moved to the opposite hemispace. This evidence suggests that PpS in humans is also coded in a hand-centered reference frame (di Pellegrino et al. 1997; Farnè et al. 2003). A converging line of evidence suggests that the space near the human face is also represented by a multisensory mechanism. We demonstrated that visuo-tactile extinction can occur by applying visual and tactile stimuli on the patient’s face (Farnè et al. 2005b). Interestingly, the extinction was strongest when the homologous body part was being stimulated (i.e., left and right cheeks, rather than left hand and right cheek), suggesting that different spatial regions, adjacent to different body parts, are represented separately (Farnè et al. 2005b). In a further study, we presented four extinction patients with visual stimuli near and far from the experimenter’s right hand, as well as from their own right hands (Farnè et al., unpublished data). Although the visual stimulus presented near the patients’ hands successfully extinguished the touch on the patients’ left hand, no cross-modal extinction effect was found to support a possible body-matching property of the human PpS system. This discrepancy with the evidence reported in the electrophysiological literature might stem from the fact that we used a more radical change in orientation between the observer’s own and the observed hands (more than 35°; see Section 23.1.1). Finally, we have shown that the human PpS also features plastic properties, akin to those demonstrated in the monkey: Visual stimuli presented in far space induced stronger cross-modal extinction after the use of a 38-cm rake to retrieve (or act upon) distant objects (Farnè and Làdavas 2000; see also Berti and Frassinetti 2000; Bonifazi et al. 2007; Farnè et al. 2005c, 2007; Maravita and Iriki 2004). The patients’ performance was evaluated before tool use, immediately after a 5-min period of tool use, and after a further 5- to 10-min resting period. Far visual stimuli were found to induce more severe contralesional extinction immediately after tool use, compared with before tool use. These results demonstrate that, although near and far spaces are separately represented, this spatial division is not defined a priori. Instead, the definition of near and far space may be derived functionally, depending on movements that allow the body to interact with objects in space.* 23.2.1.2 PpS Representation in Neurotypical Participants In healthy participants, most of the behavioral evidence for the hand-centered visuo-tactile representation of near space derives from a visuo-tactile interference (VTI) paradigm. 
In this series of studies, participants were asked to discriminate between two locations of a tactile stimulus, while an irrelevant visual distractor was delivered at a congruent or incongruent location. The overall effect was a slowing in response times for the incongruent trials, as compared with the congruent ones (Pavani and Castiello 2004; Spence et al. 2004b, 2008). More relevant here is the fact that the amount of interference depended on whether the visual distractor was presented near to or far from the tactile targets. In analogy with the cross-modal extinction studies, the VTI was stronger when the visual information occurred close to the tactually stimulated body part rather than in far space (for reviews, see Spence et al. 2004b, 2008).
* We have recently studied the effects of tool use on the body schema (Cardinali et al. 2009c). We found that the representation of the body was dynamically updated with the use of the tool. This dynamic updating of the body schema during action execution may serve as a sort of skeleton for PpS representation (for a critical review of the relationship between human PpS and body schema representations, see Cardinali et al. 2009a).
Using the same approach, the effect of tool use on VTI in near and far space has been studied in healthy individuals (Holmes et al. 2004, 2007a,
2007b, 2008), with some differences in results as compared to studies conducted in neurological patients, as described above (see also Maravita et al. 2002). Evidence for the existence of multisensory PpS is now accumulating from neuroimaging studies in healthy humans. These new studies provide further support for the homologies between some of the electrophysiological evidence reviewed above and the PpS neural mechanisms in the human brain. Specifically, brain areas that represent visual and tactile information on and near to the hand and face in body-centered coordinates have been reported to be the anterior section of the intraparietal sulcus and the ventral premotor cortex (Bremmer et al. 2001; Makin et al. 2007; Sereno and Huang 2006). These findings correspond nicely with the anatomical locations of the monkey visuotactile network. Moreover, recent studies have identified the superior parietal occipital junction as a potential site for representing near-face and near-hand visual space (Gallivan et al. 2009; Quinlan and Culham 2007). This new evidence extends our current knowledge of the PpS neural network, and may guide further electrophysiological studies to come. Although using functional brain imaging enabled us to demonstrate that multiple brain areas in both sensory and motor cortices modulate their responses to visual stimuli based on their distance from the hand and face, it did not allow us to determine the direct involvement of such representations in motor processing. In a series of experiments inspired by the macaque neurophysiological literature, we recently examined the reference frames underlying rapid motor responses to real, three-dimensional objects approaching the hand (Makin et al. 2009). We asked subjects to make a simple motor response to a visual “Go” signal while they were simultaneously presented with a taskirrelevant distractor ball, rapidly approaching a location either near to or far from their responding hand. To assess the effects of these rapidly approaching distractor stimuli on the excitability of the human motor system, we used single pulse transcranial magnetic stimulation, applied to the primary motor cortex, eliciting motor evoked potentials (MEPs) in the responding hand. As expected, and across several experiments, we found that motor excitability was modulated as a function of the distance of approaching balls from the hand: MEP amplitude was selectively reduced when the ball approached near the hand, both when the hand was on the left and on the right of the midline. This suppression likely reflects the proactive inhibition of a possible avoidance responses that is elicited by the approaching ball (see Makin et al. 2009). Strikingly, this hand-centered suppression occurred as early as 70 ms after ball appearance, and was not modified by the location of visual fixation relative to the hand. Furthermore, it was selective for approaching balls, since static visual distractors did not modulate MEP amplitude. Together with additional behavioral measurements, this new series of experiments provides direct and converging evidence for automatic hand-centered coding of visual space in the human motor system. These results strengthen our interpretation of PpS as a mechanism for translating potentially relevant visual information into a rapid motor response. 
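As a concrete illustration of the contrast behind this conclusion, the sketch below turns two condition means into a hand-centered suppression index. The numbers are hypothetical placeholders, and the variable names and percentage index are our own; they are not data or an analysis taken from Makin et al. (2009).

```python
# Hypothetical mean MEP amplitudes (arbitrary units) from the responding hand,
# split by whether the distractor ball approached near to or far from that hand.
mep_ball_near_hand = 0.82
mep_ball_far_from_hand = 1.04

# Hand-centered suppression: the reduction for near-hand approaches,
# expressed as a percentage of the far-from-hand response.
suppression_pct = 100.0 * (1.0 - mep_ball_near_hand / mep_ball_far_from_hand)
print(f"MEP suppression for near-hand approaches: {suppression_pct:.0f}%")
```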
Together, the behavioral and imaging studies reviewed above confirm the existence of brain mechanisms in humans that are specialized for representing visual information selectively when it arises from near the hand. As highlighted in the previous section on monkey research, a strong binding mechanism of visual and tactile inputs has repeatedly been shown also in humans. Importantly, these converging results have refined and extended our understanding of the neural processes underlying multisensory representation of PpS, namely, by identifying various cortical areas that are involved in different sensory–motor aspects of PpS representation, and the time course of hand-centered processing. The tight relationship between motor and visual representation of near space in the human brain led us most recently to an intriguing question: Would the loss of a hand through amputation (and therefore the inability of the brain to represent visual information with respect to it) lead to changes in visual perception? We recently discovered that hand amputation is indeed associated with a mild visual “neglect” of the amputated side: Participants with an amputated hand favored their intact side when comparing distances in a landmark position-judgment task (Makin et al. 2010). Importantly, this bias was absent when the exact same task was repeated with the targets placed in far space. These results thus suggest that the possibility for action within near space shapes the actor’s spatial
perception, and emphasize the unique role that PpS mechanisms may play as a medium for interactions between the hands and the world.
23.2.2 A Multisensory Interface for Body–Objects Interactions Until recently, the characteristics of visuo-tactile PpS in humans had been assessed exclusively, whereas the relevant body parts were held statically. Even the most “dynamic” properties of PpS, such as tool-use modulation of the visuo-tactile interaction, have been studied in the static phase before or after the active use of the tool (Farnè et al. 2005a; Holmes et al. 2007b; Maravita et al. 2002). An exception could be found in studies showing dynamic changes of PpS during tasks such as line bisection (e.g., Berti and Frassinetti 2000), although multisensory integration was not measured in these studies. However, if the PpS representation is indeed directly involved in body–object interactions, then modulations of visuo-tactile interaction should be found without needing the use of any tools. On the contrary, the visuo-tactile interaction, or the dynamic “remapping” of near space, should be a basic, primary property that only secondarily can be generalized to tool use (see Brozzoli et al. 2009b). In this respect, the execution of a voluntary free-hand action, for instance reaching toward an object, should induce a rapid online remapping of visuo-tactile spatial interactions, as the action unfolds. To test this hypothesis in humans, we conceived a modified version of the VTI paradigm described above, where multisensory interactions were also assessed during the dynamic phases of an action. We asked a group of healthy participants to perform two tasks within each trial. The first task was perceptual, whereby participants discriminated the elevation (up or down) of a tactile target delivered to a digit on one hand (index finger or thumb) trying to ignore task-irrelevant visual distractor presented on a target object. The second motor task consisted of grasping the target object, which was presented in four different orientations, with the index finger and thumb in a precision grip. The visuo-tactile stimulation was presented at one of three different timings with respect to the execution of the action: either in a static phase, when the grasping hand had not yet moved; at the onset of the movement (0 ms); or in the early execution phase (200 ms after movement onset). When participants performed the action with the tactually stimulated hand, the VTI was enhanced (i.e., there was more interference from the visual distractor on the tactile task) as compared to the static phase (Figure 23.3a). This effect was even more pronounced when the visuotactile interaction was assessed during the early execution phase of the grasping. Crucially, if the same action was performed with the nonstimulated hand, no multisensory modulation was observed, even though both hands displayed comparable kinematic profiles (Brozzoli et al. 2009b; see Figure 23.3b). This result provided the first evidence that, in humans, a motor-evoked remapping of PpS occurs, which is triggered by the execution of a grasping action: As in the monkey brain (see Section 23.1.1), the human brain links sources of visual and tactile information that are spatially separated at the action onset, updating their interaction as a function of the phase of the action. Our brain updates the relationship between visual and tactile information well before the hand comes into contact with the object, since the perceptual reweighting is already effective at the very early stage of the action (Figure 23.3a and b). 
The finding that such visuo-tactile reweighting was observed selectively when both perceptual and grasping tasks concerned the same hand, not only confirms the hand-centered nature of the PpS, but critically extends this property to ecological and adaptive dynamic situations of voluntary manipulative actions. Furthermore, the kinematics analysis revealed possible parallels between the motor and perceptual performances, showing that a difference in the kinematic pattern was reflected by a difference in the perceptual domain (for details, see Brozzoli et al. 2009b). It is worth noting that the increase in VTI that was triggered by the action, even if already present at the very onset of the movement (Figure 23.3a and b), kept increasing during the early execution phase. That is, an even stronger interference of visual on tactile information was revealed, as the action unfolded in time and space. This suggests that performing a voluntary action triggers a continuous monitoring of action space, which keeps “assisting” the motor execution of the action during its whole dynamic phase.
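For readers unfamiliar with the dependent measure, the visuo-tactile interference plotted in Figure 23.3a is commonly summarized as a cross-modal congruency effect (CCE): the response-time cost of an incongruent relative to a congruent visual distractor. The minimal sketch below applies that definition and then uses the mean CCE values given in the Figure 23.3 caption; the helper function, the hypothetical 520/575 ms condition means, and the "action-induced increase" difference score are our own framing, not an analysis taken from Brozzoli et al. (2009b).

```python
def cce_ms(rt_incongruent_ms, rt_congruent_ms):
    """Cross-modal congruency effect: RT cost of an incongruent visual distractor."""
    return rt_incongruent_ms - rt_congruent_ms

# Hypothetical condition means: 575 ms (incongruent) vs. 520 ms (congruent)
# would yield a CCE of 55 ms, the value reported at movement onset.
print(cce_ms(575, 520))  # 55

# Mean CCE values (ms) from the Figure 23.3 caption, by stimulation phase.
cce_by_phase = {"static": 22, "movement onset": 55, "early execution": 79}

# Action-induced remapping, expressed as the increase in CCE over the static phase.
remapping = {phase: value - cce_by_phase["static"] for phase, value in cce_by_phase.items()}
print(remapping)  # {'static': 0, 'movement onset': 33, 'early execution': 57}
```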
FIGURE 23.3 (See color insert.) Grasping actions remap peripersonal space. (a) Action induces a reweighting of multisensory processing, as shown by a stronger VTI at action onset (55 ms) compared with the static condition (22 ms). The increase is even larger (79 ms) when stimulation occurs in the early execution phase (200 ms after the action starts). (b) Dynamics of free-hand grasping: schematic representation of the estimated position of the hand at the instant when stimulation occurred, for the static condition (blue panel), exactly at movement onset (yellow panel), or during the early execution phase (light blue panel). Wrist displacement (green trajectory) and grip evolution (pink trajectories) are shown in each panel. (Modified from Brozzoli, C. et al., NeuroReport, 20, 913–917, 2009b.)
To investigate more deeply the relationship between PpS remapping and the motor characteristics of the action, we tested whether different multisensory interactions might arise as a function of the required sensory–motor transformations. We would expect that action-dependent multisensory remapping should be more important whenever action performance requires relatively more complex sensory–motor transformations. In a more recent study (Brozzoli et al. 2009a), we asked a group of healthy participants to perform either grasping movements (as in Brozzoli et al. 2009b) or pointing movements. For both movements, the interaction between task-irrelevant visual information on the object and the tactile information delivered on the acting hand increased in the early component of the action (as reflected in a higher VTI), thus replicating our previous findings. However, a differential updating of the VTI took place during the execution phase of the two action types. Although the VTI magnitude was further increased during the execution phase of the grasping action (with respect to movement onset), this was not the case in the pointing action. In other words, when the hand approached the object, the grasping movement triggered stronger visuo-tactile interaction than pointing. Thus, not only a continuous updating of PpS occurs during action execution, but this remapping varies with
the characteristics of the given motor act. If (part of) the remapping of PpS is already effective at the onset of the motor program, the perceptual modulation will be kept unchanged. But in the case of relatively complex object-oriented interactions such as grasping, the remapping of PpS will be dynamically updated with respect to the motor command.
23.3 CONCLUSION
The studies reviewed in this chapter uncover the multisensory mechanisms our brain uses to directly link visual information available outside our body with tactile information on our body. In particular, electrophysiological studies in monkeys revealed that the brain builds a body part–centered representation of the space around the body through a network of visuo-tactile areas. We also reviewed later evidence suggesting a functionally homologous representation of PpS in humans, which serves as a multisensory interface for interactions with objects in the external world. Moreover, the action-related properties of PpS representation feature a basic aspect that might be crucial for rapid and automatic avoidance reactions, that is, a hand-centered representation of objects in near space. We also showed that PpS representation is dynamically remapped during action execution, as a function of the sensory–motor transformations required by the action kinematics. We therefore suggested that PpS representation may also play a major role in voluntary action execution on nearby objects. These two hypotheses (involuntary and voluntary object-oriented actions) are not mutually exclusive, and one could speculate that, from a more primordial defensive function of this machinery, a more fine-grained and sophisticated function could have developed using the same, relatively basic visuo-tactile spatial computational capabilities. This development could lead to its involvement in the control of the execution of voluntary actions toward objects.
ACKNOWLEDGMENTS
This work was supported by a European Mobility Fellowship, ANR grants no. JCJC06_133960 and RPV08085CSA, and INSERM AVENIR grant no. R05265CS.
REFERENCES Avillac, M., S. Denève, E. Olivier, A. Pouget, and J. R. Duhamel. 2005. Reference frames for representing visual and tactile locations in parietal cortex. Nature Neuroscience 8: 941–949. Battaglini, P. P., A. Muzur, C. Galletti, M. Skrap, A. Brovelli, and P. Fattori. 2002. Effects of lesions to area V6A in monkeys. Experimental Brain Research 144: 419–422. Bender, M. 1952. Disorders in perception. Springfield, IL: Thomas. Berti, A., and F. Frassinetti. 2000. When far becomes near: Remapping of space by tool use. Journal of Cognitive Neuroscience 12: 415–420. Bremmer, F. 2005. Navigation in space—the role of the macaque ventral intraparietal area. Journal of Physiology 566: 29–35. Bremmer, F. et al. 2001. Polymodal motion processing in posterior parietal and premotor cortex: A human fMRI study strongly implies equivalencies between humans and monkeys. Neuron 29: 287–296. Brozzoli, C., L. Cardinali, F. Pavani, and A. Farnè. 2009a. Action specific remapping of peripersonal space. Neuropsychologia, in press. Brozzoli, C., M. L. Demattè, F. Pavani, F. Frassinetti, and A. Farnè. 2006. Neglect and extinction: Within and between sensory modalities. Restorative Neurology Neuroscience 24: 217–232. Brozzoli, C., F. Pavani, C. Urquizar, L. Cardinali, and A. Farnè. 2009b. Grasping actions remap peripersonal space. NeuroReport 20: 913–917. Bonifazi, S., A. Farnè, L. Rinaldesi, and E. Ladavas. 2007. Dynamic size-change of peri-hand space through tool-use: Spatial extension or shift of the multi-sensory area. Journal of Neuropsychology 1: 101–114. Caggiano, V., L. Fogassi, G. Rizzolatti, P. Thier, and A. Casile. 2009. Mirror neurons differentially encode the peripersonal and extrapersonal space of monkeys. Science 324: 403–406. Cardinali, L., C. Brozzoli, and A. Farnè. 2009a. Peripersonal space and body schema: Two labels for the same concept? Brain Topography 21: 252–260
Cardinali, L., C. Brozzoli, and A. Farnè. 2009b. Peripersonal space and body schema. In Encyclopedia of Behavioral Neuroscience, ed. G. F. Koob, M. Le Moal, and R. R. Thompson, 40, Elsevier Science Ltd. Cardinali, L., F. Frassinetti, C. Brozzoli, C. Urquizar, A. Roy, and A. Farnè. 2009c. Tool-use induces morphological up-dating of the body schema. Current Biology 19: R478–R479. Colby, C. L., and J. R. Duhamel. 1991. Heterogeneity of extrastriate visual areas and multiple parietal areas in the macaque monkey. Neuropsychologia 29: 517–537. Colby, C. L., J. R. Duhamel, and M. E. Goldberg, 1993. Ventral intraparietal area of the macaque: Anatomic location and visual response properties. Journal of Neurophysiology 69: 902–914. Cooke, D. F., and M. S. Graziano. 2003. Defensive movements evoked by air puff in monkeys. Journal of Neurophysiology 90: 3317–3329. Cooke, D. F., and M. S. Graziano. 2004. Sensorimotor integration in the precentral gyrus: Polysensory neurons and defensive movements. Journal of Neurophysiology 91: 1648–1660. Costantini, M., D. Bueti, M. Pazzaglia, and S. M. Aglioti. 2007. Temporal dynamics of visuo-tactile extinction within and between hemispaces. Neuropsychology 21: 242–250. Crutcher, M. D., and M. R. DeLong. 1984. Single cell studies of the primate putamen: II. Relations to direction of movement and pattern of muscular activity. Experimental Brain Research 53: 244–258. Dehaene, S. 2005. Evolution of human cortical circuits for reading and arithmetic: The “neuronal recycling” hypothesis. In From Monkey Brain to Human Brain, ed. S. Dehaene, J. R. Duhamel, M. Hauser, and G. Rizzolatti, 133–157. Cambridge, MA: MIT Press. Deuel, R. K., and D. J. Regan. 1985. Parietal hemineglect and motor deficits in the monkey. Neuropsychologia 23: 305–314. di Pellegrino, G., and De Renzi, E. 1995. An experimental investigation on the nature of extinction. Neuropsychologia, 33: 153–170. di Pellegrino, G., L. Fadiga, L. Fogassi, V. Gallese, and G. Rizzolatti. 1992. Understanding motor events: A neurophysiological study. Experimental Brain Research 91: 176–180. di Pellegrino, G., E. Ladavas, and A. Farné. 1997. Seeing where your hands are. Nature 21: 730. Driver, J. 1998. The neuropsychology of spatial attention. In Attention, ed. H. Pashler, 297–340. Hove: Psychology Press. Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1998. Ventral Intraparietal area of the macaque: Congruent visual and somatic response properties. Journal of Neurophysiology 79: 126–136. Ettlinger, G., and J. E. Kalsbeck. 1962. Changes in tactile discrimination and in visual reaching after successive and simultaneous bilateral posterior parietal ablations in the monkey. Journal of Neurology, Neurosurgery and Psychiatry 25: 256–268. Farnè, A. et al. 2003. Visuo-motor control of the ipsilateral hand: Evidence from right brain–damaged patients. Neuropsychologia 41: 739–757. Farnè, A., S. Bonifazi, and E. Ladavas. 2005a. The role played by tool-use and tool-length on the plastic elongation of peri-hand space: A single case study. Cognitive Neuropsychology 22: 408–418. Farnè, A., Demattè, M., and Ladavas, E. 2003. Beyond the window: Multisensory representation of peripersonal space across a transparent barrier. Journal of Physiology Paris 50: 51–61. Farnè, A., M. L. Demattè, and E. Ladavas. 2005b. Neuropsychological evidence of modular organization of the near peripersonal space. Neurology 13: 1754–1758. Farnè, A., A. Iriki, and E. Ladavas. 2005c. 
Shaping multisensory action-space with tools: Evidence from patients with cross-modal extinction. Neuropsychologia 43: 238–248. Farnè, A., and E. Ladavas. 2000. Dynamic size-change of hand peripersonal space following tool use. NeuroReport 11: 1645–1649. Farnè, A., A. Serino, and E. Ladavas. 2007. Dynamic size-change of peri-hand space following tool-use: Determinants and spatial characteristics revealed through cross-modal extinction. Cortex 43: 436–443. Faugier-Grimaud, S., C. Frenois, and D. G. Stein. 1978. Effects of posterior parietal lesions on visually guided behavior in monkeys. Neuropsychologia 16: 151–168. Fogassi, L. et al. 1992. Space coding by premotor cortex. Experimental Brain Research 89: 686–690. Fogassi, L. et al. 1996. Coding of peripersonal space in inferior premotor cortex (area F4). Journal of Neurophysiology 76: 141–157. Fogassi, L., V. Gallese, M. Gentilucci, G. Luppino, M. Matelli, and G. Rizzolatti. 1994. The fronto-parietal cortex of the prosimian Galago: Patterns of cytochrome oxidase activity and motor maps. Behavioral Brain Research 60: 91–113. Fogassi, L., and G. Luppino. 2005. Motor functions of the parietal lobe. Current Opinion in Neurobiology 15: 626–631.
Fogassi, L., V. Raos, G. Franchi, V. Gallese, G. Luppino, and M. Matelli. 1999. Visual responses in the dorsal premotor area F2 of the macaque monkey. Experimental Brain Research 128: 194–199. Gallese, V., L. Fadiga, L. Fogassi, and G. Rizzolatti. 1996. Action recognition in the premotor cortex. Brain 119: 593–609. Gallese, V., A. Murata, M. Kaseda, N. Niki, and H. Sakata. 1994. Deficit of hand preshaping after muscimol injection in monkey parietal cortex. NeuroReport 5: 1525–1529. Gallivan, J. P., C. Cavina-Pratesi, and J. C. Culham. 2009. Is that within reach? fMRI reveals that the human superior parieto-occipital cortex encodes objects reachable by the hand. Journal of Neuroscience 29: 4381–4391. Gardner, E. P. et al. 2007. Neurophysiology of prehension: I. Posterior parietal cortex and object-oriented hand behaviors. Journal of Neurophysiology 97: 387–406. Gentilucci, M. et al. 1988. Somatotopic representation in inferior area 6 of the macaque monkey. Experimental Brain Research 71: 475–490. Gentilucci, M., C. Scandolara, I. N. Pigarev, and G. Rizzolatti. 1983. Visual responses in the postarcuate cortex (area 6) of the monkey that are independent of eye position. Experimental Brain Research 50: 464–468. Godschalk, M., R. N. Lemon, H. G. Nijs, and H. G. Kuypers. 1981. Behaviour of neurons in monkey periarcuate and precentral cortex before and during visually guided arm and hand movements. Experimental Brain Research 44: 113–116. Godschalk, M., R. N. Lemon, H. G. Kuypers, and R. K. Ronday. 1984. Cortical afferents and efferents of monkey postarcuate area: An anatomical and electrophysiological study. Experimental Brain Research 56: 410–424. Godschalk, M., R. N. Lemon, H. G. Kuypers, and J. van der Steen. 1985. The involvement of monkey premotor cortex neurones in preparation of visually cued arm movements. Behavioral Brain Research 18: 143–157. Graziano, M. S. A. 1999. Where is my arm? The relative role of vision and proprioception in the neuronal representation of limb position. Proceedings of the National Academy of Sciences of the United States of America 96: 10418–10421. Graziano, M. S. A. 2001. A system of multimodal areas in the primate brain. Neuron 29: 4–6. Graziano, M. S. A., and D. F. Cooke. 2006. Parieto-frontal interactions, personal space, and defensive behavior. Neuropsychologia 44: 2621–2635. Graziano, M. S. A., and C. G. Gross. 1993. A bimodal map of space: Somatosensory receptive fields in the macaque putamen with corresponding visual receptive fields. Experimental Brain Research 97: 96–109. Graziano, M. S. A., and C. G. Gross. 1994. Multiple pathways for processing visual space. In Attention and Performance XV, ed. C. Umiltà and M. Moscovitch, 181–207. Oxford: Oxford Univ. Press. Graziano, M. S. A., and C. G. Gross. 1995. The representation of extrapersonal space: A possible role for bimodal, visuo-tactile neurons. In The Cognitive Neurosciences, ed. M. Gazzaniga, 1021–1034. MIT Press. Graziano, M. S., X. T. Hu, and C. G. Gross. 1997. Visuospatial properties of ventral premotor cortex. Journal of Neurophysiology 77: 2268–2292. Graziano, M. S., C. S. Taylor, T. Moore, and D. F. Cooke. 2002. The cortical control of movement revisited. Neuron 36: 349–362. Graziano, M. S., G. S. Yap, and C. G. Gross. 1994. Coding of visual space by premotor neurons. Science 266: 1054–1057. Halsband, U., and R. Passingham. 1982. The role of premotor and parietal cortex in the direction of action. Brain Research 240: 368–372. Holmes, N. P., G. A. Calvert, and C. Spence. 2004. 
Extending or projecting peripersonal space with tools? Multisensory interactions highlight only the distal and proximal ends of tools. Neuroscience Letters 372: 62–67. Holmes, N. P., D. Sanabria, G. A. Calvert, and C. Spence. 2007a. Tool-use: Capturing multisensory spatial attention or extending multisensory peripersonal space? Cortex 43: 469–489. Erratum in: Cortex 43: 575. Holmes, N. P., and C. Spence. 2004. The body schema and multisensory representations of peripersonale space. Cognitive Processing 5: 94–105. Holmes, N. P., G. A. Calvert, and C. Spence. 2007b. Tool use changes multisensory interactions in seconds: Evidence from the crossmodal congruency task. Experimental Brain Research 183: 465–476. Holmes, N. P., C. Spence, P. C. Hansen, C. E. Mackay, and G. A. Calvert. 2008. The multisensory attentional consequences of tool use: A functional magnetic resonance imaging study. PLoS One 3: e3502. Hyvärinen, J. 1981. Regional distribution of functions in parietal association area 7 of the monkey. Brain Research 206: 287–303.
Hyvärinen, J., and A. Poranen. 1974. Function of the parietal associative area 7 as revealed from cellular discharges in alert monkeys. Brain 97: 673–692. Hyvärinen, J., and Y. Shelepin. 1979. Distribution of visual and somatic functions in the parietal associative area 7 of the monkey. Brain Research 169: 561–564. Iriki, A., M. Tanaka, and Y. Iwamura. 1996. Coding of modified body schema during tool use by macaque postcentral neurons. NeuroReport 7: 2325–2330. Ishida, H., K. Nakajima, M. Inase, and A. Murata. 2009. Shared mapping of own and others’ bodies in visuotactile bimodal area of monkey parietal cortex. Journal of Cognitive Neuroscience 1–14. Jeannerod, M. 1988. Motor control: Concepts and issues. New York: Wiley. Jones, E. G., and T. P. Powell. 1970. An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain 93: 793–820. Kurata, K., and J. Tanji. 1986. Premotor cortex neurons in macaques: Activity before distal and proximal forelimb movements. Journal of Neuroscience 6: 403–411. Kurata, K., K. Okano, and J. Tanji. 1985. Distribution of neurons related to a hindlimb as opposed to forelimb movement in the monkey premotor cortex. Experimental Brain Research 60: 188–191. Lacquaniti, F., and R. Caminiti. 1998. Visuo-motor transformations for arm reaching. European Journal of Neuroscience 10: 195–203. Review. Erratum in: European Journal of Neuroscience, 1998, 10: 810. Ladavas, E. 2002. Functional and dynamic properties of visual peripersonal space. Trends in Cognitive Sciences 6: 17–22. Ladavas, E., and A. Farnè. 2004. Visuo-tactile representation of near-the-body space. Journal of Physiology Paris 98: 161–170. Legrand, D., C. Brozzoli, Y. Rossetti, and A. Farnè. 2007. Close to me: Multisensory space representations for action and pre-reflexive consciousness of oneself-in-the-world. Consciousness and Cognition 16: 687–699. Leinonen, L. 1980. Functional properties of neurones in the posterior part of area 7 in awake monkey. Acta Physiologica Scandinavica 108: 301–308. Leinonen, L., J. Hyvärinen, G. Nyman, and I. Linnankoski. 1979. I. Functional properties of neurons in lateral part of associative area 7 in awake monkeys. Experimental Brain Research, 34: 299–320. Leinonen, L., and G. Nyman. 1979. II. Functional properties of cells in anterolateral part of area 7 associative face area of awake monkeys. Experimental Brain Research 34: 321–333. Luppino, G., A. Murata, P. Govoni, and M. Matelli. 1999. Largely segregated parietofrontal connections linking rostral intraparietal cortex (areas AIP and VIP) and the ventral premotor cortex (areas F5 and F4). Experimental Brain Research 128: 181–187. Lynch, J. C., V. B. Mountcastle, W. H. Talbot, and T. C. T. Yin. 1977. Parietal lobe mechanisms for directed visual attention. Journal of Neurophysiology 140: 462–489. Makin, T. R., N. P. Holmes, C. Brozzoli, Y. Rossetti, and A. Farnè. 2009. Coding of visual space during motor preparation: Approaching objects rapidly modulate corticospinal excitability in hand-centered coordinates. Journal of Neuroscience 29: 11841–11851. Makin, T. R., N. P. Holmes, and H. H. Ehrsson. 2008. On the other hand: Dummy hands and peripersonal space. Behavioral Brain Research 191: 1–10. Makin, T. R., N. P. Holmes, and E. Zohary. 2007. Is that near my hand? Multisensory representation of peripersonal space in human intraparietal sulcus. Journal of Neuroscience 27: 731–740. Makin, T. R., M. Wilf, I. Schwartz, and E. Zohary. 2010.
Amputees “neglect” the space near their missing hand. Psychological Science, in press. Maravita, A. 2006. From body in the brain, to body in space: Sensory and intentional aspects of body representation. In The human body: Perception from the inside out, ed. G. Knoblich, M. Shiffrar, and M. Grosjean, 65–88. Oxford Univ. Press. Maravita, A., and A. Iriki. 2004. Tools for the body (schema). Trends in Cognitive Science 8: 79–86. Maravita, A., C. Spence, and J. Driver. 2003. Multisensory integration and the body schema: Close to hand and within reach. Current Biology 13: R531–R539. Maravita, A., C. Spence, S. Kennett, and J. Driver. 2002. Tool-use changes multimodal spatial interactions between vision and touch in normal humans. Cognition 83: B25–B34. Martino, A. M., and P. L. Strick. 1987. Corticospinal projections originate from the arcuate premotor area. Brain Research 404: 307–312. Matelli, M., R. Camarda, M. Glickstein, and G. Rizzolatti. 1984a. Interconnections within the postarcuate cortex (area 6) of the macaque monkey. Brain Research 310: 388–392. Matelli, M., R. Camarda, M. Glickstein, and G. Rizzolatti. 1984b. Afferent and efferent projections of the inferior area 6 in the macaque monkey. Journal of Comparative Neurology 251: 281–298.
Matelli, M., and G. Luppino. 2001. Parietofrontal circuits for action and space perception in the macaque monkey. Neuroimage 14: S27–S32. Matelli, M., G. Luppino, and G. Rizzolatti. 1985. Patterns of cytochrome oxidase activity in the frontal agranular cortex of the macaque monkey. Behavioral Brain Research 18: 125–136. Matsumura, M., and K. Kubota. 1979. Cortical projection to hand-arm motor area from post-arcuate area in macaque monkeys: A histological study of retrograde transport of horseradish peroxidase. Neuroscience Letters 11: 241–246. Maunsell, J. H., and D. C. van Essen. 1983. The connections of the middle temporal visual area (MT) and their relationship to a cortical hierarchy in the macaque monkey. Journal of Neuroscience 3: 2563–2586. Meredith, M. A., and B. E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology 56: 640–662. Mesulam, M. M., G. W. Van Hoesen, D. N. Pandya, and N. Geschwind. 1977. Limbic and sensory connections of the inferior parietal lobule (area PG) in the rhesus monkey: A study with a new method for horseradish peroxidase histochemistry. Brain Research 136: 393–414. Milner, A. D., and M. A. Goodale. 1995. The visual brain in action. Oxford: Oxford Univ. Press. Moll, L., and H. G. Kuypers. 1977. Premotor cortical ablations in monkeys: Contralateral changes in visually guided reaching behavior. Science 198: 317–319. Mountcastle, V. B., J. C. Lynch, A. Georgopoulos, H. Sakata, and C. Acuna. 1975. Posterior parietal association cortex of the monkey: Command functions for operations within extrapersonal space. Journal of Neurophysiology 38: 871–908. Muakkassa, K. F., and P. L. Strick. 1979. Frontal lobe inputs to primate motor cortex: Evidence for four somatotopically organized ‘premotor’ areas. Brain Research 177: 176–182. Murata, A., L. Fadiga, L. Fogassi, V. Gallese, V. Raos, and G. Rizzolatti. 1997. Object representation in the ventral premotor cortex (area F5) of the monkey. Journal of Neurophysiology 78: 2226–2230. Murray, M. M. et al. 2005. Grabbing your ear: Rapid auditory–somatosensory multisensory interactions in low-level sensory cortices are not constrained by stimulus alignment. Cerebral Cortex 15: 963–974. Pandya, D. N., and L. A. Vignolo. 1971. Intra- and interhemispheric projections of the precentral, premotor and arcuate areas in the rhesus monkey. Brain Research 26: 217–233. Paulignan, Y., C. MacKenzie, R. Marteniuk, and M. Jeannerod. 1991. Selective perturbation of visual input during prehension movements: 1. The effects of changing object position. Experimental Brain Research 83: 502–512. Pavani, F., and U. Castiello. 2004. Binding personal and extrapersonal space through body shadows. Nature Neuroscience 7: 14–16. Pisella, L. et al. 2000. An ‘automatic pilot’ for the hand in human posterior parietal cortex: Toward reinterpreting optic ataxia. Nature Neuroscience 3: 729–736. Prabhu, G. et al. 2009. Modulation of primary motor cortex outputs from ventral premotor cortex during visually guided grasp in the macaque monkey. Journal of Physiology 587: 1057–1069. Quinlan, D. J., and J. C. Culham. 2007.
fMRI reveals a preference for near viewing in the human parietooccipital cortex. Neuroimage 36: 167–187. Raos, V., M. A. Umiltá, A. Murata, L. Fogassi, and V. Gallese. 2006. Functional properties of grasping-related neurons in the ventral premotor area F5 of the macaque monkey. Journal of Neurophysiology 95: 709–729. Rizzolatti, G., R. Camarda, L. Fogassi, M. Gentilucci, G. Luppino, and M. Matelli. 1988. Functional organization of inferior area 6 in the macaque monkey: II. Area F5 and the control of distal movements. Experimental Brain Research 71: 491–507. Rizzolatti, G., L. Fadiga, L. Fogassi, and V. Gallese. 1997. The space around us. Science 11: 190–191. Rizzolatti, G., L. Fadiga, V. Gallese, and L. Fogassi. 1996. Premotor cortex and the recognition of motor actions. Cognitive Brain Research 3: 131–141. Rizzolatti, G., and M. Gentilucci. 1988. Motor and visual–motor functions of the premotor cortex. In Neurobiology of Neocortex, ed. P. Rakic and W. Singer, 269–284. John Wiley and Sons Ltd. Rizzolatti, G., M. Gentilucci, L. Fogassi, G. Luppino, M. Matelli, and S. Ponzoni-Maggi. 1987. Neurons related to goal-directed motor acts in inferior area 6 of the macaque monkey. Experimental Brain Research 67: 220–224. Rizzolatti, G., and G. Luppino. 2001. The cortical motor system. Neuron 31: 889–901. Rizzolatti, G., M. Matelli, and G. Pavesi. 1983. Deficits in attention and movement following the removal of postarcuate (area 6) and prearcuate (area 8) cortex in macaque monkeys. Brain 106: 655–673. Rizzolatti, G., G. Luppino, and M. Matelli. 1998. The organization of the cortical motor system: New concepts. Electroencephalography and Clinical Neurophysiology 106: 283–296.
Rizzolatti, G., and M. Matelli. 2003. Two different streams form the dorsal visual system: Anatomy and functions. Experimental Brain Research 153: 146–157. Rizzolatti, G., C. Scandolara, M. Gentilucci, and R. Camarda. 1981a. Response properties and behavioral modulation of “mouth” neurons of the postarcuate cortex (area 6) in macaque monkeys. Brain Research 225: 421–424. Rizzolatti, G., C. Scandolara, M. Matelli, and M. Gentilucci. 1981b. Afferent properties of periarcuate neurons in macaque monkeys: I. Somatosensory responses. Behavioral Brain Research 2: 125–146. Rizzolatti, G., C. Scandolara, M. Matelli, and M. Gentilucci. 1981c. Afferent properties of periarcuate neurons in macque monkeys: II. Visual responses. Behavioral Brain Research 2, 147–163. Robinson, D. L., M. E. Goldberg, and G. B. Stanton. 1978. Parietal association cortex in the primate: Sensory mechanisms and behavioral modulations. Journal of Neurophysiology 41: 910–932. Robinson, C. J., and H. Burton. 1980a. Organization of somatosensory receptive fields in cortical areas 7b, retroinsula, postauditory and granular insula of M. fascicularis. Journal of Comparative Neurology 192: 69–92. Robinson, C. J., and H. Burton. 1980b. Somatic submodality distribution within the second somatosensory (SII), 7b, retroinsular, postauditory, and granular insular cortical areas of M. fascicularis. Journal of Comparative Neurology 192: 93–108. Sakata, H., Y. Takaoka, A. Kawarasaki, and H. Shibutani. 1973. Somatosensory properties of neurons in the superior parietal cortex (area 5) of the rhesus monkey. Brain Research 64: 85–102. Seltzer, B., and D. N. Pandya. 1980. Converging visual and somatic sensory cortical input to the intraparietal sulcus of the rhesus monkey. Brain Research 192: 339–351. Sereno, M. I., and R. S. Huang. 2006. A human parietal face area contains aligned head-centered visual and tactile maps. Nature Neuroscience 9: 1337–1343. Shimazu, H., M. A. Maier, G. Cerri, P. A. Kirkwood, and R. N. Lemon. 2004. Macaque ventral premotor cortex exerts powerful facilitation of motor cortex outputs to upper limb motoneurons. Journal of Neuroscience 24: 1200–1211. Spence, C., F. Pavani, and J. Driver. 2004a. Spatial constraints on visual–tactile cross-modal distractor congruency effects. Cognitve Affective and Behavioral Neuroscience 4: 148–169. Spence, C., F. Pavani, A. Maravita, and N. Holmes. 2004b. Multisensory contributions to the 3-D representation of visuotactile peripersonal space in humans: Evidence from the crossmodal congruency task. Journal of Physiology Paris 98: 171–189. Spence, C., F. Pavani, A. Maravita, and N. P. Holmes. 2008. Multisensory interactions. In Haptic rendering: Foundations, algorithms, and applications, ed. M. C. Lin and M. A. Otaduy, 21–52. Wellesley, MA: A. K. Peters Ltd. Stein, B. E., and M. A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press. Strick, P. L., and C. C. Kim. 1978. Input to primate motor cortex from posterior parietal cortex (area 5): I. Demonstration by retrograde transport. Brain Research 157: 325–330. Strick, P. L. 2002. Stimulating research on motor cortex. Nature Neuroscience 5: 714–715. Ungerleider, L. G., and Desimone, R. 1986. Cortical connections of visual area MT in the macaque. Journal of Comparative Neurology, 248: 190–222. Wallace, M. T., and Stein, B. E. 2007. Early experience determines how the senses will interact. Journal of Neurophysiology 97: 921–926. Wang, Y., S. Celebrini, Y. Trotter, and P. Barone. 2008. 
Visuo-auditory interactions in the primary visual cortex of the behaving monkey: Electrophysiological evidence. BMC Neuroscience 9: 79. Ward, R., S. Goodrich, and J. Driver. 1994. Grouping reduces visual extinction: Neuropsychological evidence for weight-linkage in visual selection. Visual Cognition 1: 101–129.
24 Multisensory Perception and Bodily Self-Consciousness: From Out-of-Body to Inside-Body Experience
Jane E. Aspell, Bigna Lenggenhager, and Olaf Blanke
CONTENTS
24.1 Introduction
24.2 Multisensory Disintegration in Out-of-Body and Related Experiences of Neurological Origin
24.3 Using Multisensory Conflicts to Investigate Bodily Self in Healthy Subjects
24.3.1 Body Part Studies: Rubber Hand Illusion
24.3.2 Full Body Studies
24.3.3 Mislocalization of Touch during FBIs
24.3.4 Multisensory First-Person Perspective
24.4 Conclusion
References
24.1 INTRODUCTION
The most basic foundations of the self arguably lie in those brain systems that represent the body (Blanke and Metzinger 2009; Damasio 2000; Gallagher 2005; Jeannerod 2006; Knoblich 2002; Metzinger et al. 2007). The representation of the body is complex, involving the encoding and integration of a wide range of multisensory (somatosensory, visual, auditory, vestibular, visceral) and motor signals (Damasio 2000; Gallagher 2005; Metzinger 2003). One’s own body is thus possibly the most multisensory “object” in the world. Importantly, whereas external objects of perception come and go, multisensory bodily inputs are continuously present, and have thus been proposed as the basis for bodily self-consciousness—the nonconceptual and prereflective representation of body-related information (Gallagher 2000; Haggard et al. 2003; Jeannerod 2007; Metzinger et al. 2007; Pacherie 2008). Despite the apparent unitary, global character of bodily self-consciousness, experimental manipulations have mainly focused on subglobal aspects, such as the sense of ownership and agency for one’s hand and its movements (Botvinick and Cohen 1998; Ehrsson et al. 2004; Jeannerod 2006, 2007; Knoblich 2002; Pavani et al. 2000; Tsakiris and Haggard 2005; Tsakiris et al. 2007). These latter studies on body-part representation are important (and will be discussed below in detail), yet we have argued (e.g., see Blanke and Metzinger 2009) that they fail to account for a key feature of bodily self-consciousness: its global character. This is because a fundamental aspect of bodily self-consciousness is its association with a single, whole body, not with multiple body parts (Blanke and Metzinger 2009; Carruthers 2008; Lenggenhager et al. 2007; Metzinger et al. 2007). A number of
recent studies (Aspell et al. 2009; Ehrsson 2007; Lenggenhager et al. 2007, 2009; Mizumoto and Ishikawa 2005; Petkova and Ehrsson 2008) have demonstrated that more global aspects of body perception can also be experimentally manipulated using multisensory conflicts. These experimental studies on healthy subjects were inspired by an unusual and revealing set of neurological phenomena—autoscopic phenomena—in which the sense of the body as a whole is disrupted in different ways, and which are likely to be caused by an underlying abnormality in the multisensory integration of global bodily inputs (Blanke and Mohr 2005). In this chapter, we first examine how the scientific understanding of bodily self-consciousness and its multisensory mechanisms can be informed by the study of autoscopic phenomena. We then present a review of investigations of multisensory processing relating to body-part perception (“rubber hand” illusion studies: Botvinick and Cohen 1998; Ehrsson et al. 2004; Tsakiris and Haggard 2005) and go on to discuss more recent “full body” illusion studies that were inspired by autoscopic phenomena and have shown that it is also possible to dissociate certain components of bodily self-consciousness—namely, self-location, self-identification, and the first-person perspective—in healthy subjects by inducing multisensory conflicts.
24.2 MULTISENSORY DISINTEGRATION IN OUT-OF-BODY AND RELATED EXPERIENCES OF NEUROLOGICAL ORIGIN The following is a description of an out-of-body experience (OBE) by Sylvan Muldoon, one of the first authors to publish detailed descriptions of his own (and others’) OBEs: “I was floating in the very air, rigidly horizontal, a few feet above the bed […] I was moving toward the ceiling, horizontal and powerless […] I managed to turn around and there […] was another ‘me’ lying quietly upon the bed” (Muldoon and Carrington 1929). We and other research groups (Irwin 1985; Brugger et al. 1997; Brugger 2002; Blanke et al. 2002, 2004; Blanke and Mohr 2005) have argued that an OBE is a breakdown of several key aspects of bodily self-consciousness, and that the study of this phenomenon is likely to lead to insights into the multisensory foundations of bodily self-consciousness. OBEs can be characterized by three phenomenological elements: the impression (1) that the self is localized outside one’s body (disembodiment or extracorporeal self-location), (2) of seeing the world from an extracorporeal and elevated first-person perspective, and (3) of seeing one’s own body from this perspective (Blanke et al. 2004; Irwin 1985). OBEs challenge our everyday experience of the spatial unity of self and body: the experience of a “real me” that “resides” in my body and is the subject or “I” of experience and thought (Blackmore 1982). OBEs have been estimated to occur in about 5% of the general population (Blackmore 1982; Irwin 1985) and they also occur in various medical conditions (Blanke et al. 2004). Several precipitating factors have been determined including certain types of neurological and psychiatric diseases (Devinsky et al. 1989; Kölmel 1985; Lippman 1953; Todd and Dewhurst 1955). OBEs have also been associated with various generalized and focal diseases of the central nervous system (Blanke et al. 2004; Brugger et al. 1997; Dening and Berrios 1994; Devinsky et al. 1989; Hécaen and Ajuriaguerra 1952; Lhermitte 1939). OBEs of focal origin mainly implicate posterior regions of the brain and some authors have suggested a primary involvement of either the temporal or parietal lobe (Blanke et al. 2004; Devinsky et al. 1989; Hécaen and Ajuriaguerra 1952; Todd and Dewhurst 1955). More recently, Blanke and colleagues (2004) argued for a crucial role for the cortex at the temporo-parietal junction (TPJ). The crucial role of the right TPJ was suggested because lesions and foci of epileptic seizures overlap in several patients with OBEs centered on this region (Blanke et al. 2004; Blanke and Mohr 2005), electrical stimulation of this region can give rise to OBE-like experiences (Blanke et al. 2002; De Ridder et al. 2007; Penfield and Erickson 1941), and because the TPJ is activated during mental imagery of disembodied self-location (Arzy et al. 2006). The role of the TPJ in OBEs makes sense on functional grounds since this region is important
for multisensory integration, vestibular processing, and for generating an egocentric perspective (Brandt and Dieterich 1999; Bremmer et al. 2001; Calvert et al. 2000; Leube et al. 2003; Ruby and Decety 2001). An individual undergoing an OBE usually experiences a dissociation between his self-location and first-person visuospatial perspective, on the one hand, and the seen location of his own body, on the other—in other words, he perceives his own body (and the world) from a spatial location and perspective that do not coincide with the seen position of his body (Blanke et al. 2004; Blanke and Mohr 2005; Brugger et al. 1997). In OBEs the origin of the first-person visuospatial perspective is colocalized with self-location (as it is for healthy subjects under normal conditions), but the body is experienced at a different location. What causes this breakdown in the unity between self and body? To date, only a few neurological and neuroscientific investigations have been carried out on OBEs, probably because, in general, they occur spontaneously, are of short duration, and happen only once or twice in a lifetime (Irwin 1985). However, the anatomical, phenomenological, and behavioral data collected from patients have led to the hypothesis that the abnormal perceptions in OBEs are due to selective deficits in integrating multisensory body-related information into a single coherent neural representation of one’s body and its position in extrapersonal space (Blanke et al. 2004; Blanke and Mohr 2005). This theory extended previous propositions made for the related phenomena of phantom limb sensations (Brugger 2002; Brugger et al. 1997) and synesthesia (Irwin 1985). Furthermore, OBE deficits were attributed to abnormal processing at the TPJ: TPJ lesions are found in patients with OBEs (Blanke et al. 2004; Blanke and Mohr 2005), and neuroimaging studies (Arzy et al. 2006; Blanke et al. 2005; Vallar et al. 1999) have shown that this region plays an important role in multisensory integration, embodiment, and in generating an egocentric perspective in healthy subjects (see also Bremmer et al. 2001; Calvert et al. 2000; Leube et al. 2003; Ruby and Decety 2001; Schwabe et al. 2008; Vogeley and Fink 2003). More precisely, Blanke and colleagues (Blanke et al. 2004; Blanke and Mohr 2005) have proposed that OBEs occur when there is, first, a disintegration in own-body (personal) space because of incongruent tactile, proprioceptive, and visual inputs and, second, a disintegration between personal and extrapersonal space due to incongruent vestibular and visual inputs. They further suggested that the phenomenological variation between different types of autoscopic phenomena—the group of illusions characterized by an illusory multisensory duplication of one’s own body—arises from a shared disintegration in own-body (personal) space combined with different levels of disintegration between personal and extrapersonal space due to vestibular disturbance. Autoscopic phenomena include OBEs, heautoscopy, and autoscopic hallucination. Vestibular dysfunction (mainly of otolithic origin) is greatest in OBEs, which are strongly associated with feelings of floating and elevation (usually absent in the two other autoscopic phenomena; Blanke et al. 2004). During autoscopic hallucinations patients see their body in extrapersonal space, but there is no disembodiment, no self-identification with the illusory extracorporeal body, and no change in first-person perspective (Blanke et al. 2004; Brugger et al. 1997).
Autoscopic hallucinations are caused by damage that primarily implicates the temporo-occipital and parieto-occipital cortices (Blanke and Castillo 2007). Patients with heautoscopy—linked to the left TPJ (Blanke and Mohr 2005)—may experience their self-location and visuospatial perspective at the position of the physical body or at the position of the illusory body, or these may even rapidly alternate, leaving them confused about where their self is localized (Blanke et al. 2004; Brugger et al. 1994). The pronounced vestibular disturbance in OBEs and heautoscopy fits with the greater implication of the TPJ in both disorders (Blanke and Mohr 2005; Lopez et al. 2008), as the core region of vestibular cortex is located in the TPJ (Brandt and Dieterich 1999; Fasold et al. 2002; Lobel et al. 1998). These clinical data may suggest that vestibular function in the left and right TPJs may differ, with the left TPJ specialized for vestibular input from the semicircular canals and the right TPJ encoding primarily otolithic input (for more details, see Lopez et al. 2008).
24.3 USING MULTISENSORY CONFLICTS TO INVESTIGATE BODILY SELF IN HEALTHY SUBJECTS Clinical patients with disturbed bodily self-consciousness due to aberrant brain processes present unique and important opportunities to study the relation between the representation of the body and the self. However, the small sample sizes of clinical studies, the often long-term pathological history of these patients, as well as other methodological concerns make it difficult to generalize these findings to normal bodily self-consciousness in healthy subjects. In the past decade, a growing number of studies have therefore used the technique of providing conflicting or ambiguous multisensory information about the body in order to “trick” the brain and induce bodily illusions in healthy subjects that resemble experiences in neurological patients. These experimental manipulations enable better-controlled and repeatable investigations of bodily self-consciousness and its underlying neural bases in large samples of healthy subjects.
24.3.1 Body Part Studies: Rubber Hand Illusion Probably the most commonly used body illusion is the so-called “rubber hand illusion” (Botvinick and Cohen 1998), in which a subject watches a rubber hand on a table being stroked in synchrony with his corresponding (left or right) hidden hand. After a few seconds this simple manipulation causes the rubber hand (Ehrsson et al. 2004; Lloyd 2007) to “feel like my own hand,” that is, to be self-attributed. This does not happen when the stroking is applied asynchronously, suggesting that an intermodal correlation of different senses is crucial for self-attribution (Botvinick and Cohen 1998). The phenomenological experience of self-attribution is accompanied by a change in where subjects localize their real stroked hand (“proprioceptive drift”; Botvinick and Cohen 1998; Tsakiris and Haggard 2005; Kammers et al. 2009; Longo et al. 2008; Schütz-Bosbach et al. 2009). It has been argued that this latter finding demonstrates that the changes in bodily self-consciousness induced by the rubber hand illusion are due to changes in low-level, multisensory body representations. Recent studies of the illusion revealed a number of further behavioral changes related to the rubber hand such as increased cortical (Ehrsson et al. 2007), physiological (skin conductance response; Ehrsson 2007; Hägni et al. 2008), and fear responses to a threat to the rubber hand. Moreover, there are also rubber hand illusion–related changes in the real stimulated hand (e.g., body part–specific decrease in skin temperature; Moseley et al. 2008). However, the relation between these different measurements is still unclear (Schütz-Bosbach et al. 2009), and recent studies discuss the possibility of the existence of multiple (parallel and serial) body representations and dimensions of bodily self-consciousness (Longo et al. 2008; Kammers et al. 2009) that are differentially affected by the rubber hand illusion. The rubber hand illusion has been explained as an effect of visual capture—the dominance of vision over other modalities in representations of the spatial location of events (Botvinick and Cohen 1998)—and has been related to properties of bimodal neurons in the parietal and premotor cortices (Ehrsson et al. 2004; Graziano et al. 2000; Iriki et al. 1996, 2001; Rizzolatti et al. 1981; Tsakiris et al. 2007). A recent article on the rubber hand illusion (Makin et al. 2008) proposed an explanatory model for the rubber hand illusion that implicates the role of multisensory integration within peri-hand space. The relative weighting (compared to that of proprioception) of visual information about hand position is greater when the hand is not moving; thus, in this situation visual information can bias proprioception. Furthermore, because vision can dominate over touch in the representation of spatial location, the brushstrokes that are seen to occur on the rubber hand may be processed as though they are occurring nearer to or on the real hand. Thus, the central representation of the location of the real hand may shift toward the rubber hand (Lloyd 2007). Given the temporal congruence of the seen and felt stroking, these inputs are integrated together as a coherent multisensory event in spatial coordinates that are shifted toward those of the rubber hand (Graziano et al. 2000). Makin and colleagues (2008) propose that this may result in the sensation of touch and
ownership being referred to the rubber hand. It should be noted that these mechanisms and this direction of causality have yet to be verified experimentally. It is worth noting that the size of the drift is generally quite small (a few centimeters) compared to the actual distance between the fake and the real hand, and that the induced changes in illusory touch and (even more so) ownership during the rubber hand illusion are most often relatively weak changes in conscious bodily experience (even after 30 min of stroking). Several studies have investigated the brain mechanisms involved in the rubber hand illusion, for example, using functional MRI (Ehrsson et al. 2004) and positron emission tomography (Tsakiris et al. 2007). A systematic review of the studies using the rubber hand illusion would be beyond the scope of the present review, as this chapter focuses on scientific experimentation with full body illusions and global aspects of bodily self-consciousness. The interested reader is referred to the recent review on body part–specific aspects of bodily self-consciousness by Makin and colleagues (2008). We only note here that comparison of neuroimaging studies of the rubber hand illusion is hampered by the fact that the studies employed different methods to induce the rubber hand illusion, different control conditions, different behavioral proxies to quantify illusory touch and ownership, and different brain imaging techniques. Not surprisingly, though, these studies implicated several key brain areas that have previously been shown to be important in multisensory integration, such as the premotor and intraparietal cortices as well as the TPJ, insula, extrastriate cortex, and the cerebellum.
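The "relative weighting" of visual and proprioceptive information invoked in the account above can be made concrete with a standard reliability-weighted cue-combination scheme. This is offered only as an illustrative formalization under the usual maximum-likelihood assumptions, not as the specific model tested in the rubber hand illusion studies cited here; x_v and x_p denote the visual and proprioceptive estimates of hand position and σ_v², σ_p² their variances:

```latex
% Illustrative reliability-weighted combination of visual (x_v) and
% proprioceptive (x_p) estimates of hand position (not the authors' model).
\hat{x}_{\mathrm{hand}} = w_v x_v + w_p x_p, \qquad
w_v = \frac{1/\sigma_v^2}{1/\sigma_v^2 + 1/\sigma_p^2}, \qquad w_p = 1 - w_v
```

When the hand is still, proprioceptive variance is relatively high, so w_v grows and the combined estimate, and hence the felt position of the hand, is pulled toward the seen location of the rubber hand.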
24.3.2 Full Body Studies Although illusory ownership in the rubber hand illusion exemplifies a deviant form of bodily selfconsciousness, the illusion only affects partial ownership, or the attribution and localization of a hand with respect to the global bodily self, that is, it is characterized by a change in part-to-whole relationships. As we have seen, the situation is different in neurological patients who have illusory perceptions of their full bodies such as in OBEs and heautoscopy. These states are characterized by abnormal experience with respect to the global bodily self, for example, a mislocalization and a misidentification of the entire body and self (Blanke et al. 2004; Blanke and Mohr 2005; Brugger et al. 1997). Recent studies in healthy subjects (Ehrsson 2007; Lenggenhager et al. 2007, 2009; Mizumoto and Ishikawa 2005) have therefore sought to investigate these global aspects of selfconsciousness (self-location and self-identification) by the systematic manipulation of the multisensory cues that the brain uses to create a representation of self-location and identity. As we shall see, these experimental setups have allowed us to gain insight into the biological mechanisms that are important for humans’ everyday “inside-body experience.” They show that this experience—which is often taken for granted (“where else should I be localized than in my body?”)—is made possible by active multisensory brain processes. Two groups (Ehrsson 2007; Lenggenhager et al. 2007) have separately developed novel techniques to dissociate (1) the location of the physical body, (2) the location of the self (self-location), (3) the location of the origin of the first-person visuospatial perspective, and (4) self-identification. Both groups utilized congruent and incongruent visual–tactile stimulation to alter these four aspects of bodily self-consciousness by extending a protocol similar to that used in the rubber hand illusion (Botvinick and Cohen 1998)—to the full body (see Figure 24.1; see also Altschuler and Ramachandran 2007; Mizumoto and Ishikawa 2005; Stratton 1899). The general idea in these full body studies is to mislead subjects about where they experience their body and/or self to be, and/or with what location and which body they self-identify with. To achieve this, a visual (real-time video) image of their body was presented via a head-mounted display (HMD) that was linked to a video camera that filmed their back from behind (Figure 24.1). They were thus able to see themselves from an “outside” or third-person visuospatial perspective, as though they were viewing their own body from the visuospatial perspective of the camera (note that this is related to changes in perspective during OBEs). In one study (Lenggenhager et al. 2007), subjects viewed the video image of their
FIGURE 24.1 Experimental setup in synchronous (back) stroking condition in Lenggenhager et al.’s (2007) study (top panel) and in synchronous (chest) stroking condition in Ehrsson’s (2007) study (bottom panel). In both panels, the physical body of the subject is light-colored and the dark-colored body indicates the hypothesized location of the perceived body (bodily self). (Modified from Lenggenhager, B. et al., Consciousness and Cognition, 18(1), 110–117, 2009.)
body (the “virtual body”) while they were stroked on their real back with a stick. This stroking was felt on their back and also seen in front on the virtual body, either simultaneously (in real time) or not (with a delay introduced in the video feed). The stroking manipulation thus generated either congruent (synchronous) or incongruent (asynchronous) visuo-tactile stimulation (as had been shown to affect the perception of hand ownership and hand location in the rubber hand illusion; Botvinick and Cohen 1998). It was found that the illusion of self-identification with the virtual body (i.e., global ownership, the feeling that “the virtual body is my body”) and the referral of touch (“feeling the touch of the stick where I saw it touching my virtual body”) were both stronger when subjects were stroked synchronously than when they were stroked asynchronously (Lenggenhager et al. 2007). Self-location was also measured by passively displacing blindfolded subjects after the stroking period and then asking them to walk back to the original position. Note that, as predicted, self-location was experienced at a position that was closer to the virtual body, as if the subject was located “in front” of the position where (s)he had been standing during the experiment. This ensemble of measures has been termed the full body illusion (FBI). In a related study (Ehrsson 2007), subjects were stroked on their chest (Figure 24.1). They were seated while they viewed themselves (via an HMD) from behind, and they could see a stick moving (synchronously or asynchronously with the touch) just below the camera’s lens. In this case, subjects felt that the stick they saw was touching their real chest, they self-identified with the camera’s location, and they felt that looking at the virtual body was like viewing the body of someone else (i.e., decreased self-identification with the virtual body). Self-location was not quantified in this study by using the drift measure as in Lenggenhager et al.’s (2007) study; instead, a threatening stimulus was
presented to the apparent location of the origin of the visuospatial perspective (just below the camera). The skin conductance response to a swinging hammer (approaching the camera) was found to be higher during synchronous stroking than during asynchronous stroking, providing implicit physiological evidence that subjects self-identified with a spatial position that was displaced toward the position of the camera. There were several differences in bodily experiences in these two similar setups, and it is worth considering what may account for these. Meyer (2008) proposed (in a response to these studies) that in both setups the brain may use at least four different sources of information to generate the conscious experience of self-location and self-identification: (1) where the body is seen, (2) where the world is seen from (the origin of the visuospatial perspective), (3) where the touch is seen to occur, and (4) where the touch is felt to occur. These four “cues” do not correspond in both experimental setups (but in everyday life, they usually do). Meyer argued that the most important of these cues for the conscious experience of self-location might be where the touch is seen to occur (i.e., where the stroking stick is seen). He concluded this because, first, in neither setup did self-location (measured via drift by Lenggenhager et al. 2007 or assessed via a questionnaire score by Ehrsson 2007) exactly coincide with the location where the touch was felt (i.e., where the physical body was located). Second, the seen location of the virtual body biased self-location in one study (Lenggenhager et al. 2007) but not in the other (Ehrsson 2007), and third, the location of the visuospatial perspective corresponded to self-location in Ehrsson’s (2007) study but not in Lenggenhager et al.’s (2007) study. However, in both cases (during synchronous stroking), self-location coincided with (or more accurately, was biased toward) the location where the touch was seen to occur (i.e., the seen location of the stroking stick). It is not very surprising that the tactile sense appears to have the weakest role in determining self-location. Touch, after all, cannot give any reliable information regarding the location of the body in external space, except via tactile contact with external surfaces. There is, however, an additional important point to consider regarding the four cues: self-location was biased toward the virtual body more when the seen stroking was synchronous with the felt stroking than when it was asynchronous (Blanke et al. 2008). Thus, the congruence between tactile and visual input is an additional important factor in determining self-location in this context. It seems that when vision and touch are incongruent, the influence of the “visual information about stroking” is weaker and not preeminent as Meyer implies. Thus, in the asynchronous condition, subjects’ self-location is closer to where the touch is felt (i.e., where their physical body is actually located) than it is in the synchronous condition. It should be noted that different methods (different experimental conditions and dependent variables to quantify changes in bodily self-consciousness) were used in these studies (Ehrsson 2007; Lenggenhager et al. 2007). It is therefore difficult to make meaningful, direct comparisons between the results of these studies. A more recent study (Lenggenhager et al. 
2009) therefore sought to directly compare the approaches presented in these previous studies by using identical body positions and measures in order to quantify the conscious experience of self-identification, first-person visuospatial perspective, and self-location. In addition, the authors investigated these aspects of bodily self-consciousness while subjects were tested in the supine position (as OBEs usually occur in this position; Bünning and Blanke 2005; Green 1968). Subjects were again fitted with an HMD that displayed a video image of their body. Their virtual body thus appeared to be located below their physical body (see Figure 24.2). The dependent behavioral measure for the quantification of self-location was a new one: a “mental ball dropping” (MBD) task in which subjects had to imagine that a ball fell from their hand, and they had to press one button when they imagined that it left their grasp, and then another button when they imagined that it hit the floor. The authors hypothesized that MBD estimation would be greater (i.e., the time that subjects imagined it would take for the ball to reach the ground would be longer) when subjects’ self-location (where they perceived their self to be) was higher from the ground than when it was closer to the ground. The prediction in this study was that, compared to asynchronous stroking,
FIGURE 24.2 Experimental setup in synchronous (back) stroking condition (top panel) and synchronous (chest) stroking condition (bottom panel) in Lenggenhager et al.’s (2009) study. Subject was filmed from above and viewed the scene via an HMD. Light-colored body indicates where subjects’ real body was located and dark-colored body, the hypothesized location of the perceived body (bodily self). (Modified from Lenggenhager, B. et al., Consciousness and Cognition, 18(1), 110–117, 2009.)
synchronous back stroking would lead to a “downward” shift in self-location (toward the virtual body, seen as though below subjects) and an increased self-identification with the virtual body. Synchronous chest stroking, conversely, would lead to an “upward” shift in self-location (“away” from the virtual body seen below), and a decreased self-identification with the virtual body. As predicted, self-identification with the virtual body and referral of touch to the virtual body were found to be greater during synchronous than during asynchronous back stroking. In contrast, during synchronous chest stroking, there was decreased self-identification with the virtual body and decreased illusory touch. The MBD time estimates (quantifying self-location) were lower for synchronous back stroking than for synchronous chest stroking, suggesting that, as predicted, self-location was more biased toward the virtual body in the synchronous back stroking condition and relatively more toward the location of the visuospatial perspective (a third-person perspective) in the synchronous chest stroking condition. This study confirmed the earlier suggestion that self-location and self-identification are strongly influenced by where the stroking is seen to occur. Thus, self-location was biased toward the virtual body located as though below (or in front) when subjects were stroked on the back, and biased toward the location of the visuospatial perspective (behind/above the virtual body) when subjects were stroked on their chests. These studies revealed that humans’ “inside-body” self-location and “inside-body” first-person perspective can be transferred to an extracorporeal self-location and a third-person perspective. It is notable that the subjective upward drift in self-location during synchronous chest stroking was correlated with sensations of elevation and floating (as assessed by questionnaires). This
suggests that when subjects adopt a relaxed supine position, synchronous visual–tactile events may interfere with vestibular processing. The importance of vestibular (otolith) input in abnormal self-location has already been demonstrated (Blanke et al. 2002, 2004). Furthermore, there is evidence that vestibular cues may interfere with body and self-representation (Le Chapelain et al. 2001; Lenggenhager et al. 2008; Lopez et al. 2008; Yen Pik Sang et al. 2006). The relatively motionless supine body position of the subjects in this study would have minimized vestibular sensory updating and thus may have further contributed to the occurrence of such vestibular sensations, highlighting their potential relevance for bodily self-consciousness, OBEs, and related experiences (see also Lopez et al. 2008; Schwabe and Blanke 2008). Can the mechanisms (explained above) for the rubber hand illusion also explain the changes in self-location, first-person perspective, and self-identification during the FBI? It is probable that some mechanisms are shared, but there are likely to be several important conceptual, behavioral, and neurobiological differences. The finding that in the FBI there appears to be referral of touch to a virtual body viewed as though at a distance of 2 m away is in contrast to the finding that the rubber hand illusion is greatly weakened or abolished by changing the posture of the rubber hand to an implausible one (Tsakiris and Haggard 2005) or by placing the rubber hand at more distant positions (Lloyd 2007). Viewing one’s body from an external perspective at 2 m distance is even less “anatomically plausible” than a rubber hand with a misaligned posture; therefore, it is perhaps surprising that the FBI occurs at all under such conditions. However, it has been shown that the visual receptive field size of parietal bimodal neurons with tactile receptive fields centered on the shoulder or the back can be very large—extending sometimes for more than a meter in extrapersonal space (Duhamel et al. 1998; Maravita and Iriki 2004). Shifts in the spatial characteristics of such trunk-centered bimodal neurons may thus account for the observed changes during the FBI (Blanke and Metzinger 2009). What these differences illustrate is that the constraints operating in the FBI are in certain ways markedly different to those operating in the rubber hand illusion. They appear similar in that the strength of both illusions depends on the temporal congruence between seen and felt stroking. However, the constraints regarding the spatial relations between the location of the origin of the first-person visuospatial perspective and the rubber hand are different to those between the location of the origin of the first-person visuospatial perspective and the location of the seen virtual body (see also Blanke and Metzinger 2009). Moreover, in the RHI it is the hand with respect to the body that is mislocalized: a “body part–body” interaction. In the FBI the entire body (the bodily self) is mislocalized within external space: a “body–world” interaction. It may be that the “whole body drift” entails that (during the synchronous condition) the “volume” of peripersonal space is relocated (toward the virtual body) within a stable external space (compatible with subjective reports during OBEs). Alternatively, it may be that peripersonal and extrapersonal space are modified.
The dimensions of the external room—for example, the proximity of walls to the subjects—are likely to affect the FBI more than the RHI, but this has not yet been systematically tested. Given these differences, the two illusions are expected to differ in both their spatial constraints and their neural bases (at the level of bimodal visuo-tactile neurons and of brain regions encoding multisensory bodily signals).
24.3.3 Mislocalization of Touch during FBIs

The studies discussed above (Ehrsson 2007; Lenggenhager et al. 2007, 2009) suggest that during the FBI changes in self-location and self-identification are accompanied by a mislocalization of touch, that is, the feeling of touch is biased toward where the touch is seen on one’s own body in extrapersonal space. However, the evidence for this presented in these studies (Ehrsson 2007; Lenggenhager et al. 2007, 2009) came only from questionnaire ratings, specifically the statements: “It seemed as if I was feeling the touch in the location where I saw the virtual body touched” (Lenggenhager et al. 2007, 2009) and “I experienced that the hand I was seeing approaching the cameras was directly touching my chest (with the rod)” (Ehrsson 2007). Questionnaire
ratings, being explicit judgments, are susceptible to various biases, for example, experimenter expectancy effects. Also, the questions were asked only after the period of stroking, not during, and so were not “online” measures of bodily self-consciousness. Furthermore, as recently pointed out (Ehrsson and Petkova 2008), such questions are somewhat ambiguous in a VR setup: they are, arguably, unable to distinguish between self-identification with a virtual body and self-recognition in a VR/video system.

A more recent study (Aspell et al. 2009) therefore developed an online measure for the mislocalization of touch that would be less susceptible to response biases and that would test more directly whether tactile mapping is altered during the FBI. This study investigated whether modifications in bodily self-consciousness are associated with changes in tactile spatial representations. To investigate this, the authors (Aspell et al. 2009) adapted the cross-modal congruency task (Spence et al. 2004) for the full body. This task was used because the cross-modal congruency effect (CCE) measured in the task can function as a behavioral index of the perceived proximity of visual and tactile stimuli. In previous studies of the CCE (Igarashi et al. 2008; Pavani and Castiello 2004; Pavani et al. 2000; Shore et al. 2006; Spence et al. 2004), the visual and tactile stimuli were presented on foam cubes held in the hands: single vibrotactile devices paired with small lights [light emitting diodes (LEDs)] were positioned next to the thumb and index finger of each hand. Subjects made speeded elevation discriminations (“up”/index or “down”/thumb) of the tactile stimuli while attempting to ignore the visual distractors. It was found that subjects performed worse when a distracting visual stimulus occurred at an incongruent elevation with respect to the tactile (target) stimulus. Importantly, the CCE (the difference between reaction times during incongruent and congruent conditions) was larger when the visual and tactile stimuli occurred closer to each other in space (Spence et al. 2004). The cross-modal congruency task was adapted for the full body (from the typical setup for the hands; Spence et al. 2004) by placing the vibrotactile devices and LEDs on the subject’s torso (back). Subjects were able to view their body and the LEDs via a head-mounted display (HMD; see Figure 24.3), as the setup was similar to that used in the previous FBI study (Lenggenhager et al. 2007). To investigate whether “full body CCEs” would be associated in a predictable way with changes in bodily self-consciousness, subjects’ self-identification with the virtual body and self-location were manipulated across different blocks by employing either synchronous or asynchronous stroking of the subjects’ backs.
FIGURE 24.3 Subject stood 2 m in front of a camera with a 3-D encoder. Four light vibration devices were fixed to the subject’s back, the upper two at the inner edges of the shoulder blades and the lower two 9 cm below. Small inset windows represent what the subject viewed via the head-mounted device. (1) Left panel: synchronous stroking condition. (2) Right panel: asynchronous stroking condition. (Modified from Aspell, J. E. et al., PLoS ONE, 4(8), e6488, 2009.)
CCEs were measured during the stroking period and were found to be larger during synchronous than asynchronous blocks, indicating that, as predicted, there was a greater mislocalization of touch during synchronous stroking than during asynchronous stroking. [Note that although a number of components (attention, response bias, and multisensory integration) are thought to contribute to the CCE to varying degrees (e.g., depending on the stimulus-onset asynchrony between the visual and tactile stimuli), the finding of a difference in the CCE between same-side and different-side stimuli during the synchronous condition, but not during the asynchronous condition, indicates that the visual and tactile stimuli were represented as being closer to each other in the former case.] In the synchronous condition, there was also a greater bias in self-location toward the virtual body and a greater self-identification with the virtual body compared with asynchronous blocks (as in Lenggenhager et al. 2007). Control conditions revealed that this modulation of the spatial remapping of touch was body-specific. Interestingly, this study also found that the size of the CCE, the degree of self-identification with, and the bias in self-location toward, the virtual body were all modulated by the stimulus-onset synchrony between the visual and vibrotactile stimuli used in the CCE task. These data thus suggest that certain key components of bodily self-consciousness—that is, “what I experience as my body” (self-identification) and “where I experience my body to be” (self-location)—are associated with changes in the spatial representation of tactile stimuli. They imply that a greater degree of visual capture of tactile location occurs when there is a greater degree of self-identification with the seen body. This change in the tactile spatial representation of stimuli is not a remapping on the body, but is, we suggest, a change in tactile mapping with respect to extrapersonal space: the tactile sensations are perceived at a spatial location biased toward the virtual body.
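To make this dependent measure concrete, the sketch below shows one way a CCE could be computed from single-trial reaction times, separately for synchronous and asynchronous stroking. The trial records, numbers, and the restriction to correct trials are illustrative assumptions; this is not the analysis code of the studies cited above.

```python
# Illustrative sketch (not the authors' analysis code): the cross-modal congruency
# effect (CCE) as the reaction-time difference between incongruent and congruent
# visuo-tactile trials, computed per stroking condition.
from statistics import mean

# Hypothetical trial records: (stroking condition, visuo-tactile congruency, RT in ms, correct?)
trials = [
    ("synchronous", "congruent", 512, True),
    ("synchronous", "incongruent", 598, True),
    ("asynchronous", "congruent", 530, True),
    ("asynchronous", "incongruent", 571, True),
    # ... one entry per trial
]

def cce(trials, stroking):
    """CCE = mean RT(incongruent) - mean RT(congruent), correct trials only."""
    rts = {"congruent": [], "incongruent": []}
    for condition, congruency, rt, correct in trials:
        if condition == stroking and correct:
            rts[congruency].append(rt)
    return mean(rts["incongruent"]) - mean(rts["congruent"])

for stroking in ("synchronous", "asynchronous"):
    print(stroking, "CCE (ms):", cce(trials, stroking))
# A larger CCE in the synchronous condition would indicate that the visual and tactile
# stimuli were represented as closer together in space during synchronous stroking.
```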
24.3.4 Multisensory First-Person Perspective

Less work has been carried out on the question of whether the experienced spatial position of the first-person perspective can be dissociated from that of self-location (Blanke and Metzinger 2009; Schwabe and Blanke 2008). The aforementioned FBI studies suggest that the first-person visuospatial perspective can (at least with a video setup) be dissociated from self-location in healthy subjects. This has rarely been reported in patients with own-body illusions such as OBEs and related experiences. As seen above, in a typical OBE the self is experienced as “colocalized” with the first-person visuospatial perspective. However, a recent neurological study (De Ridder et al. 2007) showed that intracranial electrical stimulation at the right TPJ may lead to the experience of dissociation of self-location from the first-person visuospatial perspective. Thus, the patient experienced extracorporeal self-location and disembodiment to a position behind his body, but perceived the environment from his normal, body-centered, first-person visuospatial perspective (and not from the disembodied perspective, as is classically reported by people with OBEs). Furthermore, some patients suffering from heautoscopy may experience two rapidly alternating first-person visuospatial perspectives and self-locations (Blanke et al. 2004; Brugger et al. 1994). In such patients, the first-person visuospatial perspective may sometimes even be experienced at two positions at the same time, and this is often associated with feelings of bilocation: the experience of a duplicated or split self, that is, not just a split between body and self as in OBEs, but between two experienced self-locations (see also Lopez et al. 2008). The first-person visuospatial perspective is perhaps the only perspective that usually comes to mind, and yet vision is not the only modality with an inherent “perspectivalness” (Metzinger 2003; Metzinger et al. 2007)—there is certainly also an auditory first-person perspective and possibly also “perspectives” based primarily on proprioceptive and motor signals (Schwabe and Blanke 2008). Again, in healthy subjects the auditory perspective and visual perspective are spatially congruent, and yet patients with heautoscopy may describe spatial incongruence between these perspectives (for further examples and discussion, see Blanke et al. 2004; Blanke and Metzinger 2009).
24.4 CONCLUSION

Studies of OBEs of neurological origin have influenced current scientific thinking on the nature of global bodily self-consciousness. These clinical studies have highlighted that bodily self-consciousness can be broken down into three key components: self-location, first-person perspective, and self-identification (Blanke and Metzinger 2009). The phenomenology of OBEs and related experiences demonstrates that these three components are dissociable, suggesting that they may have distinct functional and neural bases. The first empirical investigations into the key dimensions of bodily self-consciousness that we have reviewed here show that it is also possible to study and dissociate these three components of the global bodily self in healthy subjects. Future studies should seek to develop experimental settings in which bodily self-consciousness can be manipulated more robustly and more strongly in healthy subjects. It will also be important for future studies to characterize in detail the neural machinery that leads to the described experiential and behavioral changes in bodily self-consciousness. The TPJ is likely to be crucially involved (Blanke et al. 2004; Blanke and Mohr 2005), but we expect that other areas such as the medial prefrontal cortex (Gusnard et al. 2001) and the precuneus (Northoff and Bermpohl 2004), as well as somatosensory (Ruby and Decety 2001) and vestibular cortex (Lopez et al. 2008), will also be found to contribute to bodily self-consciousness.

Will it ever be possible to experimentally induce full-blown OBEs in healthy subjects? OBEs have previously been induced using direct brain stimulation in neurological patients (Blanke et al. 2002; De Ridder et al. 2007; Penfield 1955), but these clinical examinations can only be carried out in a highly selective patient population, whereas related techniques, such as transcranial magnetic stimulation, do not induce similar effects (Blanke and Thut 2007). Blackmore (1982, 1984) has listed a number of behavioral procedures that may induce OBEs, and it may be interesting for future empirical research to employ some of these “induction” methods in a systematic manner in combination with well-controlled scientific experimentation. It is important to note that OBEs were not actually induced in the studies that used video projection (Ehrsson 2007; Lenggenhager et al. 2007, 2009); rather, these studies produced states that are more comparable to heautoscopy. Where will we find techniques to create experimental setups able to induce something even closer to an OBE? We believe that virtual reality technology, robotics, and methods from the field of vestibular physiology may be promising avenues to explore.
REFERENCES

Altschuler, E., and V. Ramachandran. 2007. A simple method to stand outside oneself. Perception 36(4): 632–634.
Arzy, S., G. Thut, C. Mohr, C. M. Michel, and O. Blanke. 2006. Neural basis of embodiment: Distinct contributions of temporoparietal junction and extrastriate body area. Journal of Neuroscience 26(31): 8074–8081.
Aspell, J. E., B. Lenggenhager, and O. Blanke. 2009. Keeping in touch with one’s self: Multisensory mechanisms of self-consciousness. PLoS ONE 4(8): e6488.
Blackmore, S. 1982. Beyond the body. An investigation of out-of-body experiences. London: Heinemann.
Blackmore, S. 1984. A psychological theory of the out-of-body experience. Journal of Parapsychology 48: 201–218.
Blanke, O., T. Landis, L. Spinelli, and M. Seeck. 2004. Out-of-body experience and autoscopy of neurological origin. Brain 127(2): 243–258.
Blanke, O., and T. Metzinger. 2009. Full-body illusions and minimal phenomenal selfhood. Trends in Cognitive Sciences 13(1): 7–13.
Blanke, O., T. Metzinger, and B. Lenggenhager. 2008. Response to Kaspar Meyer’s E-letter. Science E-letter.
Blanke, O., and C. Mohr. 2005. Out-of-body experience, heautoscopy, and autoscopic hallucination of neurological origin: Implications for neurocognitive mechanisms of corporeal awareness and self-consciousness. Brain Research Reviews 50(1): 184–199.
Blanke, O., S. Ortigue, T. Landis, and M. Seeck. 2002. Neuropsychology: Stimulating illusory own-body perceptions. Nature 419(6904): 269–270.
Blanke, O., and V. Castillo. 2007. Clinical neuroimaging in epileptic patients with autoscopic hallucinations and out-of-body experiences. Epileptologie 24: 90–95.
Blanke, O., and G. Thut. 2007. Inducing out of body experiences. In Tall Tales, ed. G. Della Sala. Oxford: Oxford Univ. Press.
Botvinick, M., and J. Cohen. 1998. Rubber hands ‘feel’ touch that eyes see. Nature 391(6669): 756–756.
Brandt, T., and M. Dieterich. 1999. The vestibular cortex: Its locations, functions, and disorders. Annals of the New York Academy of Science 871(1): 293–312.
Bremmer, F., A. Schlack, J.-R. Duhamel, W. Graf, and G. R. Fink. 2001. Space coding in primate posterior parietal cortex. NeuroImage 14(1): S46–S51.
Brugger, P. 2002. Reflective mirrors: Perspective-taking in autoscopic phenomena. Cognitive Neuropsychiatry 7: 179–194.
Brugger, P., R. Agosti, M. Regard, H. Wieser, and T. Landis. 1994. Heautoscopy, epilepsy, and suicide. Journal of Neurology, Neurosurgery & Psychiatry 57(7): 838–839.
Brugger, P., M. Regard, and T. Landis. 1997. Illusory reduplication of one’s own body: Phenomenology and classification of autoscopic phenomena. Cognitive Neuropsychiatry 2(1): 19–38.
Bünning, S., and O. Blanke. 2005. The out-of-body experience: Precipitating factors and neural correlates. In Progress in Brain Research, Vol. 150, 331–350. Amsterdam, The Netherlands: Elsevier.
Calvert, G. A., R. Campbell, and M. J. Brammer. 2000. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology 10(11): 649–657.
Carruthers, G. 2008. Types of body representation and the sense of embodiment. Consciousness and Cognition 17: 1302–1316.
Damasio, A. R. 2000. The feeling of what happens: Body and emotion in the making of consciousness. New York: Harcourt Brace.
Dening, T. R., and G. E. Berrios. 1994. Autoscopic phenomena. The British Journal of Psychiatry 165: 808–817, doi:10.1192/bjp.165.6.808.
De Ridder, D., K. Van Laere, P. Dupont, T. Menovsky, and P. Van de Heyning. 2007. Visualizing out-of-body experience in the brain. New England Journal of Medicine 357(18): 1829–1833.
Devinsky, O., E. Feldmann, K. Burrowes, and E. Bromfield. 1989. Autoscopic phenomena with seizures. Archives of Neurology 46(10): 1080–1088.
Duhamel, J., C. L. Colby, and M. Goldberg. 1998. Ventral intraparietal area of the macaque: Congruent visual and somatic response properties. Journal of Neurophysiology 79(1): 126–136.
Ehrsson, H. 2007. The experimental induction of out-of-body experiences. Science 317(5841): 1048, doi:10.1126/science.1142175.
Ehrsson, H., and V. Petkova. 2008. Response to Kaspar Meyer’s E-letter. Science, E-Letter.
Ehrsson, H., C. Spence, and R. Passingham. 2004. That’s my hand! Activity in premotor cortex reflects feeling of ownership of a limb. Science 305(5685): 875–877, doi:10.1126/science.1097011.
Ehrsson, H. H., N. P. Holmes, and R. E. Passingham. 2005. Touching a rubber hand: Feeling of body ownership is associated with activity in multisensory brain areas. The Journal of Neuroscience 25: 10564–10573, doi:10.1523/jneurosci.0800-05.2005.
Ehrsson, H. H., K. Wiech, N. Weiskopf, R. J. Dolan, and R. E. Passingham. 2007. Threatening a rubber hand that you feel is yours elicits a cortical anxiety response. Proceedings of the National Academy of Sciences 104: 9828–9833, doi:10.1073/pnas.0610011104.
Fasold, O. et al. 2002. Human vestibular cortex as identified with caloric stimulation in functional magnetic resonance imaging. NeuroImage 17: 1384–1393.
Gallagher, S. 2000. Philosophical conceptions of the self: Implications for cognitive science. Trends in Cognitive Sciences 4(1): 14–21.
Gallagher, S. 2005. How the body shapes the mind. Oxford: Clarendon Press.
Graziano, M., D. Cooke, and C. Taylor. 2000. Coding the location of the arm by sight. Science 290(5497): 1782–1786.
Green, C. 1968. Out-of-body experiences. Oxford: Institute of Psychophysical Research.
Gusnard, D. A., E. Akbudak, G. L. Shulman, and M. E. Raichle. 2001. Medial prefrontal cortex and self-referential mental activity: Relation to a default mode of brain function. Proceedings of the National Academy of Sciences of the United States of America 98(7): 4259–4264.
Haggard, P., M. Taylor-Clarke, and S. Kennett. 2003. Tactile perception, cortical representation and the bodily self. Current Biology 13(5): R170–R173.
Hägni, K., K. Eng, M.-C. Hepp-Reymond, L. Holper, B. Keisker, E. Siekierka et al. 2008. Observing virtual arms that you imagine are yours increases the galvanic skin response to an unexpected threat. PLoS ONE 3(8): e3082.
Hécaen, H., and J. Ajuriaguerra. 1952. Méconnaissances et hallucinations corporelles: intégration et désintégration de la somatognosie. Masson.
Igarashi, Y., Y. Kimura, C. Spence, and S. Ichihara. 2008. The selective effect of the image of a hand on visuotactile interactions as assessed by performance on the crossmodal congruency task. Experimental Brain Research 184(1): 31–38.
Iriki, A., M. Tanaka, and Y. Iwamura. 1996. Coding of modified body schema during tool use by macaque postcentral neurones. NeuroReport 7: 2325–2330.
Iriki, A., M. Tanaka, S. Obayashi, and Y. Iwamura. 2001. Self-images in the video monitor coded by monkey intraparietal neurons. Neuroscience Research 40: 163–173.
Irwin, H. 1985. Flight of mind: A psychological study of the out-of-body experience. Metuchen, NJ: Scarecrow Press.
Jeannerod, M. 2006. Motor cognition: What actions tell the self. Oxford, UK: Oxford Univ. Press.
Jeannerod, M. 2007. Being oneself. Journal of Physiology – Paris 101(4–6): 161–168.
Kammers, M. P. M., F. de Vignemont, L. Verhagen, and H. C. Dijkerman. 2009. The rubber hand illusion in action. Neuropsychologia 47: 204–211, doi:10.1016/j.neuropsychologia.2008.07.028.
Knoblich, G. 2002. Self-recognition: Body and action. Trends in Cognitive Sciences 6(11): 447–449.
Kölmel, H. 1985. Complex visual hallucinations in the hemianopic field. Journal of Neurology, Neurosurgery and Psychiatry 48: 29–38.
Le Chapelain, L., J. Beis, J. Paysant, and J. André. 2001. Vestibular caloric stimulation evokes phantom limb illusions in patients with paraplegia. Spinal Cord 39(2): 85–87.
Lenggenhager, B., C. Lopez, and O. Blanke. 2008. Influence of galvanic vestibular stimulation on egocentric and object-based mental transformations. Experimental Brain Research 184(2): 211–221.
Lenggenhager, B., M. Mouthon, and O. Blanke. 2009. Spatial aspects of bodily self-consciousness. Consciousness and Cognition 18(1): 110–117.
Lenggenhager, B., T. Tadi, T. Metzinger, and O. Blanke. 2007. Video ergo sum: Manipulating bodily self-consciousness. Science 317(5841): 1096–1099.
Leube, D. T., G. Knoblich, M. Erb, W. Grodd, M. Bartels, and T. T. J. Kircher. 2003. The neural correlates of perceiving one’s own movements. NeuroImage 20(4): 2084–2090.
Lhermitte, J. 1939. In L’image de notre corps, 170–227. L’Harmattan.
Lippman, C. 1953. Hallucinations of physical duality in migraine. Journal of Nervous and Mental Disease 117: 345–350.
Lloyd, D. M. 2007. Spatial limits on referred touch to an alien limb may reflect boundaries of visuo-tactile peripersonal space surrounding the hand. Brain and Cognition 64(1): 104–109.
Lobel, E., J. F. Kleine, D. L. Bihan, A. Leroy-Willig, and A. Berthoz. 1998. Functional MRI of galvanic vestibular stimulation. Journal of Neurophysiology 80: 2699–2709.
Longo, M. R., S. Cardozo, and P. Haggard. 2008. Visual enhancement of touch and the bodily self. Consciousness and Cognition 17: 1181–1191.
Lopez, C., P. Halje, and O. Blanke. 2008. Body ownership and embodiment: Vestibular and multisensory mechanisms. Neurophysiologie Clinique/Clinical Neurophysiology 38(3): 149–161.
Makin, T. R., N. P. Holmes, and H. H. Ehrsson. 2008. On the other hand: Dummy hands and peripersonal space. Behavioural Brain Research 191(1): 1–10.
Maravita, A., and A. Iriki. 2004. Tools for the body (schema). Trends in Cognitive Sciences 8(2): 79–86.
Metzinger, T. 2003. Being no one. The self-model theory of subjectivity. Cambridge, MA: MIT Press.
Metzinger, T., B. Rahul, and K. C. Bikas. 2007. Empirical perspectives from the self-model theory of subjectivity: A brief summary with examples. In Progress in Brain Research, Vol. 168, 215–245, 273–278. Amsterdam, The Netherlands: Elsevier.
Meyer, K. 2008. How does the brain localize the self? Science, E-Letter.
Mizumoto, M., and M. Ishikawa. 2005. Immunity to error through misidentification and the bodily illusion experiment. Journal of Consciousness Studies 12(7): 3–19.
Muldoon, S., and H. Carrington. 1929. The projection of the astral body. London: Rider and Co.
Northoff, G., and F. Bermpohl. 2004. Cortical midline structures and the self. Trends in Cognitive Sciences 8(3): 102–107.
Pacherie, E. 2008. The phenomenology of action: A conceptual framework. Cognition 107(1): 179–217.
Pavani, F., and U. Castiello. 2004. Binding personal and extrapersonal space through body shadows. Nature Neuroscience 7(1): 14–16.
Pavani, F., C. Spence, and J. Driver. 2000. Visual capture of touch: Out-of-the-body experiences with rubber gloves. Psychological Science 11(5): 353–359.
Penfield, W. 1955. The 29th Maudsley lecture—The role of the temporal cortex in certain psychical phenomena. Journal of Mental Science 101(424): 451–465.
Penfield, W., and T. Erickson. 1941. Epilepsy and Cerebral Localization. Oxford, England: Charles C. Thomas.
Petkova, V., and H. H. Ehrsson. 2008. If I were you: Perceptual illusion of body swapping. PLoS ONE 3(12): e3832.
Rizzolatti, G., C. Scandolara, M. Matelli, and M. Gentilucci. 1981. Afferent properties of periarcuate neurons in macaque monkeys. II. Visual responses. Behavioural Brain Research 2: 147–163.
Ruby, P., and J. Decety. 2001. Effect of subjective perspective taking during simulation of action: A PET investigation of agency. Nature Neuroscience 4(5): 546–550.
Schütz-Bosbach, S., J. Musil, and P. Haggard. 2009. Touchant-touché: The role of self-touch in the representation of body structure. Consciousness and Cognition 18: 2–11.
Schwabe, L., and O. Blanke. 2008. The vestibular component in out-of-body experiences: A computational approach. Frontiers in Human Neuroscience 2: 17.
Shore, D. I., M. E. Barnes, and C. Spence. 2006. Temporal aspects of the visuotactile congruency effect. Neuroscience Letters 392(1–2): 96–100.
Spence, C., F. Pavani, and J. Driver. 2004. Spatial constraints on visual–tactile cross-modal distractor congruency effects. Cognitive, Affective and Behavioral Neuroscience 4(2): 148–169.
Stratton, G. 1899. The spatial harmony of touch and sight. Mind 8: 492–505.
Todd, J., and K. Dewhurst. 1955. The double: Its psychopathology and psycho-physiology. Journal of Nervous and Mental Disorders 122: 47–55.
Tsakiris, M., and P. Haggard. 2005. The rubber hand illusion revisited: Visuotactile integration and self-attribution. Journal of Experimental Psychology: Human Perception and Performance 31(1): 80–91.
Tsakiris, M., M. Hesse, C. Boy, P. Haggard, and G. R. Fink. 2007. Neural signatures of body ownership: A sensory network for bodily self-consciousness. Cerebral Cortex 17(10): 2235–2244, doi:10.1093/cercor/bhl131.
Vallar, G. et al. 1999. A fronto-parietal system for computing the egocentric spatial frame of reference in humans. Experimental Brain Research 124: 281–286.
Vogeley, K., and G. R. Fink. 2003. Neural correlates of the first-person-perspective. Trends in Cognitive Sciences 7: 38–42.
Yen Pik Sang, F., K. Jáuregui-Renaud, D. A. Green, A. Bronstein, and M. Gresty. 2006. Depersonalisation/derealisation symptoms in vestibular disease. Journal of Neurology Neurosurgery and Psychiatry 77(6): 760–766.
Section VI Attention and Spatial Representations
25 Spatial Constraints in Multisensory Attention
Emiliano Macaluso
CONTENTS
25.1 Introduction .......................................................................................... 485
25.2 Unisensory and Multisensory Areas in Human Brain ......................... 487
25.3 Multisensory Endogenous Spatial Attention ........................................ 490
25.4 Stimulus-Driven Spatial Attention ........................................................ 492
25.5 Possible Relationship between Spatial Attention and Multisensory Integration .... 497
25.6 Conclusions ........................................................................................... 500
References ...................................................................................................... 501
25.1 INTRODUCTION

Our sensory organs continuously receive a large amount of input from the external world; some of these inputs are important for a successful interaction with the environment, whereas others can be ignored. The operation of selecting relevant signals and filtering out irrelevant information is a key task of the attentional system (Desimone and Duncan 1995; Kastner and Ungerleider 2001). Attentional selection can occur on the basis of many different criteria, with a main distinction between endogenous control (i.e., selection based on voluntary attention, current aims, and knowledge) and stimulus-driven control (i.e., selection based on the intrinsic features of the sensory input). Accordingly, we can decide to pay attention to the face of one person in a crowded room (i.e., attending to subtle details in a rich and complex environment), or attention can be captured by a loud sound in a quiet room (i.e., attention captured by a salient stimulus). Many different constraints can guide endogenous and stimulus-driven attention. We can voluntarily decide to attend to a specific visual feature, such as color or motion, but the very same features can guide stimulus-driven attention if they stand out from the surrounding environment (“pop-out” item, e.g., a single red stimulus presented among many green stimuli). Here, I will focus on processes related to attentional selection based on spatial location. The investigation of mechanisms of spatial attention control is appealing for many reasons. Spatial selectivity is one of the most important characteristics of single neurons (i.e., the neuron’s receptive field) and well-organized maps of space can be found throughout the brain (Gross and Graziano 1995). These include sensory areas (e.g., striate and extrastriate occipital regions, for retinotopic representations of the visual world; Tootell et al. 1982), subcortical regions [e.g., the superior colliculus (SC); Wallace et al. 1997], and higher-level associative areas in frontal and parietal cortex (e.g., Ben Hamed et al. 2001; Sommer and Wurtz 2000). This widespread selectivity for spatial locations raises the question of how/whether these anatomically segregated representations contribute to the formation of an integrated representation of external space. Indeed, from a subjective point of view, signals about different visual features (e.g., shape/color) as well as motor commands seem to all merge effortlessly, giving rise to a coherent and unified perception–action system that allows us to interact spatially with the external environment.
The coordination of anatomically distributed spatial representations is also particularly relevant for the processing of signals from different sensory modalities. The position of a single object or event in the external world can be registered using signals in different modalities (e.g., a car approaching that we can both see and hear), but this requires some mechanisms matching spatial information that is initially processed in anatomically separate areas (e.g., occipital cortex for vision, and temporal cortex for audition). The brain’s ability to detect spatial colocalization of signals in different modalities can lead to faster and more accurate responses for multisensory signals originating from a single external location compared with spatially separate signals (e.g., see Spence et al. 1998). Indeed, spatial alignment is considered a key determinant for multisensory integration (Stein and Meredith 1993; Meredith and Stein 1996). The main topic of this chapter concerns the possible relationship between the coordination/integration of spatial maps across sensory modalities and the control of spatial attention. Before tackling this issue, I will briefly consider a few ideas about the neural basis of spatial attention control in the visual modality. These will then help to highlight commonalities and differences between visuospatial attention and spatial attention control in multisensory situations.

The neural mechanisms underlying visuospatial attention control have been studied extensively. One of the most popular approaches consists in first presenting an informative cue that instructs the subject to attend to a specific spatial location, followed by a target stimulus either at the attended location (“valid trials,” typically 75–80%) or somewhere else in the visual display (“invalid trials,” 20–25%; see Posner 1980; Posner et al. 1984). Using event-related functional magnetic resonance imaging (fMRI) techniques, it became possible to identify brain areas associated with three main processes involved in these tasks. Processing of the cue, which involves preparatory orienting of endogenous spatial attention toward the cued location, has been associated with activation of a dorsal fronto-parietal (dFP) network including intraparietal (IPS) and posterior parietal cortex (PPC), plus dorsal premotor cortex thought to correspond to the human frontal eye fields (FEFs; Corbetta and Shulman 2002; Yantis et al. 2002; Vandenberghe et al. 2001). Kastner and colleagues (1999) reported that activity in the dorsal network remains elevated in the interval between the cue and the target when the display does not contain any visual stimulus, indicating that the activation of these regions is generated “internally” (i.e., endogenous shift and holding of spatial attention; see also Corbetta et al. 2005; Kelley et al. 2008). Upon presentation of the target, different brain areas are activated depending on whether the target is presented at the attended (valid) or unattended (invalid) location. On valid trials, attention modulates activity in retinotopic occipital visual areas that represent the target location (e.g., see Martinez et al. 1999; Hopfinger et al. 2000; see also Luck et al. 1997) as well as dorsal parietal and frontal regions that also contain retinotopically organized maps of visual space (Sereno et al. 2001; Hagler and Sereno 2006; Saygin and Sereno 2008). Accordingly, visuospatial attention can modulate activity in multiple brain regions, all representing the position of the attended stimulus.
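As an illustration of the spatial cueing paradigm described above, the minimal sketch below generates a trial list with an informative cue followed by a target at the cued or opposite location. The function name, parameters, and the exact validity proportion are assumptions chosen within the ranges mentioned in the text; this is not code from the cited studies.

```python
# Minimal sketch of an endogenous spatial-cueing trial structure (illustrative only).
import random

def make_cueing_trials(n_trials=100, p_valid=0.8, seed=0):
    """Generate trials: an informative cue (left/right) followed by a target at the
    cued location (valid) or on the opposite side (invalid)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        cue_side = rng.choice(["left", "right"])
        valid = rng.random() < p_valid
        target_side = cue_side if valid else ("right" if cue_side == "left" else "left")
        trials.append({"cue": cue_side, "target": target_side,
                       "type": "valid" if valid else "invalid"})
    return trials

trials = make_cueing_trials()
print(sum(t["type"] == "valid" for t in trials), "valid trials out of", len(trials))
```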
When the target is presented at the uncued side (invalid trials), activation is typically found in a ventral fronto-parietal network (vFP), comprising the inferior parietal cortex [temporo-parietal junction (TPJ)] and inferior premotor regions [inferior frontal gyrus (IFG) and frontal operculum; see Arrington et al. 2000; Corbetta et al. 2000]. Invalidly cued targets are thought to trigger a shift of spatial attention from the cued/attended location to the unattended target location, and therefore activation of the vFP has been associated with stimulus (target)-driven reorienting of spatial attention. These findings can be contextualized in “site-source models” of visuospatial attention control (Figure 25.1a; for review, see also Pessoa et al. 2003). They postulate that control regions in dFP (“sources” of endogenous control) influence activity in sensory areas (“sites” of attentional modulation) that represent the currently relevant/attended position, via modulatory feedback signals (Desimone and Duncan 1995; Bressler et al. 2008; Ruff et al. 2006; see also Moore 2006, for review). These modulatory influences facilitate the processing of stimuli at the attended position, enabling them to outcompete other stimuli for the same processing resources. A recent EEG study clarified the temporal sequence of activation in these control and sensory areas (Green and McDonald 2008).
FIGURE 25.1 Schematic models of spatial attention control. (a) “Site-source model” of visuospatial control. This distinguishes areas that generate spatial biases [“sources,” in dorsal fronto-parietal (dFP) cortex] and areas that receive these modulatory signals (“sites,” occipital visual cortex). The model also includes a distinction between endogenous control (dark gray) and stimulus-driven control (light gray; see Corbetta et al. 2002). The two control systems operate together and interaction between them has been proposed to affect functional coupling between visual cortex and ventral attention control network (vFP; see Corbetta et al. 2008). IPS, intraparietal sulcus; PPC, posterior parietal cortex; TPJ, temporo-parietal junction; IFG, inferior frontal gyrus; SC, superior colliculus; Som/Aud, somatosensory/auditory. (b) An extension of “site-source” model, with feedforward connectivity and backprojections that allow transferring spatial information between sensory-specific (e.g., visual, auditory, and somatosensory areas) and multisensory regions (dFP and vFP). These multiple pathways may mediate spatial constraints in multisensory attention. Possible routes include: (1) feedforward multisensory input converging into vFP, for stimulus-driven control; (2) multisensory interactions in dFP, which in turn may affect interplay between dorsal (endogenous) and ventral (stimulus-driven) attention control systems; (3) direct projections between sensory-specific areas that may mediate cross-modal effects in sensory-specific areas; (4) multisensory interaction via subcortical structures that send and receive projections to/from sensory-specific and multisensory cortical areas.
During the cue-to-target interval, activation occurred first in the parietal cortex (approximately at 200 ms post-cue onset) and then in the frontal regions (at 400 ms), followed by reactivation of parietal regions (600 ms), and lastly, attentional modulation was found in the occipital cortex. Moreover, this study also showed that these preparatory effects are predictive of subsequent perceptual performance upon presentation of the target, confirming the relationship between activation of fronto-parietal control regions and attentional benefits for targets presented at the cued location. The vFP provides an additional control system that can flexibly interrupt endogenous control when unexpected/salient events require reorienting of attention toward a new location (stimulus-driven control; Corbetta and Shulman 2002; Corbetta et al. 2008). It should be stressed that this is a simplified model of visuospatial attention control, as there are many other processes (e.g., feature conjunction, sensory–motor transformations, etc.) and brain regions (e.g., the SC and the pulvinar) that also contribute to covert spatial orienting. However, this simple model embodies a few key concepts concerning (1) attention control vs. modulation, (2) feedforward vs. feedback connectivity, and (3) endogenous vs. stimulus-driven control, which can help in the interpretation of many findings in studies of multisensory spatial attention.
25.2 UNISENSORY AND MULTISENSORY AREAS IN HUMAN BRAIN

Before addressing the main issue of multisensory spatial attention control, it is worth briefly reviewing the regions of the brain that respond to stimuli in more than one modality, and asking whether they also show some differential activation depending on stimulus position. This will
highlight potential players in mechanisms of multisensory spatial attention. Single-cell electrophysiology identified multisensory neurons in many cortical regions, including intraparietal areas (Duhamel et al. 1998), premotor regions (Graziano and Gross 1993), posterior temporal cortex (Bruce et al. 1981), and subcortical regions (Meredith and Stein 1986a). Noninvasive functional imaging revealed corresponding multisensory responses in humans. For example, Bremmer et al. (2001) presented subjects with visual (moving vs. stationary random dots), auditory (moving tones vs. rest), or tactile stimuli (moving air puff vs. rest), asking where in the brain activity increases irrespective of the stimulated modality. This showed activation of the intraparietal sulcus, inferior parietal cortex, and premotor regions. Additional multisensory activations have also been found in the insula (Lewis et al. 2000) and in posterior regions of the superior temporal sulcus (STS; Beauchamp et al. 2004).

Turning to the more specific question about multisensory spatial representations, we utilized visual or tactile stimuli presented either in the left or right hemifield, with left and right visual stimuli positioned in close spatial proximity to the corresponding left and right hands (see Figure 25.2a). This allowed us to test for brain regions that show differential activation depending on the stimulus position (left vs. right stimuli and vice versa) and—critically—whether any such difference depends on the stimulated modality (i.e., unisensory vs. multisensory, side-specific activations). Occipital visual regions activated for visual stimuli presented in the contralateral hemifield, but were unaffected during tactile stimulation (unisensory visual side-specific effects; see Figure 25.2a). Somatosensory areas in the post-central sulcus activated for contralateral tactile stimuli and did not respond to visual stimulation (unisensory tactile side-specific effects). Most importantly, a higher-order region in the anterior intraparietal sulcus (aIPS) activated more for contralateral than ipsilateral stimulation, but now irrespective of the modality of the stimuli (multisensory visuo-tactile, side-specific effects; see signal plots in Figure 25.2a). Accordingly, spatial information from different senses appears to come together in this region, forming a supramodal representation of contralateral space. More recently, Sereno and Huang (2006) extended these results showing that multisensory responses in anterior IPS are not only side-specific, but also follow a well-organized topographical layout, with contiguous positions around the face represented in contiguous regions of IPS. The activation of this intraparietal region does not seem to merely reflect a common output/motor system for the different modalities, because multisensory spatial effects in aIPS have been found irrespective of the overt motor task (see Macaluso et al. 2003a, who used manual/saccadic responses to visual/tactile stimuli).

To summarize, multisensory responses have been found in frontal, parietal, and temporal cortex. These include intraparietal and dorsal premotor regions (Bremmer et al. 2001) that overlap with the dFP network involved in endogenous control of visuospatial attention (Corbetta et al. 2002); and regions around the TPJ and ventral premotor cortex (Beauchamp et al. 2004; Bremmer et al. 2001) that seem to overlap with the vFP attention network (Corbetta et al. 2002).
Studies that manipulated the spatial position of stimuli in different modalities (Macaluso and Driver 2001; Sereno and Huang 2006) revealed a segregation between sensory-specific spatial representation in occipital (vision) and post-central (touch) areas, and multisensory representation primarily in the intraparietal sulcus. These findings fit with the idea that the integration of multisensory signals and the construction of multisensory representations of space may occur via feedforward convergence (see Massaro 1999; Graziano and Gross 1995). Accordingly, spatial locations are first computed in sensory-specific areas, which then project to common (multisensory) regions in high-order frontal, parietal, and temporal cortex. In addition, the localization of multisensory responses both in vFP and dFP raises the hypothesis that the attention control systems operate not only in vision, but rather may control the deployment of spatial attention irrespective of modality (Farah et al. 1989; Driver and Spence 1998). This will be addressed in the following sections, where we will first examine examples of endogenous multisensory attention and then situations related to stimulus-driven multisensory attention.
FIGURE 25.2 (a) Mapping of multisensory space. Top panel shows a schematic illustration of an fMRI experiment to map visual and tactile side-specific activation. In different blocks/conditions, subjects were presented with stimuli in one modality and one side only (right touch in example). A region in aIPS showed greater responses for contralateral than ipsilateral stimuli, irrespective of stimulus modality. Middle panel shows multisensory activation of left aIPS for visual and tactile stimuli on right side. By contrast, sensory-specific areas showed an effect of contralateral versus ipsilateral stimuli only for corresponding modality. For example, left occipital visual cortex activated significantly more for right than left stimulation, but only for visual stimuli (see bottom panel). (b) Multisensory endogenous spatial attention. Top panel shows a schematic illustration of one of the setups utilized to study visuo-tactile cross-modal links in endogenous spatial attention. (Reproduced from Macaluso, E. et al., Cereb. Cortex, 12, 357–368, 2002b.) The stimulation was always bimodal and bilateral, but in different conditions subjects were asked to attend to only one side and one modality (attend right touch, in this example). Direct comparison of conditions of attention to one versus the other side (attend right vs. attend left, in the figure) reveals modality-independent attentional modulation in contralateral multisensory regions (e.g., left aIPS for attention to right hemifield; see middle panel) but also cross-modal influences in sensory-specific areas. For example, bottom panel shows cross-modal spatial attentional effects in left occipital visual cortex, with increased activation when subjects attended right vision (bar 2 minus 1, in signal plot) but also when they attended right touch (bar 4 minus 3, in plot). V/T, visual/tactile; L/R, left/right; aL/aR, attend left/right; Bs, baseline condition (central detection); *p < .05.
25.3 MULTISENSORY ENDOGENOUS SPATIAL ATTENTION

Behavioral studies have often reported cross-modal cueing effects that suggest the existence of a common system for spatial attention control across the different sensory modalities (e.g., Spence and Driver 1996). The typical finding here is that when subjects are asked to attend to one location and to expect targets in one specific modality there (e.g., attend to the left side to discriminate visual targets), subjects are not only faster in discriminating visual targets at the attended side compared to visual targets on the opposite side (i.e., an intramodal effect of visuospatial attention), but will also show an advantage for targets in a different modality (e.g., touch) presented on the “visually attended” compared to the unattended side (e.g., faster responses for left vs. right touch). These results are consistent with the proposal that the selection of the relevant location occurs irrespective of modality, supporting a supramodal account of endogenous attention control (Farah et al. 1989; Driver et al. 1998).

We have investigated the neural substrates of cross-modal cueing effects in endogenous attention using positron emission tomography and fMRI (Macaluso et al. 2000a, 2002b, 2003b). The supramodal account of attention control predicts that regions of the brain involved in spatial attention should show modality-independent activation. Indeed, we found that associative regions in the fronto-parietal cortex activated when subjects attended to one or the other side (compared with central attention), regardless of whether they were asked to judge vision or touch (see also Shomstein and Yantis 2006). In addition, in one fMRI study we separated cue-related effects and target-related activity (Macaluso et al. 2003b). Specifically, we compared the activity associated with predictive auditory cues (tones instructing subjects to shift attention toward one or the other hemifield) versus control cues (a different tone indicating that no target would follow on that trial, i.e., no shift is required). This showed activation of dorsal fronto-parietal regions, regardless of whether subjects had to prepare for visual or tactile discrimination. Preparatory, cue-related effects in dFP regions have also been reported in studies on pure auditory attention (Wu et al. 2007), confirming that dFP is involved in endogenous, voluntary attention control irrespective of modality.

Additional evidence for supramodal mechanisms of attention control came from analyses that directly compared attention to one or the other hemifield. In particular, if attention control selects spatial locations irrespective of modality, it may be expected that activity in regions that represent the attended location should be affected irrespective of the modality of the target presented there. It should be stressed that in attention experiments, the stimuli are held constant across conditions (cf. Figure 25.2a and b), thus highlighting the modulatory effect of endogenous spatial attention over and above any activation related to the sensory input. The comparison of attention to one versus the other hemifield showed modulation of activity in the aIPS, both when subjects attend and judge vision and when they attend and judge touch (Macaluso et al. 2000a, 2002b, 2003b; see Figure 25.2b).
Moreover, these spatially specific attentional modulations in multisensory aIPS occurred in anticipation of the appearance of the target (cue-related effects), corroborating the “internal,” endogenous origin of these signals (see Macaluso et al. 2003b). These results are in agreement with the idea that parietal association cortex contains a multisensory representation of space (feedforward convergence hypothesis) and that supramodal mechanisms of endogenous spatial attention control operate by means of these representations.

Together with the modulation of activity within “multisensory convergence” regions, we have also consistently found cross-modal influences of endogenous spatial attention in “sensory-specific” occipital visual cortex. The occipital visual cortex does not respond to tactile stimuli (see Figure 25.2a; but see also Kayser et al. 2005; Meyer et al. 2007) and, if anything, tends overall to deactivate when attention is focused on nonvisual stimuli (Laurienti et al. 2002). Nonetheless, the direction of tactile attention was found to modulate activity there cross-modally. For example, when subjects directed endogenous attention toward the right hand to discriminate tactile targets there, activity increased in the left occipital cortex that represents the contralateral, right visual hemifield (Figure 25.2b, bottom panel; see also Macaluso et al. 2000a, 2002b, 2003b). These effects were
observed even when visual distracters at the attended side conveyed misleading information (e.g., a single flash of light, while subjects attempted to detect double pulses of vibrations; Macaluso et al. 2000a, 2002b, 2003b; see also Ciaramitaro et al. 2007). Accordingly, it is unlikely that subjects decided to strategically shift both tactile and visual attention toward one side; rather, cross-modal spatial influences in visual cortex appear to be obligatory (see also Eimer 1999). It should be noted that modulatory effects of one modality on areas dedicated to a different modality are not confined to tactile attention affecting the visual cortex (for review, see Eimer and Driver 2001). For example, Eimer and Van Velzen (2002) showed modulation of early somatosensory event-related potentials (ERPs) depending on the direction of visual attention (see also Kida et al. 2007, for a recent magnetoencephalography study localizing related cross-modal influences in secondary somatosensory cortex); Teder-Salejarvi et al. (1999) found that endogenous visuospatial attention can modulate early auditory ERPs; and Hotting et al. (2003) reported reciprocal cross-modal influences of auditory and tactile spatial attention on tactile and auditory ERPs, respectively. Our visuo-tactile fMRI study that isolated cue-related, preparatory processes (Macaluso et al. 2003b) provided additional hints about the nature of spatially specific cross-modal influences in the occipital cortex. The comparison of leftward versus rightward attention-directing cues, and vice versa, demonstrated that activity in contralateral occipital cortex increases before the presentation of the target stimuli, that is, when subjects prepared for the upcoming tactile judgment. For example, when the auditory cue instructed the subject to shift tactile attention to the right hemifield, brain activity increased not only in left post-central somatosensory areas and in left multimodal intraparietal cortex, but also in the left extrastriate visual cortex (for preparatory cross-modal influences between other modalities, see also Trenner et al. 2008; Eimer et al. 2002; Green et al. 2005). This supports the hypothesis that endogenous attention generates “multisensory spatial biases,” and that these can influence multiple levels of processing, including activity in multisensory regions (aIPS) as well as in sensory-specific areas (somatosensory and visual cortex, for tactile spatial attention).

To summarize, studies on multisensory endogenous spatial attention have shown that: (1) control regions in dFP activate irrespective of modality; (2) selective attention to one hemifield boosts activity in areas that represent the contralateral hemifield, including also sensory-specific areas concerned with a different modality (e.g., cross-modal modulation of occipital cortex during tactile attention); (3) both multisensory regions in dFP (plus spatially specific aIPS) and unisensory areas show attentional modulation before the presentation of the target stimuli (cue-related effects), consistent with the endogenous, internally generated nature of these attentional signals. These findings can be interpreted in the context of “site-source” models of attention control. Accordingly, feedforward sensory convergence would make multisensory information available to the dFP attentional network that can therefore operate as a supramodal control system.
Backprojections from the control system (“sources”) to sensory-specific areas (“sites”) convey modulatory signals about the currently relevant location. Critically, because the control system operates supramodally and is connected with several modalities, these signals will spread over multiple “site” regions, affecting activity in a distributed network of multimodal and unimodal brain regions, all representing the attended location. The net result of this is that endogenous attention selects the attended location irrespective of modality, with all stimuli presented at the attended location receiving enhanced processing (e.g., Eimer and Driver 2000). This proposal entails that feedforward and feedback connections between sensory areas and associative regions in dFP mediate a transfer of spatial information across modalities. This effectively means that endogenous attention “broadcasts” information about the currently attended location between anatomically distant brain areas, thus mediating multisensory integration of space. Drawing a loose analogy with the feature integration theory in the visual modality (Treisman and Gelade 1980), we can think of space as an “object” composed of multiple “features” (visual location, auditory location, saccadic target location, etc.). Each “feature” is represented in a specific region of the brain, including many sensory-specific, multisensory, and motor representations localized in separate brain regions. Attention coordinates and binds together these representations via
modulatory influences, thus generating a coherent representation of the whole “object,” that is, an integrated representation of space. However, traditional views of multisensory integration posit that signals in different modalities interact in an automatic manner, suggesting “preattentive” mechanisms of multisensory integration. The next two sections will address this issue in more detail, first looking for multisensory effects in paradigms involving stimulus-driven rather than voluntary attention, and then discussing a set of studies that directly tested for the interplay between endogenous and stimulus-driven factors in multisensory spatial attention. In the last section, I will further specify the possible relationship between attention control and multisensory integration.
25.4 STIMULUS-DRIVEN SPATIAL ATTENTION

The previous section highlighted multisensory consequences of spatial selection, when subjects choose voluntarily to pay attention to one specific location (endogenous attention). Under these conditions, selection appears to operate cross-modally, with supramodal mechanisms of control (“sources”: dFP) and modulation (“sites”: sensory areas) that boost processing of stimuli at the attended location irrespective of modality. The question arises whether these supramodal mechanisms are contingent on voluntary control (e.g., maybe because of a limited pool of resources for endogenous attention) or whether the merging of spatial representations across modalities can also occur in situations that do not involve any strategic, voluntary control. Cross-modal cueing effects in automatic, stimulus-driven spatial attention and any associated change of brain activity might also provide further evidence about the possible relationship between spatial attention and multisensory integration.

Behavioral studies showed that nonpredictive stimuli in one modality can affect processing of subsequent targets in a different modality (e.g., Spence et al. 1998; McDonald et al. 2000). For example, task-irrelevant touch on the left hand can speed up responses to visual targets presented nearby, compared with visual targets presented on the other side. Critically, this occurs also when the side of the tactile stimulus does not predict the side of the subsequent visual target (nonpredictive tactile cues), suggesting that these cross-modal cueing effects do not depend solely on strategic (endogenous) deployment of spatial attention.

The investigation of the neural basis of stimulus-driven (visual) attention with neuroimaging methods can be somewhat problematic. Studies on endogenous attention typically compare experimental conditions with identical stimuli, manipulating only the instructions that are given to the subject (Heinze et al. 1994; cf. also Figure 25.2b for an example in the context of multisensory attention). On the contrary, stimulus-driven effects depend by definition on the stimulus configuration. In the visual modality, a typical stimulus-driven (exogenous) cueing study entails presentation of a nonpredictive peripheral visual cue followed by a visual target either at the same location (valid trials, 50%) or on the opposite side (invalid trials, 50%). The direct comparison of these two trial types entails not only a different attentional status (e.g., “attended” target on valid trials vs. “unattended” target on invalid trials), but also different physical stimuli. Hence, the interpretation of any differential brain activation can be problematic. For example, two stimuli presented in close spatiotemporal proximity (as during valid trials) can give rise to nonlinear summation of the hemodynamic responses, which will appear as a differential activation when valid and invalid trials are directly compared even if there is no actual change in neuronal activity or attentional modulation. These drawbacks are somewhat mitigated in the context of multisensory fMRI experiments. Multisensory stimulus-driven paradigms also entail the co-occurrence of attentional (attended vs. unattended) and sensory effects (same side vs. opposite side), but—critically—“same-side” conditions do not entail delivering twice the same, or very similar, stimuli (i.e., cue and target are now in different modalities).
Moreover, the registration of the cue–target spatial alignment (i.e., the distinction between valid/same-side and invalid/opposite-side trials) cannot occur in a straightforward manner within low-level sensory-specific maps of space, because the two stimuli are initially processed in anatomically separate brain regions (e.g., visual vs. somatosensory cortices).
We have utilized stimulus-driven paradigms in a series of multisensory, visuo-tactile fMRI studies (Macaluso et al. 2000b, 2002a, 2005; Zimmer and Macaluso 2007). As in classical behavioral studies, we compared trials with nonpredictive touch on the same side as the visual target versus trials with touch and vision on opposite sides. More specifically, we revealed hemifield-specific cross-modal interactions by testing for the interaction between the position of the visual target (left/right) and the presence of the tactile input (e.g., Macaluso et al. 2000b) or the spatial congruence of the visuo-tactile stimuli (same or opposite sides; cf. Figure 25.3a; see also Zimmer and Macaluso 2007). This allowed us to eliminate any trivial difference due to the sensory stimuli; for example, directly comparing VLTL trials (vision and touch on the left, “same side”) versus VLTR trials (“opposite side” trials) would activate the right somatosensory cortex simply because this comparison also entails left versus right tactile stimuli. By contrast, the comparison (VLTL – VLnoT) > (VRTL – VRnoT) does not entail this confound, with the mere effect of visual and tactile stimulation subtracting out in the interaction.

Our results consistently showed that nonpredictive, task-irrelevant tactile stimuli on the same side as the visual target can boost activity in occipital visual cortex contralateral to the target side (e.g., Macaluso et al. 2000b). Figure 25.3a shows an example of this cross-modal, stimulus-driven, and spatially specific effect in the visual cortex. In this experiment, the visual target was delivered equiprobably in the left or right visual hemifield, near the subject’s face. Task-irrelevant, nonpredictive touch consisted of air puffs presented equiprobably on the left or right side of the forehead, in close spatial correspondence with the position of the visual stimuli on each side. The task of the subject was to discriminate the “up/down” elevation of the visual target (two LEDs were mounted on each side). The test for hemifield-specific effects of cross-modal attention (i.e., the interaction between the position of the visual stimulus and the spatial congruence of the bimodal stimulation; for more details on this topic, see also Zimmer and Macaluso 2007) revealed increased activation in the left occipital cortex when both vision and touch were on the right side of space, and activation in the right occipital cortex for spatially congruent (same-side) vision and touch on the left side (see Figure 25.3a). Accordingly, task-irrelevant touch can affect processing in the visual cortex in a spatially specific and fully stimulus-driven manner. This is consistent with the hypothesis that spatial information about one modality (e.g., touch) can be transmitted to anatomically distant areas that process stimuli in a different modality (e.g., occipital visual cortex), and that this can occur irrespective of strategic, endogenous task requirements (see also McDonald and Ward 2000; Kennett et al. 2001; Kayser et al. 2005; Teder-Salejarvi et al. 2005; related findings about stimulus-driven cross-talk between areas processing different modalities are also discussed later in this book). The finding of spatially specific cross-modal influences of touch in visual areas is also remarkable because the visual cortex registers the stimulus position in a retino-centered frame of reference, whereas the position of touch is initially registered in a body-centered frame of reference.
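To make the logic of this interaction contrast explicit, the following minimal sketch (illustrative only; the per-condition estimates are invented numbers, not data from the studies cited above) computes the hemifield-specific interaction from hypothetical GLM betas of a right-occipital voxel and shows why the main effects of visual side and of tactile stimulation cancel out.

```python
# Hypothetical per-condition response estimates (e.g., GLM betas, arbitrary units)
# for a voxel in the right occipital cortex (contralateral to left visual targets).
# All values are invented for illustration only.
betas = {
    "VL_TL": 2.4,   # vision left + touch left (spatially congruent: boost expected)
    "VL_noT": 1.6,  # vision left, no touch
    "VR_TL": 1.0,   # vision right + touch left (spatially incongruent)
    "VR_noT": 0.9,  # vision right, no touch
}

# A direct comparison of congruent vs. incongruent trials (e.g., VLTL vs. VLTR)
# would confound spatial congruence with the side of the tactile stimulus.
# The interaction below avoids this: the visual main effect is removed within each
# bracket, and the contribution of the left-touch stimulus itself cancels across
# brackets, leaving only the congruence-specific cross-modal modulation.
interaction = (betas["VL_TL"] - betas["VL_noT"]) - (betas["VR_TL"] - betas["VR_noT"])

print(f"(VLTL - VLnoT) - (VRTL - VRnoT) = {interaction:+.2f}")
```

With these invented numbers the interaction is positive, mimicking the congruence-specific boost reported for the occipital cortex contralateral to the visual target.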
Thus, the question arises whether these side-specific effects of multisensory spatial congruence truly reflect the alignment of visual and tactile stimuli in external space, or rather merely reflect an overall hemispheric bias. Indeed, a congruent VRTR stimulus entails a double stimulation of the left hemisphere, whereas on incongruent VRTL trials the two stimuli initially activate opposite hemispheres (see also Kinsbourne 1970, on hemispheric biases in spatial attention). We dissociated the influence of hemisphere versus external location by manipulating the direction of gaze with respect to the hand position (Macaluso et al. 2002a). Tactile stimuli were always delivered to the right hand, which was positioned centrally. When subjects fixated on the left side, the right visual field stimulus was spatially aligned with touch, and both right touch and right vision projected to the left hemisphere. However, when gaze was shifted to the right side, the left visual field stimulus was now spatially aligned with right touch, with vision and touch projecting initially to opposite hemispheres. The fMRI results showed that common location in external space, rather than common hemisphere, determined cross-modal influences in the occipital cortex. Hence, right-hand touch can boost responses to right visual field stimuli when the right hand lies in the right visual field, but boosts responses to left visual field stimuli when a posture change places the right hand in the left visual field (see also Kennett et al. 2001, for a related ERP study).
[Figure 25.3 appears here. (a) Stimulus-driven cross-modal spatial interactions: signal plots of effect size (a.u.) in left and right occipital cortex for congruent (VL TL, VR TR) and incongruent (VL TR, VR TL) visuo-tactile trials. (b) Stimulus-driven cross-modal interactions and endogenous visuo-spatial load: signal plots of the left-minus-right (and right-minus-left) visual effect (a.u.) in left and right occipital cortex for congruent vs. incongruent trials under LOW and HIGH load.]
FIGURE 25.3 Stimulus-driven cross-modal spatial attention and interactions with endogenous control. (a) Stimulus-driven cross-modal influences in visual cortex. In this event-related fMRI study (unpublished data), subjects performed a visual discrimination task (“up/down” judgment) with visual stimuli presented in left or right hemifield near the forehead. Task-irrelevant touch was presented equiprobably on left or right side of the forehead, yielding spatially congruent trials (vision and touch on same side; e.g., both stimuli on right side, cf. top-central panel) and incongruent trials (vision and touch on opposite sides; e.g., vision on the right and touch on the left). Imaging data tested for interaction between position of visual target (left/right) and spatial congruence of bimodal stimulation (congruent/incongruent: e.g., testing for greater activation for right than left visual targets, in spatially congruent vs. incongruent trials). This revealed activity enhancement in occipital visual areas when a contralateral visual target was coupled with a spatially congruent task-irrelevant touch. For example, left occipital cortex showed greater activation comparing “right minus left visual targets” when touch was congruent vs. incongruent (see signal plot on left side: compare “bar 2 minus 1” vs. “bar 4 minus 3”); effectively yielding maximal activation of the left occipital cortex when a right visual target was combined with right touch on the same side (see bar 2, in same plot). (b) Stimulus-driven cross-modal influences and endogenous visuospatial attention. (From Zimmer, U. and Macaluso, E., Eur. J. Neurosci., 26, 1681–1691, 2007.) Also in this study, we indexed side-specific cross-modal influences testing for interaction between position of visual stimuli and spatial congruence of visuo-tactile input (see also Figure 25.3a; note that, for simplicity, panel b shows only the “right-congruent” condition), but now with both vision and touch fully task-irrelevant. We assessed these cross-modal spatial effects under two conditions of endogenous visuospatial attentional load. In the “High load” condition, subjects were asked to detect subtle changes of orientation of a grating patch presented above fixation. In the “Low load” condition, they detected changes of luminance at fixation. fMRI results showed that activity in occipital cortex increased for spatially congruent visuo-tactile stimuli in the contralateral hemifield, and that, critically, this occurred irrespective of the load of the visuospatial endogenous task. Accordingly, analogous effects of spatial congruence were found in the “Low load” condition (bar 1 minus 2) and in the “High load” condition (bar 3 minus 4, in each signal plot). V/T, vision/touch; L/R, left/right; Cong/Incong, congruent (VT on the same side)/incongruent (VT on opposite sides).
The finding that cross-modal influences in sensory-specific occipital cortex can take posture into account suggests that intermediate brain structures representing the current posture are also involved. Postural signals have been found to affect activity in many different regions of the brain, including fronto-parietal areas that also participate in attention control and multisensory processing (Andersen et al. 1997; Ben Hamed and Duhamel 2002; Boussaoud et al. 1998; Bremmer et al. 1999; Kalaska et al. 1997; Fasold et al. 2008). Hence, we can hypothesize that the fronto-parietal cortex may also take part in stimulus-driven multisensory attention control.

In the visual modality, stimulus-driven control has been associated primarily with activation of the vFP network, including the TPJ and the IFG. These areas activate when subjects are cued to attend to one hemifield but the visual target appears on the opposite side (invalid trials), thus triggering a stimulus/target-driven shift of visuospatial attention (plus other task-related resetting processes; see below). We employed a variation of this paradigm to study stimulus-driven shifts of attention in vision and in touch (Macaluso et al. 2002c). A central informative cue instructed the subject to attend to one side. On 80% of the trials the target appeared on the attended side (valid trials), whereas in the remaining 20% of the trials the target appeared on the opposite side (invalid trials). Critically, the target could be either visual (an LED near the left/right hand, on each side) or tactile (an air puff on the left/right hand). The modality of the target stimulus was randomized and unpredictable, so subjects could not strategically prepare to perform target discrimination in one or the other modality. The dorsal FP network activated irrespective of cue validity, consistent with the role of this network in voluntary shifts of attention irrespective of modality (see also Wu et al. 2007). The direct comparison of invalid versus valid trials revealed activation of the vFP (TPJ and IFG), both for invalid visual targets and for invalid tactile targets. This demonstrates that both visual and tactile target stimuli at the unattended location can trigger stimulus-driven reorienting of spatial attention and activation of the vFP network (see also Mayer et al. 2006; Downar et al. 2000).
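As a concrete illustration of this design (a sketch under assumed parameters, not the stimulus code actually used in Macaluso et al. 2002c), the snippet below builds a randomized trial list with 80% valid and 20% invalid central cues and an unpredictable target modality, so that subjects cannot prepare for discrimination in one specific modality.

```python
import random

def build_trial_list(n_trials=200, p_valid=0.8, seed=0):
    """Generate trials for a cross-modal spatial cueing design (illustrative only)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        cued_side = rng.choice(["left", "right"])            # central symbolic cue
        valid = rng.random() < p_valid                        # 80% valid, 20% invalid
        target_side = cued_side if valid else ("right" if cued_side == "left" else "left")
        target_modality = rng.choice(["visual", "tactile"])   # unpredictable modality
        trials.append({
            "cued_side": cued_side,
            "target_side": target_side,
            "target_modality": target_modality,
            "validity": "valid" if valid else "invalid",
        })
    rng.shuffle(trials)
    return trials

trials = build_trial_list()
n_invalid = sum(t["validity"] == "invalid" for t in trials)
print(f"{n_invalid}/{len(trials)} invalid trials")
```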
Nonetheless, extensive investigation of spatial cueing paradigms in the visual modality indicates that the activation of the vFP network does not reflect pure stimulus-driven control. As a matter of fact, invalid trials involve not only stimulus-driven shifts of attention from the cued location to the new target location, but also breaches of expectation (Nobre et al. 1999), updating of task-related settings (Corbetta and Shulman 2002), and processing of low-frequency stimuli (Vossel et al. 2006). Several different strategies have been employed to tease apart the contribution of these factors (e.g., Kincade et al. 2005; Indovina and Macaluso 2007). Overall, the results of these studies have led to the current view that task-related factors (e.g., the task relevance of the reorienting stimulus, i.e., the target that requires judgment and response) and stimulus-driven factors jointly contribute to the activation of the vFP system (see Corbetta et al. 2008 for review).

Additional evidence for the role of task relevance in the activation of vFP in the visual modality comes from a recent fMRI study, in which we combined endogenous predictive cues and exogenous nonpredictive visual cues on the same trial (Natale et al. 2009). Each trial began with a central, predictive endogenous cue indicating the most likely (left/right) location of the upcoming target. The endogenous cue was followed by a task-irrelevant, nonpredictive exogenous cue (brightening and thickening of a box in the left or right hemifield) that was quickly followed by the (left or right) visual target. This allowed us to factorially cross the validity of endogenous and exogenous cues within the same trial. We reasoned that if pure stimulus-driven attentional control can influence activity in vFP, exogenous cues that anticipate the position of an “endogenous-invalid” task-relevant target (e.g., endogenous cue left, exogenous cue right, target right) should affect reorienting-related activation of vFP. Behaviorally, we found that both endogenous and exogenous cues affected response times. Subjects were faster to discriminate “endogenous-invalid” targets when the exogenous cue anticipated the position of the target (exogenous valid trials, as in the stimulus sequence above). However, the fMRI data did not reveal any significant effect of the exogenous cues in the vFP, which activated equivalently in all conditions containing task-relevant targets on the opposite side of the endogenously cued hemifield (i.e., all endogenous-invalid trials). These findings are in agreement with the hypothesis that fully task-irrelevant visual stimuli do not affect activity in vFP (even when the behavioral data demonstrate an influence of these task-irrelevant cues on target discrimination; see also Kincade et al. 2005).
However, a different picture emerged when we used task-irrelevant auditory rather than visual cues (Santangelo et al. 2009). The experimental paradigm was analogous to the purely visual study described above, with a predictive endogenous cue followed by a nonpredictive exogenous cue (now auditory) and by the visual target within each trial. The visual targets were presented in the left/right hemifields near the subject’s face, and the task-irrelevant auditory stimuli were delivered at corresponding external locations. The overall pattern of reaction times was similar to the visual study: both valid endogenous and valid exogenous cues speeded up responses, confirming cross-modal influences of the task-irrelevant auditory cues on the processing of the visual targets (McDonald et al. 2000). The fMRI data revealed the expected activation of vFP for “endogenously invalid” visual targets, demonstrating once again the role of these regions during reorienting toward task-relevant targets (e.g., Corbetta et al. 2000). Critically, however, the side of the task-irrelevant auditory stimuli was now found to modulate activity in the vFP. Activation of the right TPJ for endogenous-invalid trials diminished when the auditory cue was on the same side as the upcoming invalid target (e.g., endogenous cue left, exogenous auditory cue right, visual target right). Accordingly, task-irrelevant sounds that anticipate the position of the invalid visual target reduce reorienting-related activation in TPJ, demonstrating a “pure” stimulus-driven cross-modal spatial effect in the ventral attention control system (but see also Downar et al. 2001; Mayer et al. 2009).

To summarize, multisensory studies of stimulus-driven attention showed that: (1) task-irrelevant stimuli in one modality modulate activity in sensory-specific areas concerned with a different modality, and they can do so in a spatially specific manner (e.g., boosting of activity in contralateral occipital cortex for touch and vision on the same side); (2) spatially specific cross-modal influences in sensory-specific areas take posture into account, suggesting indirect influences via higher-order areas; (3) control regions in vFP operate supramodally, activating during stimulus-driven spatial reorienting toward visual or tactile targets; (4) task-irrelevant auditory stimuli can modulate activity in vFP, revealing a “special status” of multisensory stimulus-driven control compared with unisensory visuospatial attention (cf. Natale et al. 2009).

These findings call for an extension of site-source models of attention control, which should take into account the “special status” of multisensory stimuli. In particular, models of multisensory attention control should include pathways allowing nonvisual stimuli to reach the visual cortex and to influence activity in the ventral attention network irrespective of task relevance. Figure 25.1b shows some of the hypothetical pathways that may mediate these effects. “Pathway 1” entails direct feedforward influences from auditory/somatosensory cortex into the vFP attention system. The presence of multisensory neurons in the temporo-parietal cortex and inferior premotor cortex (Bruce et al. 1981; Barraclough et al. 2005; Hyvarinen 1981; Dong et al. 1994; Graziano et al. 1997),
plus activation of these regions for vision, audition, and touch in humans (Macaluso and Driver 2001; Bremmer et al. 2001; Beauchamp et al. 2004; Downar et al. 2000), is consistent with convergent multisensory projections into the vFP. A possible explanation for the effect of task-irrelevant auditory cues in TPJ (see Santangelo et al. 2009) is that feedforward pathways from the auditory cortex, unlike the pathway from occipital cortex, might not be under “task-related inhibitory influences” (see Figure 25.1a). The hypothesis of inhibitory influences on the visual, occipital-to-TPJ pathway was initially put forward by Corbetta and Shulman (2002) as a possible explanation for why task-irrelevant visual stimuli do not activate TPJ (see also Natale et al. 2009). More recently, the same authors suggested that these inhibitory effects may arise from the middle frontal gyrus and/or via subcortical structures (locus coeruleus; for details on this topic, see Corbetta et al. 2008). Our finding of a modulatory effect of task-irrelevant audition in TPJ (Santangelo et al. 2009) suggests that these inhibitory effects may not apply in situations involving task-irrelevant stimuli in a modality other than vision.

“Pathway 2” involves indirect influences of multisensory signals on the ventral FP network, via dorsal FP regions.
Task-related modulations of the pathway between occipital cortex and TPJ are thought to implicate the dFP network (Corbetta et al. 2008; see also the previous paragraph). Because multisensory stimuli can affect processing in the dorsal FP network (via feedforward convergence), they may in turn modify any influence that the dorsal network exerts on the ventral network (see also He et al. 2007, for an example of how changes/lesions of one attention network can affect the functioning of the other network). This could include the removal of any inhibitory influence for (auditory) task-irrelevant stimuli. The involvement of dorsal FP areas may also be consistent with the finding that cross-modal effects in unisensory areas take posture into account. Postural signals modulate the activity of neurons in many dFP regions (e.g., Andersen et al. 1997; Ben Hamed and Duhamel 2002; Boussaoud et al. 1998; Bremmer et al. 1999; Kalaska et al. 1997). An indirect route via dFP could therefore combine sensory signals and postural information about eyes/head/body, yielding cross-modal influences according to position in external space (cf. Stein and Stanford 2008; but note that postural signals are also available in multisensory regions of the vFP network, Graziano et al. 1997, and in the SC, Grossberg et al. 1997; see also Pouget et al. 2002 and Deneve and Pouget 2004 for computational models on this issue).

“Pathway 3” involves direct anatomical projections between sensory-specific areas that process stimuli in different modalities. These have now been reported in many animal studies (e.g., Falchier et al. 2002; Rockland and Ojima 2003; Cappe and Barone 2005) and could mediate automatic influences of one modality (e.g., touch) on activity in sensory-specific areas of a different modality (e.g., occipital visual cortex; see also Giard and Peronnet 1999; Kayser et al. 2005; Eckert et al. 2008). These connections between sensory-specific areas may provide fast, albeit spatially coarse, indications about the presence of a multisensory object or event in the external environment. In addition, a direct effect of audition or touch on occipital cortex could change the functional connectivity between occipital cortex and TPJ (see Indovina and Macaluso 2004), also determining stimulus-driven cross-modal influences in vFP.

Finally, additional pathways are likely to involve subcortical structures (“pathways 4” in Figure 25.1b). Many different subcortical regions contain multisensory neurons and can influence cortical processing (e.g., superior colliculus, Meredith and Stein 1983; thalamus, Cappe et al. 2009; basal ganglia, Nagy et al. 2006). In addition, subcortical structures are important for spatial orienting (e.g., the intermediate and deep layers of the SC are involved in the generation of overt saccadic responses; see also Frens and Van Opstal 1998, for a study on overt orienting to bimodal stimuli) and have been linked to selection processes in spatial attention (Shipp 2004). The critical role of the SC for combining spatial information across sensory modalities has also been demonstrated in two recent behavioral studies (Maravita et al. 2008; Leo et al. 2008). These showed that the superior behavioral performance for spatially aligned, same-side versus opposite-side audiovisual trials disappears when the visual stimuli are invisible to the SC (purple/blue stimuli).
25.5 POSSIBLE RELATIONSHIP BETWEEN SPATIAL ATTENTION AND MULTISENSORY INTEGRATION

Regardless of the specific pathways involved (see preceding section), the finding that spatial information can be shared between multiple sensory-specific and multisensory areas even under conditions of stimulus-driven, automatic attention suggests a possible relationship between attention control and the integration of space across sensory modalities. The central idea here is that attention may “broadcast” information about the currently relevant location between anatomically distant brain areas, thus providing a mechanism that coordinates spatial representations in different sensory modalities and implying some relationship between attention and multisensory integration.

The functional relationship between attention and multisensory integration is much debated and not yet understood (e.g., Talsma et al. 2007; McDonald et al. 2001; Alsius et al. 2005; Saito et al. 2005; Macaluso and Driver 2005; Driver and Spence 1998; Bertelson et al. 2000; Kayser et al. 2005). This is attributable, at least to some extent, to the difficulty of defining unequivocal indices of multisensory integration. Different authors have proposed and utilized a variety of measures to highlight interactions between stimuli in different senses. These include phenomenological measures, such as the perception of multisensory illusions (e.g., the “McGurk” illusion, McGurk and MacDonald 1976, see also Soto-Faraco and Alsius 2009; or the “sound-bounce” illusion, Bushara et al. 2003), behavioral criteria based on violations of the Miller inequality (Miller 1982; see Tajadura-Jiménez et al. 2009, for an example), and physiological measures related to nonlinear effects in single-cell spiking activity (Meredith and Stein 1986b), EEG (Giard and Peronnet 1999), or fMRI (Calvert et al. 2001) signals. At present there is still no consensus, as most of these measures have drawbacks and no single index appears suitable for all possible experimental situations (for an extensive treatment, see Beauchamp 2005; Laurienti et al. 2005; Holmes 2009).
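To illustrate one of the behavioral criteria listed above, the short sketch below applies the Miller (race-model) inequality to simulated reaction times; the distributions are invented for the example and do not correspond to any of the cited datasets. The inequality states that P_AV(RT <= t) should not exceed P_A(RT <= t) + P_V(RT <= t); violations are taken as evidence that bimodal responses are faster than predicted by a race between independent unimodal processes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated reaction times (ms) for unimodal and redundant bimodal targets.
rt_auditory = rng.normal(320, 40, 500)
rt_visual   = rng.normal(340, 40, 500)
rt_bimodal  = rng.normal(280, 35, 500)   # faster than either unimodal condition

def cdf(rt, t):
    """Empirical cumulative distribution function P(RT <= t)."""
    return np.mean(rt <= t)

# Miller (1982) race-model inequality: P_AV(RT <= t) <= P_A(RT <= t) + P_V(RT <= t).
# Time points at which the inequality is violated indicate that bimodal responses
# outpace any race between independent unimodal processes.
time_points = np.arange(150, 500, 10)
violations = [
    t for t in time_points
    if cdf(rt_bimodal, t) > cdf(rt_auditory, t) + cdf(rt_visual, t)
]

print("race-model violations at t (ms):", violations)
```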
In the case of cross-modal spatial cueing effects in stimulus-driven attention, the issue is further complicated by the fact that stimulus-driven effects are driven by changes in stimulus configuration (same vs. different position), which is also considered a critical determinant of multisensory integration (Meredith and Stein 1986b). It is therefore difficult to experimentally tease apart these two processes. In our initial study (Macaluso et al. 2000b), we showed boosting of activity in occipital cortex contralateral to the position of spatially congruent bimodal visuo-tactile stimuli that were presented simultaneously and for a relatively long duration (300 ms). McDonald et al. (2001) argued that these cross-modal influences may relate to multisensory interactions rather than spatial attention, as there was no evidence that task-irrelevant touch captured attention on the side of the visual target. However, this point is difficult to address, because it is impossible to obtain behavioral evidence that exogenous cues (which by definition do not require any response) trigger shifts of spatial attention. A related argument was put forward suggesting that a minimum condition to disentangle attention versus integration is to introduce a gap between the offset of the cue and the onset of the target (McDonald et al. 2001). This should eliminate multisensory integration (the trial would never include simultaneous bimodal stimulation), while leaving spatial attentional effects intact (i.e., faster and more accurate behavioral responses for same-side vs. opposite-side trials). However, we have previously argued that criteria based on stimulus timing may be misleading, because of the differential response latencies and discharge properties of neurons in different regions of the brain (Macaluso et al. 2001). Thus, physically nonoverlapping stimuli (e.g., an auditory cue that precedes a visual target) may produce coactivation of a bimodal neuron that has a shorter response latency for audition than for vision (e.g., see Meredith et al. 1987; for related findings using ERPs in humans, see also Meylan and Murray 2007). As an extension of the idea that the temporal sequence of events may be used to disentangle the role of attention and multisensory integration in stimulus-driven cross-modal cueing paradigms (McDonald et al. 2001), one may consider the timing of neuronal activation rather than the timing of the external stimuli. This can be addressed in the context of site-source models of attention (cf. Figure 25.1).
Along these lines, Spence et al. (2004) suggested that if control regions activate before any modulation in sensory areas, this would speak for a key role of attention in cross-modal integration; whereas if attentional control engages only after cross-modal effects in sensory-specific areas, this would favor the view that multisensory integration takes place irrespective of attention. In the latter case, cross-modal cueing effects could be regarded as arising as a “consequence” of the integration process (see also Busse et al. 2005). Using ERPs and dipole source localization in a stimulus-driven audiovisual cueing paradigm, McDonald and colleagues (2003) found that associative regions in the posterior temporal cortex activate before any cross-modal spatial effect in the visual cortex. In this study, there was a 17- to 217-ms gap between cue offset and target onset, and the analysis of the behavioral data showed increased perceptual sensitivity (d′) for valid compared with invalid trials. Accordingly, the authors suggested that the observed sequence of activation (including cross-modal influences of audition on visual ERPs) could be related to involuntary shifts of spatial attention. However, this study did not assess brain activity associated specifically with the exogenous cues, thus again not providing any direct evidence for cue-related shifts of attention. Using a different approach to investigate the dynamics of cross-modal influences in sensory areas, a recent fMRI study of functional connectivity showed that, during processing of simultaneous audiovisual
streams, temporal areas causally influence activity in visual and auditory cortices, rather than the other way round (Noesselt et al. 2007). Thus, cross-modal boosting of activity in sensory-specific areas seems to arise because of backprojections from multisensory regions, emphasizing the causal role of high-order associative areas and consistent with some coupling between attention control and the sharing of spatial information across sensory modalities (which, depending on the definition, can be viewed as an index of multisensory integration).

More straightforward approaches can be undertaken to investigate the relationship between endogenous attention and multisensory integration. Although still contingent on the specific definition of multisensory integration (see above), one may ask whether endogenous attention affects the way signals in different modalities interact with each other. For example, Talsma and Woldorff (2005) indexed multisensory integration using a supra-additive criterion on ERP amplitudes (AV > A + V), and tested whether this was different for stimuli at the endogenously attended versus unattended side (note that both vision and audition were task-relevant/attended in this experiment). Supra-additive responses for AV stimuli were found at frontal and centro-medial scalp sites. Critically, this effect was larger for stimuli at the attended than at the unattended side, demonstrating some interplay between spatial endogenous attention and multisensory integration (see also the study of Talsma et al. 2007, who manipulated the relevant modality rather than the relevant location).
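To make the supra-additive criterion explicit, the sketch below (schematic only, with invented response amplitudes rather than the ERP data of Talsma and Woldorff 2005) compares the bimodal response against the sum of the two unimodal responses, separately for the attended and the unattended side.

```python
# Hypothetical mean response amplitudes (arbitrary units) for one sensor/region;
# the numbers are invented to illustrate the AV > A + V criterion only.
responses = {
    "attended":   {"A": 1.0, "V": 1.2, "AV": 2.9},
    "unattended": {"A": 0.9, "V": 1.1, "AV": 2.1},
}

for side, r in responses.items():
    # A positive value indicates a supra-additive (AV > A + V) multisensory response.
    superadditivity = r["AV"] - (r["A"] + r["V"])
    print(f"{side:>10}: AV - (A + V) = {superadditivity:+.2f}")
```

With these invented values the supra-additive effect is larger on the attended side, mirroring the pattern described above as evidence for an interplay between spatial attention and multisensory integration.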
In a similar vein, we have recently investigated the effect of selective visuospatial endogenous attention on the processing of audiovisual speech stimuli (Fairhall and Macaluso 2009). Subjects were presented visually with two “speaking mouths” simultaneously in the left and right visual fields. A central auditory stream (a speaking voice) was congruent with one of the two visual stimuli (the mouth reading the same passage of a tale) and incongruent with the other one (the mouth reading a different passage). In different blocks, subjects were asked to attend either to the congruent or to the incongruent visual stimulus. In this way, we were able to keep the absolute level of multisensory information present in the environment constant, testing specifically for the effect of selective spatial attention to congruent or incongruent multisensory stimuli. The results showed that endogenous visuospatial attention can influence the processing of audiovisual stimuli, with greater activation for “attend to congruent” than “attend to incongruent” conditions. This interplay between attention and multisensory processing was found to affect brain activity at multiple stages, including high-level regions in the superior temporal sulcus, the superior colliculus subcortically, as well as sensory-specific occipital visual cortex (V1 and V2).

Endogenous attention has been found not only to boost multisensory processing, but also, in some cases, to reduce responses for attended versus unattended multisensory stimuli. For example, van Atteveldt and colleagues (2007) presented subjects with letter–sound pairs that were either congruent or incongruent. Under conditions of passive listening, activity in association cortex increased for congruent compared with incongruent presentations. However, this effect disappeared as soon as subjects were asked to perform an active “same/different” judgment on the letters and sounds. The authors suggested that voluntary top-down attention can overrule bottom-up multisensory interactions (see also Mozolic et al. 2008, on the effect of active attention to one modality during multisensory stimulation). In another study on audiovisual speech, Miller and D’Esposito (2005) dissociated patterns of activation related to physical stimulus attributes (synchronous vs. asynchronous stimuli) and perception (“fused” vs. “unfused” percept). They found that the perception of a fused audiovisual stimulus was associated with increased activity in the auditory cortex and the superior temporal sulcus, whereas activity in the SC decreased for synchronous versus asynchronous stimuli, irrespective of perception. These results indicate that the constraints of multisensory integration may change as a function of endogenous factors (fused/unfused percept), for example, with synchronous audiovisual stimuli reducing rather than increasing activity in the SC (cf. Miller and D’Esposito 2005 and Meredith et al. 1987).

Another approach to investigating the relationship between endogenous attention and multisensory integration is to manipulate the attentional load of a primary task and to assess how this influences multisensory processing.
The underlying idea is that if a single, common pool of neural resources mediates both processes, increasing the amount of resources spent on a primary attentional task should lead to some change in the processing of the multisensory stimuli. Conversely, if multisensory integration does not depend on endogenous attention, changes in the attentional task should not have any influence on multisensory processing. We used this approach to investigate the possible role of endogenous visuospatial attention in the integration of visuo-tactile stimuli (Zimmer and Macaluso 2007). We indexed multisensory integration by comparing same-side versus opposite-side visual–tactile stimuli and assessing activity enhancement in contralateral occipital cortex for the same-side condition (cf. Figure 25.3a). These visual and tactile stimuli were fully task-irrelevant and did not require any response. Concurrently, we asked subjects to perform a primary endogenous visuospatial attention task. This entailed either attending to central fixation (low load) or sustaining visuospatial covert attention to a location above fixation to detect subtle orientation changes in a grating patch (high load; see Figure 25.3b). The results showed cross-modal enhancements in the contralateral visual cortex for spatially congruent trials, irrespective of the level of endogenous load (see signal plots in Figure 25.3b). These findings suggest that the processing of visuo-tactile spatial congruence in visual cortex can be uncoupled from endogenous visuospatial attention control (see also Mathiak et al. 2005, for a magnetoencephalography study reporting related findings in auditory cortex).

In summary, direct investigation of the possible relationship between attention control and multisensory integration revealed that voluntary attention to multisensory stimuli, or changing the task relevance of the unisensory components of a multisensory stimulus (attend to one modality, to both, or to neither), can affect multisensory interactions. This indicates that, to some extent, attention control and multisensory integration make use of a shared pool of processing resources. However, when both components of a multisensory stimulus are fully task-irrelevant, changes in the cognitive load of a separate task do not affect the integration of the multisensory input (at least for the load manipulations reported by Zimmer and Macaluso 2007, and Mathiak et al. 2005).

Taken together, these findings suggest that multisensory interactions can occur at multiple levels of processing, and that different constraints apply depending on the relative weighting of stimulus-driven and endogenous attentional requirements. This multifaceted scenario can be addressed in the context of models of spatial attention control that include multiple routes for the interaction of signals in different modalities (see Figure 25.1b). It can be hypothesized that some of these pathways (or network nodes) are under the modulatory influence of endogenous and/or stimulus-driven attention. For instance, cross-modal interactions that involve dorsal FP areas are likely to be subject to endogenous and task-related attentional factors (e.g., see Macaluso et al. 2002b). Conversely, stimulus-driven factors may influence multisensory interactions that take place within or via the ventral FP system (e.g., Santangelo et al. 2009). Direct connections between sensory-specific areas should be, at least in principle, fast, automatic, and preattentive (Kayser et al. 2005), although attentional influences may then superimpose on these (e.g., see Talsma et al. 2007).
Some interplay between spatial attention and multisensory processing can also take place in subcortical areas, as demonstrated by attentional modulation there (Fairhall and Macaluso 2009; see also Wallace and Stein 1994; Wilkinson et al. 1996, for the role of cortical input on multisensory processing in the SC).
25.6 CONCLUSIONS

Functional imaging studies of multisensory spatial attention have revealed a complex interplay between effects associated with the external stimulus configuration (e.g., spatially congruent vs. incongruent multisensory input) and endogenous task requirements. Here, I propose that these can be addressed in the context of “site-source” models of attention that include control regions in dorsal and ventral FP associative cortex, connected via feedforward and feedback projections with sensory-specific areas (plus subcortical regions). This architecture permits the sharing of spatial information across multiple brain regions that represent space (unisensory and multisensory, plus motor representations).
Spatial attention and the selection of the currently relevant location result from the dynamic interplay between the nodes of this network, with both stimulus-driven and endogenous factors influencing the relative contribution of each node and pathway. I propose that the coordination of activity within this complex network underlies the integration of space across modalities, producing a sensory–motor system that allows us to perceive and act within a unified representation of external space.

In this framework, future studies may seek to better specify the dynamics of this network. A key issue concerns possible causal links between activation of some parts of the network and attention/integration effects in other parts of the network. This relationship is indeed a main feature of the “site-source” distinction emphasized in this model. It can be addressed in several ways. Transcranial magnetic stimulation (TMS) can be used to transiently knock out one node of the network during multisensory attention tasks, revealing the time window during which each node of the network is critically engaged. Using this approach, Chambers and colleagues (2004a) identified two critical windows for the activation of inferior parietal cortex during visuospatial reorienting, and demonstrated the involvement of the same region (the angular gyrus) in stimulus-driven visuo-tactile spatial interactions (Chambers et al. 2007; but see also Chambers et al. 2004b, for modality-specific effects). TMS has also been used to demonstrate the central role of posterior parietal cortex in spatial remapping between vision and touch (Bolognini and Maravita 2007) and to infer direct influences of auditory input on human visual cortex (Romei et al. 2007). Most recently, TMS has been combined with fMRI, which allows investigation of the causal influence of one area (e.g., frontal or parietal regions) on activity in other areas (e.g., sensory-specific visual areas; see Ruff et al. 2006; and Bestmann et al. 2008, for review). These studies may be extended to multisensory attention paradigms, looking for the coupling between fronto-parietal attention control regions and sensory areas as a function of the type of input (unisensory or multisensory, spatially congruent or incongruent). Task-related changes in functional coupling between brain areas can also be assessed using analyses of effective connectivity (e.g., dynamic causal modeling; Stephan et al. 2007). These have been successfully applied to both fMRI and ERP data in multisensory experiments, showing causal influences of associative areas in parietal and temporal cortex on sensory processing in the visual cortex (Moran et al. 2008; Noesselt et al. 2007; Kreifelts et al. 2007). Future studies may combine attentional manipulations (e.g., the direction of endogenous attention) and multisensory stimuli (e.g., spatially congruent vs. incongruent multisensory input), providing additional information on the causal role of top-down and bottom-up influences in the formation of an integrated system that represents space across sensory modalities.
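As a schematic illustration of how task-dependent coupling between two regions can be assessed, the sketch below runs a simplified psychophysiological-interaction-style regression on simulated time courses. This is a deliberately stripped-down stand-in for the effective-connectivity methods cited above (e.g., dynamic causal modeling), intended only to show the logic of testing whether the influence of a "seed" area on a "target" area changes with the attentional condition; all signals and parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_scans = 400

# Simulated data: a seed time course, a block regressor coding the attentional
# condition (+1 = attend congruent, -1 = attend incongruent), and a target
# region whose coupling with the seed is stronger in the +1 condition.
seed = rng.normal(size=n_scans)
condition = np.repeat(np.tile([1.0, -1.0], n_scans // 40), 20)
target = 0.2 * seed + 0.4 * (seed * condition) + rng.normal(scale=0.5, size=n_scans)

# PPI-style model: target ~ intercept + seed + condition + seed*condition.
# A reliable weight on the interaction term indicates condition-dependent coupling.
X = np.column_stack([np.ones(n_scans), seed, condition, seed * condition])
beta, *_ = np.linalg.lstsq(X, target, rcond=None)

print(f"main coupling (seed):         {beta[1]:+.2f}")
print(f"condition-dependent coupling: {beta[3]:+.2f}")
```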
REFERENCES

Alsius, A., J. Navarra, R. Campbell, and S. Soto-Faraco. 2005. Audiovisual integration of speech falters under high attention demands. Curr Biol 15: 839–843. Andersen, R. A., L. H. Snyder, D. C. Bradley, and J. Xing. 1997. Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annu Rev Neurosci 20: 303–330. Arrington, C. M., T. H. Carr, A. R. Mayer, and S. M. Rao. 2000. Neural mechanisms of visual attention: Object-based selection of a region in space. J Cogn Neurosci 2: 106–117. Barraclough, N. E., D. Xiao, C. I. Baker, M. W. Oram, and D. I. Perrett. 2005. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J Cogn Neurosci 17: 377–391. Beauchamp, M. S. 2005. Statistical criteria in FMRI studies of multisensory integration. Neuroinformatics 3: 93–113. Beauchamp, M. S., B. D. Argall, J. Bodurka, J. H. Duyn, and A. Martin. 2004. Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nat Neurosci 7: 1190–1192. Ben Hamed, S., J. R. Duhamel, F. Bremmer, and W. Graf. 2001. Representation of the visual field in the lateral intraparietal area of macaque monkeys: A quantitative receptive field analysis. Exp Brain Res 140: 127–144. Ben Hamed, S., and J. R. Duhamel. 2002. Ocular fixation and visual activity in the monkey lateral intraparietal area. Exp Brain Res 142: 512–528.
Bertelson, P., J. Vroomen, B. de Gelder, and J. Driver. 2000. The ventriloquist effect does not depend on the direction of deliberate visual attention. Percept Psychophys 62: 321–332. Bestmann, S., C. C. Ruff, F. Blankenburg, N. Weiskopf, J. Driver, and J. C. Rothwell. 2008. Mapping causal interregional influences with concurrent TMS-fMRI. Exp Brain Res 191: 383–402. Bolognini, N., and A. Maravita. 2007. Proprioceptive alignment of visual and somatosensory maps in the posterior parietal cortex. Curr Biol 17: 1890–1895. Boussaoud, D., C. Jouffrais, and F. Bremmer. 1998. Eye position effects on the neuronal activity of dorsal premotor cortex in the macaque monkey. J Neurophysiol 80: 1132–1150. Bremmer, F., W. Graf, S. Ben Hamed, and J. R. Duhamel. 1999. Eye position encoding in the macaque ventral intraparietal area (VIP). Neuroreport 10: 873–878. Bremmer, F., A. Schlack, N. J. Shah, O. Zafiris, M. Kubischik, K. Hoffmann et al. 2001. Polymodal motion processing in posterior parietal and premotor cortex: A human fMRI study strongly implies equivalencies between humans and monkeys. Neuron 29: 287–296. Bressler, S. L., W. Tang, C. M. Sylvester, G. L. Shulman, and M. Corbetta. 2008. Top-down control of human visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. J Neurosci 28: 10056–10061. Bruce, C., R. Desimone, and C. G. Gross. 1981. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J Neurophysiol 46: 369–384. Bushara, K. O., T. Hanakawa, I. Immisch, K. Toma, K. Kansaku, and M. Hallett. 2003. Neural correlates of cross-modal binding. Nat Neurosci 6:190–195. Busse, L., K. C. Roberts, R. E. Crist, D. H. Weissman, and M. G. Woldorff. 2005. The spread of attention across modalities and space in a multisensory object. Proc Natl Acad Sci USA 102: 18751–18756. Calvert, G. A., P. C. Hansen, S. D. Iversen, and M. J. Brammer. 2001. Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. Neuroimage 14: 427–438. Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. Eur J Neurosci 22: 2886–2902. Cappe, C., A. Morel, P. Barone, and E. M. Rouiller. 2009. The thalamocortical projection systems in primate: an anatomical support for multisensory and sensorimotor interplay. Cereb Cortex 19: 2025–2037. Chambers, C. D., J. M. Payne, and J. B. Mattingley. 2007. Parietal disruption impairs reflexive spatial attention within and between sensory modalities. Neuropsychologia 45: 1715–1724. Chambers, C. D., J. M. Payne, M. G. Stokes, and J. B. Mattingley. 2004a. Fast and slow parietal pathways mediate spatial attention. Nat Neurosci 7: 217–218. Chambers, C. D., M. G. Stokes, and J. B. Mattingley. 2004b. Modality-specific control of strategic spatial attention in parietal cortex. Neuron 44: 925–930. Ciaramitaro, V. M., G. T. Buracas, and G. M. Boynton. 2007. Spatial and cross-modal attention alter responses to unattended sensory information in early visual and auditory human cortex. J Neurophysiol 98: 2399–2413. Corbetta, M., J. M. Kincade, J. M., Ollinger, M. P. McAvoy, and G. L. Shulman. 2000. Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nat Neurosci 3: 292–297. Corbetta, M., G. Patel, and G. L. Shulman. 2008. The reorienting system of the human brain: From environment to theory of mind. Neuron 58: 306–324. Corbetta, M., and G. L. Shulman. 
2002. Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci 3: 215–229. Corbetta, M., A. P. Tansy, C. M. Stanley, S. V. Astafiev, A. Z. Snyder, and G. L. Shulman. 2005. A functional MRI study of preparatory signals for spatial location and objects. Neuropsychologia 43: 2041–2056. Deneve, S., and A. Pouget. 2004. Bayesian multisensory integration and cross-modal spatial links. J Physiol Paris 98: 249–258. Desimone, R., and J. Duncan. 1995. Neural mechanisms of selective visual attention. Annl Rev Neurosci 18: 193–222. Dong, W. K., E. H. Chudler, K. Sugiyama, V. J. Roberts, and T. Hayashi. 1994. Somatosensory, multisensory, and task-related neurons in cortical area 7b (PF) of unanesthetized monkeys. J Neurophysiol 72: 542–564. Downar, J., A. P. Crawley, D. J. Mikulis, and K. D. Davis. 2000. A multimodal cortical network for the detection of changes in the sensory environment. Nat Neurosci 3: 277–283. Downar, J., A. P. Crawley, D. J. Mikulis, and K. D. Davis. 2001. The effect of task relevance on the cortical response to changes in visual and auditory stimuli: An event-related fMRI study. Neuroimage 14: 1256–1267. Driver, J., and C. Spence. 1998. Attention and the crossmodal construction of space. Trends Cogn Sci 2: 254–262.
Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1998. Ventral intraparietal area of the macaque: Congruent visual and somatic response properties. J Neurophysiol 79: 126–136. Eckert, M. A., N. V. Kamdar, C. E. Chang, C. F. Beckmann, M. D. Greicius, and V. Menon. 2008. A crossmodal system linking primary auditory and visual cortices: Evidence from intrinsic fMRI connectivity analysis. Hum Brain Mapp 29: 848–857. Eimer, M. 1999. Can attention be directed to opposite locations in different modalities? An ERP study. Clin Neurophysiol 110: 1252–1259. Eimer, M., and J. Driver. 2000. An event-related brain potential study of cross-modal links in spatial attention between vision and touch. Psychophysiology 37: 697–705. Eimer, M., and J. Driver. 2001. Crossmodal links in endogenous and exogenous spatial attention: Evidence from event-related brain potential studies. Neurosci Biobehav Rev 25: 497–511. Eimer, M., and J. van Velzen. 2002. Crossmodal links in spatial attention are mediated by supramodal control processes: Evidence from event-related potentials. Psychophysiology 39: 437–449. Eimer, M., J. van Velzen, and J. Driver. 2002. Cross-modal interactions between audition, touch, and vision in endogenous spatial attention: ERP evidence on preparatory states and sensory modulations. J Cogn Neurosci 14: 254–271. Fairhall, S. L., and E. Macaluso. 2009. Spatial attention can modulate audiovisual integration at multiple cortical and subcortical sites. Eur J Neurosci 29: 1247–1257. Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2002. Anatomical evidence of multimodal integration in primate striate cortex. J Neurosci 22: 5749–5759. Farah, M. J., A. B. Wong, M. A. Monheit, and L. A. Morrow. 1989. Parietal lobe mechanisms of spatial attention: Modality-specific or supramodal? Neuropsychologia 27: 461–470. Fasold, O., J. Heinau, M. U. Trenner, A. Villringer, and R. Wenzel. 2008. Proprioceptive head posture-related processing in human polysensory cortical areas. Neuroimage 40: 1232–1242. Frens, M. A., and A. J. Van Opstal. 1998. Visual–auditory interactions modulate saccade-related activity in monkey superior colliculus. Brain Res Bull 46: 211–224. Giard, M. H., and F. Peronnet. 1999. Auditory–visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. J Cogn Neurosci 11: 473–490. Graziano, M. S., and C. G. Gross. 1993. A bimodal map of space: Somatosensory receptive fields in the macaque putamen with corresponding visual receptive fields. Exp Brain Res 97: 96–109. Graziano, M. S., and C. G. Gross. 1995. The representation of extrapersonal space: A possible role for bimodal, visuo-tactile neurons. In The Cognitive Neurosciences, ed. M. S. Gazzaniga, 1021–1034. Cambridge, MA: MIT Press. Graziano, M. S., X. T. Hu, and C. G. Gross. 1997. Visuospatial properties of ventral premotor cortex. J Neurophysiol 77: 2268–2292. Green, J. J., and J. J. McDonald. 2008. Electrical neuroimaging reveals timing of attentional control activity in human brain. PLoS Biol 6: 81. Green, J. J., W. A. Teder-Salejarvi, and J. J. McDonald. 2005. Control mechanisms mediating shifts of attention in auditory and visual space: A spatio-temporal ERP analysis. Exp Brain Res 166: 358–369. Gross, C. G., and M. S. Graziano. 1995. Multiple representations of space in the brain. The Neuroscientist 1: 43–50. Grossberg, S., K. Roberts, M. Aguilar, and D. Bullock. 1997. A neural model of multimodal adaptive saccadic eye movement control by superior colliculus. J Neurosci 17: 9706–9725. 
Hagler Jr., D. J., and M. I. Sereno. 2006. Spatial maps in frontal and prefrontal cortex. Neuroimage 29: 567–577. He, B. J., A. Z. Snyder, J. L. Vincent, A. Epstein, G. L. Shulman, and M. Corbetta. 2007. Breakdown of functional connectivity in frontoparietal networks underlies behavioral deficits in spatial neglect. Neuron 53: 905–918. Heinze, H. J., G. R. Mangun, W. Burchert, H. Hinrichs, M. Scholz, T. F. Munte et al. 1994. Combined spatial and temporal imaging of brain activity during visual selective attention in humans. Nature 372: 543–546. Holmes, N. P. 2009. The principle of inverse effectiveness in multisensory integration: Some statistical considerations. Brain Topogr 21: 168–176. Hopfinger, J. B., M. H. Buonocore, and G. R. Mangun. 2000. The neural mechanisms of top-down attentional control. Nat Neurosci 3: 284–291. Hotting, K., F. Rosler, and B. Roder. 2003. Crossmodal and intermodal attention modulate event-related brain potentials to tactile and auditory stimuli. Exp Brain Res 148: 26–37. Hyvarinen, J. 1981. Regional distribution of functions in parietal association area 7 of the monkey. Brain Res 206: 287–303.
Indovina, I., and E. Macaluso, E. 2004. Occipital–parietal interactions during shifts of exogenous visuospatial attention: Trial-dependent changes of effective connectivity. Magn Reson Imaging 22: 1477–1486. Indovina, I., and E. Macaluso. 2007. Dissociation of stimulus relevance and saliency factors during shifts of visuospatial attention. Cereb Cortex 17: 1701–1711. Kalaska, J. F., S. H. Scott, P. Cisek, and L. E. Sergio. 1997. Cortical control of reaching movements. Curr Opin Neurobiol 7: 849–859. Kastner, S., M. A. Pinsk, P. De Weerd, R. Desimone, and L. G. Ungerleider. 1999. Increased activity in human visual cortex during directed attention in the absence of visual stimulation. Neuron 22: 751–761. Kastner, S., and L. G. Ungerleider. 2001. The neural basis of biased competition in human visual cortex. Neuropsychologia 39: 1263–1276. Kayser, C., C. I. Petkov, M. Augath, and N. K. Logothetis. 2005. Integration of touch and sound in auditory cortex. Neuron 48: 373–384. Kelley, T. A., J. T. Serences, B. Giesbrecht, and S. Yantis. 2008. Cortical mechanisms for shifting and holding visuospatial attention. Cereb Cortex 18: 114–325. Kennett, S., M. Eimer, C. Spence, and J. Driver. 2001. Tactile–visual links in exogenous spatial attention under different postures: Convergent evidence from psychophysics and ERPs. J Cogn Neurosci 13: 462–478. Kida, T., K. Inui, T. Wasaka, K. Akatsuka, E. Tanaka, and R. Kakigi. 2007. Time-varying cortical activations related to visual–tactile cross-modal links in spatial selective attention. J Neurophysiol 97: 3585–3596. Kincade, J. M., R. A. Abrams, S. V. Astafiev, G. L. Shulman, and M. Corbetta. 2005. An event-related functional magnetic resonance imaging study of voluntary and stimulus-driven orienting of attention. J Neurosci 25: 4593–4604. Kinsbourne, M. 1970. The cerebral basis of lateral asymmetries in attention. Acta Psychol (Amst) 33: 193–201. Kreifelts, B., T. Ethofer, W. Grodd, M. Erb, and D. Wildgruber. 2007. Audiovisual integration of emotional signals in voice and face: An event-related fMRI study. Neuroimage 37: 1445–1456. Laurienti, P. J., J. H. Burdette, M. T. Wallace, Y. F. Yen, A. S. Field, and B. E. Stein. 2002. Deactivation of sensory-specific cortex by cross-modal stimuli. J Cogn Neurosci 14: 420–429. Laurienti, P. J., T. J. Perrault, T. R. Stanford, M. T. Wallace, and B. E. Stein. 2005. On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Exp Brain Res 166: 289–297. Leo, F., C. Bertini, G. di Pellegrino, and E. Ladavas. 2008. Multisensory integration for orienting responses in humans requires the activation of the superior colliculus. Exp Brain Res 186: 67–77. Lewis, J. W., M. S. Beauchamp, and E. A. DeYoe. 2000. A comparison of visual and auditory motion processing in human cerebral cortex. Cereb Cortex 10: 873–888. Luck, S. J., L. Chelazzi, S. A. Hillyard, and R. Desimone. 1997. Neural mechanisms of spatial selective attention in areas V1, V2, and V4 of macaque visual cortex. J Neurophysiol 77: 24–42. Macaluso, E., and J. Driver. 2001. Spatial attention and crossmodal interactions between vision and touch. Neuropsychologia 39: 1304–1316. Macaluso, E., and J. Driver. 2005. Multisensory spatial interactions: A window onto functional integration in the human brain. Trends Neurosci 28: 264–271. Macaluso, E., J. Driver, and C. D. Frith. 2003a. Multimodal spatial representations engaged in human parietal cortex during both saccadic and manual spatial orienting. 
Curr Biol 13: 990–999. Macaluso, E., M. Eimer, C. D. Frith, and J. Driver. 2003b. Preparatory states in crossmodal spatial attention: Spatial specificity and possible control mechanisms. Exp Brain Res 149: 62–74. Macaluso, E., C. Frith, and J. Driver. 2000a. Selective spatial attention in vision and touch: Unimodal and multimodal mechanisms revealed by PET. J Neurophysiol 83: 3062–3075. Macaluso, E., C. D. Frith, and J. Driver. 2005. Multisensory stimulation with or without saccades: fMRI evidence for crossmodal effects on sensory-specific cortices that reflect multisensory location-congruence rather than task-relevance. Neuroimage 26: 414–425. Macaluso, E., C. D. Frith, and J. Driver. 2001. Multisensory integration and crossmodal attention effects in the human brain. Science [Technical response] 292: 1791. Macaluso, E., C. D. Frith, and J. Driver. 2002a. Crossmodal spatial influences of touch on extrastriate visual areas take current gaze direction into account. Neuron 34: 647–658. Macaluso, E., C. D. Frith, and J. Driver. 2002b. Directing attention to locations and to sensory modalities: Multiple levels of selective processing revealed with PET. Cereb Cortex 12: 357–368. Macaluso, E., C. D. Frith, and J. Driver. 2002c. Supramodal effects of covert spatial orienting triggered by visual or tactile events. J Cogn Neurosci 14: 389–401.
Macaluso, E., C. D. Frith, and J. Driver. 2000b. Modulation of human visual cortex by crossmodal spatial attention. Science 289: 1206–1208. Maravita, A., N. Bolognini, E. Bricolo, C. A. Marzi, and S. Savazzi. 2008. Is audiovisual integration subserved by the superior colliculus in humans? Neuroreport 19: 271–275. Martinez, A., L. Anllo-Vento, M. I. Sereno, L. R. Frank, R. B. Buxton, D. J. Dubowitz et al. 1999. Involvement of striate and extrastriate visual cortical areas in spatial attention. Nat Neurosci 2: 364–369. Massaro, D. W. 1999. Speechreading: Illusion or window into pattern recognition. Trends Cogn Sci 3: 310–317. Mathiak, K., I. Hertrich, M. Zvyagintsev, W. Lutzenberger, and H. Ackermann. 2005. Selective influences of cross-modal spatial-cues on preattentive auditory processing: A whole-head magnetoencephalography study. Neuroimage 28: 627–634. Mayer, A. R., A. R. Franco, and D. L. Harrington. 2009. Neuronal modulation of auditory attention by informative and uninformative spatial cues. Hum Brain Mapp 30: 1652–1666. Mayer, A. R., D. Harrington, J. C. Adair, and R. Lee. 2006. The neural networks underlying endogenous auditory covert orienting and reorienting. Neuroimage 30: 938–949. McDonald, J. J., W. A. Teder-Salejarvi, F. Di Russo, and S. A. Hillyard. 2003. Neural substrates of perceptual enhancement by cross-modal spatial attention. J Cogn Neurosci 15: 10–19. McDonald, J. J., W. A. Teder-Salejarvi, and S. A. Hillyard. 2000. Involuntary orienting to sound improves visual perception. Nature 407: 906–908. McDonald, J. J., W. A. Teder-Salejarvi, and L. M. Ward. 2001. Multisensory integration and crossmodal attention effects in the human brain. Science 292: 1791. McDonald, J. J., and L. M. Ward. 2000. Involuntary listening aids seeing: Evidence from human electrophysiology. Psychol Sci 11: 167–171. McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264: 746–748. Meredith, M. A., J. W. Nemitz, and B. E. Stein. 1987. Determinants of multisensory integration in superior colliculus neurons: I. Temporal factors. J Neurosci 7: 3215–3229. Meredith, M. A., and B. E. Stein. 1996. Spatial determinants of multisensory integration in cat superior colliculus neurons. J Neurophysiol 75: 1843–1857. Meredith, M. A., and B. E. Stein. 1986a. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. J Neurophysiol 56: 640–662. Meredith, M. A., and B. E. Stein. 1986b. Spatial factors determine the activity of multisensory neurons in cat superior colliculus. Brain Res 365: 350–354. Meredith, M. A., and B. E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus. Science 221: 389–391. Meyer, M., S. Baumann, S. Marchina, and L. Jancke. 2007. Hemodynamic responses in human multisensory and auditory association cortex to purely visual stimulation. BMC Neurosci 8: 14. Meylan, R. V., and M. M. Murray. 2007. Auditory–visual multisensory interactions attenuate subsequent visual responses in humans. Neuroimage 35: 244–254. Miller, J. 1982. Discrete versus continuous stage models of human information processing: In search of partial output. J Exp Psychol Hum Percept Perform 8: 273–296. Miller, L. M., and M. D’Esposito. 2005. Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. J Neurosci 25: 5884–5893. Moore, T. 2006. The neurobiology of visual attention: Finding sources. Curr Opin Neurobiol 16: 159–165. Moran, R. J., S. Molholm, R. B. Reilly, and J. J. Foxe. 2008.
Changes in effective connectivity of human superior parietal lobule under multisensory and unisensory stimulation. Eur J Neurosci 27: 2303–2312. Mozolic, J. L., D. Joyner, C. E. Hugenschmidt, A. M. Peiffer, R. A. Kraft, J. A. Maldjian et al. 2008. Crossmodal deactivations during modality-specific selective attention. BMC Neurol 8: 35. Nagy, A., G. Eordegh, Z. Paroczy, Z. Markus, and G. Benedek. 2006. Multisensory integration in the basal ganglia. Eur J Neurosci 24: 917–924. Natale, E., C. A. Marzi, and E. Macaluso. 2009. FMRI correlates of visuo-spatial reorienting investigated with an attention shifting double-cue paradigm. Hum Brain Mapp 30: 2367–2381. Nobre, A. C., J. T. Coull, C. D. Frith, and M. M. Mesulam. 1999. Orbitofrontal cortex is activated during breaches of expectation in tasks of visual attention. Nat Neurosci 2: 11–12. Noesselt, T., J. W. Rieger, M. A. Schoenfeld, M. Kanowski, H. Hinrichs, H. J. Heinze et al. 2007. Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. J Neurosci 27: 11431–11441. Pessoa, L., S. Kastner, and L. G. Ungerleider. 2003. Neuroimaging studies of attention: From modulation of sensory processing to top-down control. J Neurosci 23: 3990–3998.
Posner, M. I. 1980. Orienting of attention. Q J Exp Psychol 32: 3–25. Posner, M. I., J. A. Walker, F. J. Friedrich, and R. D. Rafal. 1984. Effects of parietal injury on covert orienting of attention. J Neurosci 4: 1863–1874. Pouget, A., S. Deneve, and J. R. Duhamel. 2002. A computational perspective on the neural basis of multisensory spatial representations. Nat Rev Neurosci 3: 741–747. Rockland, K. S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey. Int J Psychophysiol 50: 19–26. Romei, V., M. M. Murray, L. B. Merabet, and G. Thut. 2007. Occipital transcranial magnetic stimulation has opposing effects on visual and auditory stimulus detection: Implications for multisensory interactions. J Neurosci 27: 11465–11472. Ruff, C. C., F. Blankenburg, O. Bjoertomt, S. Bestmann, E. Freeman, J. D. Haynes et al. 2006. Concurrent TMS-fMRI and psychophysics reveal frontal influences on human retinotopic visual cortex. Curr Biol 16: 1479–1488. Saito, D. N., K. Yoshimura, T. Kochiyama, T. Okada, M. Honda, and N. Sadato. 2005. Cross-modal binding and activated attentional networks during audio-visual speech integration: A functional MRI study. Cereb Cortex 15: 1750–1760. Santangelo, V., M. O. Belardinelli, C. Spence, and E. Macaluso. 2009. Interactions between voluntary and stimulus-driven spatial attention mechanisms across sensory modalities. J Cogn Neurosci 21: 2384–2397. Saygin, A. P., and M. I. Sereno. 2008. Retinotopy and attention in human occipital, temporal, parietal, and frontal cortex. Cereb Cortex 18: 2158–2168. Sereno, M. I., and R. S. Huang. 2006. A human parietal face area contains aligned head-centered visual and tactile maps. Nat Neurosci 9: 1337–1343. Sereno, M. I., S. Pitzalis, and A. Martinez. 2001. Mapping of contralateral space in retinotopic coordinates by a parietal cortical area in humans. Science 294: 1350–1354. Shipp, S. 2004. The brain circuitry of attention. Trends Cogn Sci 8: 223–230. Shomstein, S., and S. Yantis. 2006. Parietal cortex mediates voluntary control of spatial and nonspatial auditory attention. J Neurosci 26: 435–439. Sommer, M. A., and R. H. Wurtz. 2000. Composition and topographic organization of signals sent from the frontal eye field to the superior colliculus. J Neurophysiol 83: 1979–2001. Soto-Faraco, S., and A. Alsius. 2009. Deconstructing the McGurk–MacDonald illusion. J Exp Psychol Hum Percept Perform 35: 580–587. Spence, C., and J. Driver. 1996. Audiovisual links in endogenous covert spatial attention. J Exp Psychol Hum Percept Perform 22: 1005–1030. Spence, C., J. J. McDonald, and J. Driver. 2004. Exogenous spatial-cuing studies of human cross-modal attention and multisensory integration. In: Crossmodal space and crossmodal attention, ed. C. Spence and J. Driver, 277–320. Oxford: Oxford Univ. Press. Spence, C., M. E. Nicholls, N. Gillespie, and J. Driver. 1998. Cross-modal links in exogenous covert spatial orienting between touch, audition, and vision. Percept Psychophys 60: 544–557. Stein, B. E., and M. A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press. Stein, B. E., and T. R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single neuron. Nat Rev Neurosci 9: 255–266. Stephan, K. E., L. M. Harrison, S. J. Kiebel, O. David, W. D. Penny, and K. J. Friston. 2007. Dynamic causal models of neural system dynamics: Current state and future extensions. J Biosci 32: 129–144. Talsma, D., T. J. Doty, and M. G. Woldorff. 2007.
Selective attention and audiovisual integration: Is attending to both modalities a prerequisite for early integration? Cereb Cortex 17: 679–690. Talsma, D., and M.G. Woldorff. 2005. Selective attention and multisensory integration: Multiple phases of effects on the evoked brain activity. J Cogn Neurosci 17: 1098–1114. Tajadura-Jiménez, A., N. Kitagawa, A. Väljamäe, M. Zampini, M. M. Murray, and C. Spence. 2009. Auditory– somatosensory multisensory interactions are spatially modulated by stimulated body surface and acoustic spectra. Neuropsychologia 47: 195–203. Teder-Salejarvi, W. A., F. Di Russo, J. J. McDonald, and S. A. Hillyard. 2005. Effects of spatial congruity on audio-visual multimodal integration. J Cogn Neurosci 17: 1396–1409. Teder-Salejarvi, W. A., T. F. Munte, F. Sperlich, and S. A. Hillyard. 1999. Intra-modal and cross-modal spatial attention to auditory and visual stimuli. An event-related brain potential study. Cogn Brain Res 8: 327–343. Tootell, R. B., M. S. Silverman, E. Switkes, and R. L. De Valois. 1982. Deoxyglucose analysis of retinotopic organization in primate striate cortex. Science 218: 902–904. Treisman, A. M., and G. Gelade. 1980. A feature-integration theory of attention. Cogn Psychol. 12: 97–136.
Trenner, M. U., H. R. Heekeren, M. Bauer, K. Rossner, R. Wenzel, A. Villringer et al. 2008. What happens in between? Human oscillatory brain activity related to crossmodal spatial cueing. PLoS ONE 3: e1467. van Atteveldt, N. M., E. Formisano, R. Goebel, and L. Blomert. 2007. Top-down task effects overrule automatic multisensory responses to letter–sound pairs in auditory association cortex. Neuroimage 36: 1345–1360. Vandenberghe, R., D. R. Gitelman, T. B. Parrish, and M. M. Mesulam. 2001. Functional specificity of superior parietal mediation of spatial shifting. Neuroimage 14: 661–673. Vossel, S., C. M. Thiel, and G. R. Fink. 2006. Cue validity modulates the neural correlates of covert endogenous orienting of attention in parietal and frontal cortex. Neuroimage 32: 1257–1264. Wallace, M. T., J. G. McHaffie, and B. E. Stein. 1997. Visual response properties and visuotopic representation in the newborn monkey superior colliculus. J Neurophysiol 78: 2732–2741. Wallace, M. T., and B. E. Stein. 1994. Cross-modal synthesis in the midbrain depends on input from cortex. J Neurophysiol 71: 429–432. Wilkinson, L. K., M. A. Meredith, and B. E. Stein. 1996. The role of anterior ectosylvian cortex in crossmodality orientation and approach behavior. Exp Brain Res 112: 1–10. Wu, C. T., D. H. Weissman, K. C. Roberts, and M. G. Woldorff. 2007. The neural circuitry underlying the executive control of auditory spatial attention. Brain Res 1134: 187–198. Yantis, S., J. Schwarzbach, J. T. Serences, R. L. Carlson, M. A. Steinmetz, J. J. Pekar et al. 2002. Transient neural activity in human parietal cortex during spatial attention shifts. Nat Neurosci 5: 995–1002. Zimmer, U., and E. Macaluso. 2007. Processing of multisensory spatial congruency can be dissociated from working memory and visuo-spatial attention. Eur J Neurosci 26: 1681–1691.
26
Cross-Modal Spatial Cueing of Attention Influences Visual Perception John J. McDonald, Jessica J. Green, Viola S. Störmer, and Steven A. Hillyard
CONTENTS 26.1 Spatial Attention: Modality-Specific or Supramodal?..........................................................509 26.2 Involuntary Cross-Modal Spatial Attention Enhances Perceptual Sensitivity...................... 511 26.3 Involuntary Cross-Modal Spatial Attention Modulates Time-Order Perception.................. 512 26.4 Beyond Temporal Order: The Simultaneity Judgment Task................................................. 516 26.5 Involuntary Cross-Modal Spatial Attention Alters Appearance........................................... 518 26.6 Possible Mechanisms of Cross-Modal Cue Effects............................................................... 520 26.7 Conclusions and Future Directions........................................................................................ 523 References....................................................................................................................................... 523
26.1 SPATIAL ATTENTION: MODALITY-SPECIFIC OR SUPRAMODAL? It has long been known that “looking out of the corner of one’s eye” can influence the processing of objects in the visual field. One of the first experimental demonstrations of this effect came from Hermann von Helmholtz, who, at the end of the nineteenth century, showed that he could identify letters in a small region of a briefly illuminated display if he directed his attention covertly (i.e., without moving his eyes) toward that region in advance (Helmholtz 1866). Psychologists began to study this effect systematically in the 1970s using the spatial-cueing paradigm (Eriksen and Hoffman 1972; Posner 1978). Across a variety of speeded response tasks, orienting attention to a particular location in space was found to facilitate responses to visual targets that appeared at the cued location. Benefits in speeded visual performance were observed when attention was oriented voluntarily (endogenously, in a goal-driven manner) in response to a spatially predictive symbolic visual cue or involuntarily (exogenously, in a stimulus-driven manner) in response to a spatially nonpredictive peripheral visual cue such as a flash of light. For many years, the covert orienting of attention in visual space was seen as a special case, because initial attempts to find similar spatial cueing effects in the auditory modality did not succeed (e.g., Posner 1978). Likewise, in several early cross-modal cueing studies, voluntary and involuntary shifts of attention in response to visual cues were found to have no effect on the detection of subsequent auditory targets (for review, see Spence and McDonald 2004). Consequently, during the 1970s and 1980s (and to a lesser extent 1990s), the prevailing view was that location-based attentional selection was a modality-specific and predominantly visual process. Early neurophysiological and neuropsychological studies painted a different picture about the modality specificity of spatial attention. On the neurophysiological front, Hillyard and colleagues (1984) showed that sustaining attention at a predesignated location to the left or right of fixation modulates the event-related potentials (ERPs) elicited by stimuli in both task-relevant and task-irrelevant
modalities. Visual stimuli presented at the attended location elicited an enlarged negative ERP component over the anterior scalp 170 ms after stimulus onset, both when visual stimuli were relevant and when they were irrelevant. Similarly, auditory stimuli presented at the attended location elicited an enlarged negativity over the anterior scalp beginning 140 ms after stimulus onset, both when auditory stimuli were relevant and when they were irrelevant. Follow-up studies confirmed that spatial attention influences ERP components elicited by stimuli in an irrelevant modality when attention is sustained at a prespecified location over several minutes (Teder-Sälejärvi et al. 1999) or is cued on a trial-by-trial basis (Eimer and Schröger 1998). The results from these ERP studies indicate that spatial attention is not an entirely modality-specific process. On the neuropsychological front, Farah and colleagues (1989) showed that unilateral damage to the parietal lobe impairs reaction time (RT) performance in a spatial cueing task involving spatially nonpredictive auditory cues. Prior visual-cueing studies had shown that patients with damage to the right parietal lobe were substantially slower to detect visual targets appearing in the left visual field following a peripheral visual cue to the right visual field (invalid trials) than when attention was cued to the left (valid trials) or was cued to neither side (neutral trials) (Posner et al. 1982, 1984). This location-specific RT deficit was attributed to an impairment in the disengagement of attention, mainly because the patients appeared to have no difficulty in shifting attention to the contralesional field following a valid cue or neutral cue. In Farah et al.’s study, similar impairments in detecting contralesional visual targets were observed following either invalid auditory or visual cues presented to the ipsilesional side. On the basis of these results, Farah and colleagues concluded that sounds and lights automatically engage the same supramodal spatial attention mechanism. Given the neurophysiological and neuropsychological evidence in favor of a supramodal (or at least partially shared) spatial attention mechanism, why did several early behavioral studies appear to support the modality-specific view of spatial attention? These initial difficulties in showing spatial attention effects outside of the visual modality may be attributed largely to methodological factors, because some of the experimental designs that had been used successfully to study visual spatial attention were not ideal for studying auditory spatial attention. In particular, because sounds can be rapidly detected based on spectrotemporal features that are independent of a sound’s spatial location, simple detection measures that had shown spatial specificity in visual cueing tasks did not always work well for studying spatial attention within audition (e.g., Posner 1978). As researchers began to realize that auditory spatial attention effects might be contingent on the degree to which sound location is processed (Rhodes 1987), new spatial discrimination tasks were developed to ensure the use of spatial representations (McDonald and Ward 1999; Spence and Driver 1994). With these new tasks, researchers were able to document spatial cueing effects using all the various combinations of visual, auditory, and tactile cue and target stimuli. 
As reviewed elsewhere (e.g., Driver and Spence 2004), voluntary spatial cueing studies had begun to reveal a consistent picture by the mid 1990s: voluntarily orienting attention to a location facilitated the processing of subsequent targets regardless of the cue and target modalities. The picture that emerged from involuntary spatial cueing studies remained less clear because some of the spatial discrimination tasks that were developed failed to reveal cross-modal cueing effects (for detailed reviews of methodological issues, see Spence and McDonald 2004; Wright and Ward 2008). For example, using an elevation-discrimination task, Spence and Driver found an asymmetry in the involuntary spatial cueing effects between visual and auditory stimuli (Spence and Driver 1997). In their studies, spatially nonpredictive auditory cues facilitated responses to visual targets, but spatially nonpredictive visual cues failed to influence responses to auditory targets. For some time the absence of a visual–auditory cue effect weighed heavily on models of involuntary spatial attention. In particular, it was taken as evidence against a single supramodal attention system that mediated involuntary deployments of attention in multisensory space. However, researchers began to suspect that Spence and Driver’s (1997) missing audiovisual cue effect stemmed from the large spatial separation between cue and target, which existed even on validly (ipsilaterally) cued trials, and the different levels of precision with which auditory and visual stimuli can be localized.
Specifically, it was hypothesized that visual cues triggered shifts of attention that were focused too narrowly around the cued location to affect processing of a distant auditory target (Ward et al. 2000). Data from a recent study confirmed this narrow-focus explanation for the last remaining “missing link” in cross-modal spatial attention (Prime et al. 2008). Visual cues were found to facilitate responses to auditory targets that were presented at the cued location but not auditory targets that were presented 14° above or below the cued location (see also McDonald et al. 2001). The bulk of the evidence to date indicates that orienting attention involuntarily or voluntarily to a specific location in space can facilitate responding to subsequent targets, regardless of the modality of the cue and target stimuli. In principle, such cross-modal cue effects might reflect the consequences of a supramodal attention-control system that alters the perceptual representations of objects in different modalities (Farah et al. 1989). However, the majority of behavioral studies to date have examined the effects of spatial cues on RT performance, which is at best a very indirect measure of perceptual experience (Luce 1986; Watt 1991). Indeed, measures of response speed are inherently ambiguous in that RTs reflect the cumulative output of multiple stages of processing, including low-level sensory and intermediate perceptual stages, as well as later stages involved in making decisions and executing actions. In theory, spatial cueing could influence processing at any one of these stages. There is some evidence that the appearance of a spatial cue can alter an observer's willingness to respond and reduce the uncertainty of his or her decisions without affecting perception (Shiu and Pashler 1994; Sperling and Dosher 1986). Other evidence suggests that whereas voluntary shifts of attention can affect perceptual processing, involuntary shifts of attention may not (Prinzmetal et al. 2005). In this chapter, we review studies that have extended the RT-based chronometric investigation of cross-modal spatial attention by utilizing psychophysical measures that better isolate perceptual-level processes. In addition, neurophysiological and neuroimaging methods have been combined with these psychophysical approaches to identify changes in neural activity that might underlie the cross-modal consequences of spatial attention on perception. These methods have also examined neural activity within the cue–target interval that might reflect supramodal (or modality-specific) control of spatial attention and subsequent anticipatory biasing of activity within sensory regions of the cortex.
26.2 INVOLUNTARY CROSS-MODAL SPATIAL ATTENTION ENHANCES PERCEPTUAL SENSITIVITY The issue of whether attention affects perceptual or post-perceptual processing of external stimuli has been vigorously debated since the earliest dichotic listening experiments revealed that selective listening influenced auditory performance (Broadbent 1958; Cherry 1953; Deutsch and Deutsch 1963; Treisman and Geffen 1967). In the context of visual–spatial cueing experiments, the debate has focused on two general classes of mechanisms by which attention might influence visual performance (see Carrasco 2006; Lu and Dosher 1998; Luck et al. 1994, 1996; Smith and Ratcliff 2009; Prinzmetal et al. 2005). On one hand, attention might lead to a higher signal-to-noise ratio for stimuli at attended locations by enhancing their perceptual representations. On the other hand, attention might reduce the decision-level or response-level uncertainty without affecting perceptual processing. For example, spatial cueing might bias decisions about which location contains relevant stimulus information (the presumed signal) in favor of the cued location, thereby promoting a strategy to exclude stimulus information arising from uncued locations (the presumed noise; e.g., Shaw 1982, 1984; Shiu and Pashler 1994; Sperling and Dosher 1986). Such noise-reduction explanations account for the usual cueing effects (e.g., RT costs and benefits) without making assumptions about limited perceptual capacity. Several methods have been developed to discourage decision-level mechanisms so that any observable cue effect can be ascribed more convincingly to attentional selection at perceptual stages
of processing. One such method was used to investigate whether orienting attention involuntarily to a sudden sound influences perceptual-level processing of subsequent visual targets (McDonald et al. 2000). The design was adapted from earlier visual-cueing studies that eliminated location uncertainty by presenting a mask at a single location and requiring observers to indicate whether they saw a target at the masked location (Luck et al. 1994, 1996; see also Smith 2000). The mask serves a dual purpose in this paradigm: to ensure that the location of the target (if present) is known with complete certainty and to backwardly mask the target so as to limit the accrual and persistence of stimulus information at the relevant location. Under such conditions, it is possible to use methods of signal detection theory to obtain a measure of an observer’s perceptual sensitivity (d′)—the ability to discern a sensory event from background noise—that is independent of the observer’s decision strategy (which, in signal detection theory, is characterized by the response criterion, β; see Green and Swets 1966). Consistent with a perceptual-level explanation, McDonald and colleagues (2000) found that perceptual sensitivity was higher when the visual target appeared at the location of the auditory cue than when it appeared on the opposite side of fixation (Figure 26.1a and b). This effect was ascribed to an involuntary shift of attention to the cued location because the sound provided no information about the location of the impending target. Also, because there was no uncertainty about the target location, the effect could not be attributed to a reduction in location uncertainty. Consequently, the results provided strong evidence that shifting attention involuntarily to the location of a sound actually improves the perceptual quality of a subsequent visual event appearing at that location (see also Dufour 1999). An analogous effect on perceptual sensitivity has been reported in the converse audiovisual combination, when spatially nonpredictive visual cues were used to orient attention involuntarily before the onset of an 800-Hz target embedded in a white-noise mask (Soto-Faraco et al. 2002). Together, these results support the view that sounds and lights engage a common supramodal spatial attention system, which then modulates perceptual processing of relevant stimuli at the cued location (Farah et al. 1989). To investigate the neural processes by which orienting spatial attention to a sudden sound influences processing of a subsequent visual stimulus, McDonald and colleagues (2003) recorded ERPs in the signal-detection paradigm outlined above. ERPs to visual stimuli appearing at validly and invalidly cued locations began to diverge from one another at about 100 ms after stimulus onset, with the earliest phase of this difference being distributed over the midline central scalp (Figure 26.1c and d). After about 30–40 ms, this ERP difference between validly and invalidly cued visual stimuli shifted to midline parietal and lateral occipital scalp regions. A dipole source analysis indicated that the initial phase of this difference was generated in or near the multisensory region of the superior temporal sulcus (STS), whereas the later phase was generated in or near the fusiform gyrus of the occipital lobe (Figure 26.1e). 
This pattern of results suggests that enhanced visual perception produced by the cross-modal orienting of spatial attention may depend on feedback connections from the multisensory STS to the ventral stream of visual cortical areas. Similar cross-modal cue effects were observed when participants made speeded responses to the visual targets, but the earliest effect was delayed by 100 ms (McDonald and Ward 2000). This is in line with behavioral data suggesting that attentional selection might take place earlier when target detection accuracy (or fine perceptual discrimination; see subsequent sections) is emphasized than when speed of responding is emphasized (Prinzmetal et al. 2005).
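To make the signal detection logic of this paradigm concrete, the following minimal sketch (in Python, using only SciPy) shows how perceptual sensitivity (d′) and a bias measure are conventionally computed from yes/no detection data. The trial counts and the helper function name are hypothetical, not values or code from McDonald et al. (2000), and the bias measure shown is the criterion c rather than the β referred to above; both index the observer's decision strategy rather than sensitivity.

from scipy.stats import norm

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    # Log-linear correction keeps hit and false-alarm rates away from exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa              # perceptual sensitivity
    criterion = -0.5 * (z_hit + z_fa)   # response bias (criterion c)
    return d_prime, criterion

# Hypothetical counts for validly and invalidly cued trials (target present/absent).
print("valid cue:   d' = %.2f, c = %.2f" % sdt_measures(80, 20, 15, 85))
print("invalid cue: d' = %.2f, c = %.2f" % sdt_measures(65, 35, 15, 85))

Because d′ depends only on the separation between the hit and false-alarm rates in z space, a cue that merely shifted the observer's willingness to say "yes" would leave it unchanged, which is why a d′ advantage at cued locations is taken as evidence for a perceptual-level effect.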
26.3 INVOLUNTARY CROSS-MODAL SPATIAL ATTENTION MODULATES TIME-ORDER PERCEPTION The findings reviewed in the previous section provide compelling evidence that cross-modal attention influences the perceptual quality of visual stimuli. In the context of a spatial cueing experiment, perceptual enhancement at an early stage of processing could facilitate decision and response
FIGURE 26.1 Results from McDonald et al.’s (2000, 2003) signal detection experiments. (a) Schematic illustration of stimulus events on a valid-cue trial. Small light displays were fixed to bottoms of two loudspeaker cones, one to the left and one to the right of a central fixation point. Each trial began with a spatially nonpredictive auditory cue from the left or right speaker (first panel), followed by a faint visual target on some trials (second panel) and a salient visual mask (third panel). Participants were required to indicate whether they saw the visual target. (b) Perceptual sensitivity data averaged across participants. (c) Grand-average event-related potentials (ERPs) to left visual field stimuli following valid and invalid auditory cues. The ERPs were recorded from lateral occipital electrodes PO7 and PO8. Negative voltages are plotted upward, by convention. Shaded box highlights interval of P1 and N1 components, in which cue effects emerged. (d) Scalp topographies of enhanced negative voltages to validly cued visual targets. (e) Projections of best-fitting dipolar sources onto sections of an individual participant’s MRI. Dipoles were located in superior temporal sulcus (STS), fusiform gyrus (FG), and perisylvian cortex near post-central gyrus (PostC). PostC dipoles accounted for relatively late (200–300 ms) activity over more anterior scalp regions.
processing at later stages, thereby leading to faster responses for validly cued objects than for invalidly cued objects. Theoretically, however, changes in the timing of perceptual processing could also contribute to the cue effects on RT performance: an observer might become consciously aware of a target earlier in time when it appears at a cued location than when it appears at an uncued location. In fact, the idea that attention influences the timing of our perceptions is an old and controversial one. More than 100 years ago, Titchener (1908) asserted that when confronted with multiple objects, an observer becomes consciously aware of an attended object before other unattended objects. Titchener called the hypothesized temporal advantage for attended objects the law of prior entry.
Observations from laboratory experiments in the nineteenth and early twentieth centuries were interpreted along the lines of attention-induced prior entry. In one classical paradigm known as the complication experiment, observers were required to indicate the position of a moving pointer at the moment a sound was presented (e.g., Stevens 1904; Wundt 1874; for a review, see Boring 1929). When listening in anticipation for the auditory stimulus, observers typically indicated that the sound appeared when the pointer was at an earlier point along its trajectory than was actually the case. For example, observers might report that a sound appeared when a pointer was at position 4 even though the sound actually appeared when the pointer was at position 5. Early on, it was believed that paying attention to the auditory modality facilitated sound perception and led to a relative delay of visual perception, so that the pointer’s perceived position lagged behind its actual position. However, this explanation fell out of favor when later results indicated that a specific judgment strategy, rather than attention-induced prior entry, might be responsible for the mislocalization error (e.g., Cairney 1975). In more recent years, attention-induced prior entry has been tested experimentally in visual temporal-order judgment (TOJ) tasks that require observers to indicate which of two rapidly presented visual stimuli appeared first. When the attended and unattended stimuli appear simultaneously, observers typically report that the attended stimulus appeared to onset before the unattended stimulus (Stelmach and Herdman 1991; Shore et al. 2001). Moreover, in line with the supramodal view of spatial attention, such changes in temporal perception have been found when shifts in spatial attention were triggered by spatially nonpredictive auditory and tactile cues as well as visual cues (Shimojo et al. 1997). Despite the intriguing behavioral results from TOJ experiments, the controversy over attention-induced prior entry has continued. The main problem harks back to the debate over the complication experiments: an observer’s judgment strategy might contribute to the tendency to report the cued target as appearing first (Pashler 1998; Schneider and Bavelier 2003; Shore et al. 2001). Thus, in a standard TOJ task, observers might perceive two targets to appear simultaneously but still report seeing the target on the cued side first because of a decision rule that favors the cued target (e.g., when in doubt, select the cued target). Simple response biases (e.g., stimulus–response compatibility effects) can be avoided quite easily by altering the task (McDonald et al. 2005; Shore et al. 2001), but it is difficult to completely avoid the potential for response bias. As noted previously, ERP recordings can be used to distinguish between changes in high-level decision and response processes and changes in perceptual processing that could underlie entry to conscious awareness. An immediate challenge to this line of research is to specify the ways in which the perceived timing of external events might be associated with activity in the brain. Philosopher Daniel Dennett expounded two alternatives (Dennett 1991). On one hand, the perceived timing of external events may be derived from the timing of neural activities in relevant brain circuits. For example, the perceived temporal order of external events might be based on the timing of early cortical evoked potentials.
On the other hand, the brain might not represent the timing of perceptual events with time itself. In Dennett’s terminology, the represented time (e.g., A before B) is not necessarily related to the time of the representing (e.g., representing of A does not necessarily precede representing of B). Consequently, the perceived temporal order of external events might be based on nontemporal aspects of neural activities in relevant brain circuits. McDonald et al. (2005) investigated the effect of cross-modal spatial attention on visual time-order perception using ERPs to track the timing of cortical activity in a TOJ experiment. A spatially nonpredictive auditory cue was presented to the left or right side of fixation just before the occurrence of a pair of simultaneous or nearly simultaneous visual targets (Figure 26.2a). One of the visual targets was presented at the cued location, whereas the other was presented at the homologous location in the opposite visual hemifield. Consistent with previous behavioral studies, the auditory spatial cue had a considerable effect on visual TOJs (Figure 26.2b). Participants judged the cued target as appearing first on 79% of all simultaneous-target trials. To nullify this cross-modal cueing effect, the uncued target had to be presented nearly 70 ms before the cued target.
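The roughly 70 ms figure is the point of subjective simultaneity (PSS), conventionally estimated by fitting a psychometric function to the proportion of "cued side first" reports across onset asynchronies. The sketch below (Python with NumPy and SciPy) illustrates this procedure; the proportions are invented values patterned after Figure 26.2b, not the published data, and a cumulative Gaussian is only one of several reasonable choices of function.

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Cued-side onset advantage (CSOA) in ms; negative values mean the uncued target led.
csoa = np.array([-70.0, -35.0, 0.0, 35.0, 70.0])
p_cued_first = np.array([0.48, 0.63, 0.79, 0.90, 0.96])  # illustrative proportions

def cum_gauss(x, mu, sigma):
    # Probability of reporting the cued-side target first as a function of CSOA.
    return norm.cdf(x, loc=mu, scale=sigma)

(mu, sigma), _ = curve_fit(cum_gauss, csoa, p_cued_first, p0=[0.0, 50.0])
pss = mu  # CSOA at which "cued first" and "uncued first" reports are equally likely
print("PSS = %.1f ms (a negative value: the uncued target must lead by this amount)" % pss)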
FIGURE 26.2 Results from McDonald et al.’s (2005) temporal-order-judgment experiment. (a) Schematic illustration of events on a simultaneous-target trial (top) and nonsimultaneous target trials (bottom). Participants indicated whether a red or a green target appeared first. SOA between cue and first target event was 100–300 ms, and SOA between nonsimultaneous targets was 35 or 70 ms. T1 and T2 denote times at which visual targets could occur. (b) Mean percentage of trials on which participants reported seeing the target on cued side first, as a function of cued-side onset advantage (CSOA; i.e., lead time). Negative CSOAs indicate that uncued-side target was presented first; positive CSOAs indicate that cued-side target was presented first. (c) Grand-average ERPs to simultaneous visual targets, averaged over 79% of trials on which participants indicated that cued-side target appeared first. ERPs were recorded at contralateral and ipsilateral occipital electrodes (PO7/PO8). Statistically significant differences between contralateral and ipsilateral waveforms are denoted in gray on time axis. (d) Scalp topographies of ERP waveforms in time range of P1 (90–120 ms). Left and right sides of the map show electrodes ipsilateral and contralateral to the cued side, respectively. (e) Projections of best-fitting dipolar sources onto sections of an average MRI. Dipoles were located in superior temporal sulcus (STS) and fusiform gyrus (FG). FG dipoles accounted for cue-induced P1 amplitude modulation, whereas STS dipoles accounted for a long-latency (200–250 ms) negative deflection.
To elucidate the neural basis of this prior-entry effect, McDonald and colleagues (2005) examined the ERPs elicited by simultaneously presented visual targets following the auditory cue. The analytical approach taken was premised on the lateralized organization of the visual system and the pattern of ERP effects that have been observed under conditions of bilateral visual stimulation. Several previous studies on visual attention showed that directing attention to one side of a bilateral visual display results in a lateralized asymmetry of the early ERP components measured over the occipital scalp, with an increased positivity at electrode sites contralateral to the attended location beginning in the time range of the occipital P1 component (80–140 ms; Heinze et al. 1990, 1994; Luck et al. 1990; see also Fukuda and Vogel 2009). McDonald et al. (2005) hypothesized that if attention speeds neural transmission at early stages of the visual system, the early ERP components elicited by simultaneous visual targets would show an analogous lateral asymmetry in time, such that the P1 measured contralateral to the attended (cued) visual target would occur earlier than the P1 measured contralateral to the unattended (uncued) visual target. Such a finding would
be consistent with Stelmach and Herdman’s (1991) explanation of attention-induced prior entry as well as with the view that the time course of perceptual experience is tied to the timing of the early evoked activity in the visual cortex (Dennett 1991). Such a latency shift was not observed, however, even though the auditory cue had a considerable effect on the judgments of temporal order of the visual targets. Instead, cross-modal cueing led to an amplitude increase (with no change in latency) of the ERP positivity in the ventral visual cortex contralateral to the side of the auditory cue, starting in the latency range of the P1 component (90–120 ms) (Figure 26.2c–e). This finding suggests that the effect of spatial attention on the perception of temporal order occurs because an increase in the gain of the cued sensory input causes a perceptual threshold to be reached at an earlier time, not because the attended input was transmitted more rapidly than the unattended input at the earliest stages of processing. The pattern of ERP results obtained by McDonald and colleagues is likely an important clue for understanding the neural basis of visual prior entry due to involuntary deployments of spatial attention to sudden sounds. Although changes in ERP amplitude appear to underlie visual perceptual prior entry when attention is captured by lateralized auditory cues, changes in ERP timing might contribute to perceptual prior entry in other situations. This issue was addressed in a recent study of multisensory prior entry, in which participants voluntarily attended to either visual or tactile stimuli and judged whether the stimulus on the left or right appeared first, regardless of stimulus modality (Vibell et al. 2007). The ERP analysis centered on putatively visual ERP peaks over the posterior scalp (although ERPs to the tactile stimuli were not subtracted out and thus may have contaminated the ERP waveforms; cf. Talsma and Woldorff 2005). Interestingly, the P1 peaked at an average of 4 ms earlier when participants were attending to the visual modality than when they were attending to the tactile modality, suggesting that modality-based attentional selection may have a small effect on the timing of early evoked activity in the visual system. These latency results are not entirely clear, however, because the small-but-significant attention effect may have been caused by a single participant with an implausibly large latency difference (17 ms) and may have been influenced by overlap with the tactile ERP. Unfortunately, the authors did not report whether attention had a similar effect on the latency of the tactile ERPs, which may have helped to corroborate the small attention effect on P1 latency. Notwithstanding these potential problems in the ERP analysis, it is tempting to speculate that voluntary modality-based attentional selection influences the timing of early visual activity, whereas involuntary location-based attentional selection influences the gain of early visual activity. The question would still remain, however, how very small changes in ERP latency (4 ms or less) could underlie much larger perceptual effects of tens of milliseconds.
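Because the interpretation of these results hinges on whether cueing changes the latency or the amplitude of the P1, it may help to see how such measures are typically extracted from an averaged waveform. The sketch below (Python/NumPy) is a generic illustration under assumed array inputs, not the analysis pipeline of McDonald et al. (2005) or Vibell et al. (2007); the 50% fractional-area latency is included because it is generally more robust to noise than the raw peak latency when shifts of only a few milliseconds are at issue.

import numpy as np

def p1_measures(erp, srate, win=(0.090, 0.140)):
    # 'erp' is a 1-D array of microvolt values sampled at 'srate' Hz,
    # with time zero at stimulus onset; 'win' is the P1 measurement window (s).
    times = np.arange(erp.size) / srate
    mask = (times >= win[0]) & (times <= win[1])
    seg, seg_t = erp[mask], times[mask]
    mean_amp = seg.mean()                          # mean amplitude in the window
    peak_idx = np.argmax(seg)                      # most positive sample (P1 is positive)
    peak_amp, peak_lat = seg[peak_idx], seg_t[peak_idx]
    # 50% fractional-area latency of the positive-going activity in the window.
    area = np.cumsum(np.clip(seg, 0, None))
    frac_lat = seg_t[np.searchsorted(area, 0.5 * area[-1])]
    return mean_amp, peak_amp, peak_lat, frac_lat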
26.4 BEYOND TEMPORAL ORDER: THE SIMULTANEITY JUDGMENT TASK Recently, Santangelo and Spence (2008) offered an alternative explanation for the finding of McDonald and colleagues (2005) that nonpredictive auditory spatial cues affect visual time-order perception. Specifically, the authors suggested that the behavioral results in McDonald et al.’s TOJ task were not due to changes in perception but rather to decision-level factors. They acknowledged that simple response biases (e.g., a left cue primes a “left” response) would not have contributed to the behavioral results because participants indicated the color, not the location, of the target that appeared first. However, Santangelo and Spence raised the concern that some form of “secondary” response bias might have contributed to the TOJ effects (Schneider and Bavelier 2003; Stelmach and Herdman 1991).* For example, participants might have decided to select the stimulus at the cued location when uncertain as to which stimulus appeared first. In an attempt to circumvent such secondary response biases, Santangelo and Spence used a simultaneity judgment (SJ) task, in which participants had to judge whether two stimuli were presented simultaneously or successively (Carver and Brown 1997; Santangelo and Spence 2008; Schneider and Bavelier 2003).
* This argument would also apply to the findings of Vibell et al.’s (2007) cross-modal TOJ study.
They reported that
the uncued target had to appear 15–17 ms before the cued target in order for participants to have the subjective impression that the two stimuli appeared simultaneously. This difference is referred to as a shift in the point of subjective simultaneity (PSS), and it is typically attributed to the covert orienting of attention (but see Schneider and Bavelier 2003, for an alternative sensory-based account). The estimated shift in PSS was much smaller than the one reported in McDonald et al.’s earlier TOJ task (17.4 vs. 68.5 ms), but the conclusions derived from the two findings were the same: Involuntary capture of spatial attention by a sudden sound influences the perceived timing of visual events. Santangelo and Spence went on to argue that the shift in PSS reported by McDonald et al. might have been due to secondary response biases and, as a result, the shift in PSS observed in their study provided “the first unequivocal empirical evidence in support of the effect of cross-modal attentional capture on the latencies of perceptual processing” (p. 163). Although the SJ task has its virtues, there are two main arguments against Santangelo and Spence’s conclusions. First, the authors did not take into account the neurophysiological findings of McDonald and colleagues’ ERP study. Most importantly, the effect of auditory spatial cuing on early ERP activity arising from sensory-specific regions of the ventral visual cortex cannot be explained in terms of response bias. Thus, although it may be difficult to rule out all higher-order response biases in a TOJ task, the ERP findings provide compelling evidence that cross-modal spatial attention modulates early visual-sensory processing. Moreover, although the SJ task may be less susceptible to some decision-level factors, it may be impossible to rule out all decision-level factors entirely as contributors to the PSS effect.* Thus, it is not inconceivable that Santangelo and Spence’s behavioral findings may have reflected post-perceptual rather than perceptual effects. Second, it should be noted that Santangelo and Spence’s results provided little, if any, empirical support for the conclusion that cross-modal spatial attention influences the timing of visual perceptual processing. The problem is that their estimated PSS did not accurately represent their empirical data. Their PSS measure was derived from the proportion of “simultaneous” responses, which varied as a function of the stimulus onset asynchrony (SOA) between the target on the cued side and the target on the uncued side. As shown in their Figure 2a, the proportion of “simultaneous” responses peaked when the cued and uncued targets appeared simultaneously (0 ms SOA) and decreased as the SOA between targets increased. The distribution of responses was fit to a Gaussian function using maximum likelihood estimation, and the mean of the fitted Gaussian function—not the observed data—was used as an estimate of the PSS. Critically, this procedure led to a mismatch between the mean of the fitted curve (or more aptly, the mean of the individual-subject fitted curves) and the mean of the observed data. Specifically, whereas the mean of the fitted curves fell slightly to the left of the 0-ms SOA (uncued target presented first), the mean of the observed data actually fell slightly to the right of the 0-ms SOA (cued target presented first) because of a positive skew of the distribution.† Does auditory cueing influence the subjective impression of simultaneity in the context of a SJ task? 
Unfortunately, the results from Santangelo and Spence’s study provide no clear answer to this question. The reported leftward shift in PSS suggests that the auditory cue had a small facilitatory effect on the perceived timing of the ipsilateral target. However, the rightward skew of the observed
distribution (and consequential rightward shift in the mean) suggests that the auditory cue may actually have delayed perception of the ipsilateral target. Finally, the mode of the observed distribution suggests that the auditory cue had no effect on subjective reports of simultaneity. These inconclusive results suggest that the SJ task may lack adequate sensitivity to detect shifts in perceived time order induced by cross-modal cueing.
* Whereas Santangelo and Spence (2008) made the strong claim that performance in SJ tasks should be completely independent of all response biases, Schneider and Bavelier (2003) argued only that performance in SJ tasks should be less susceptible to such decision-level effects than performance in TOJ tasks.
† The mismatch between the estimated PSS and the mean of the observed data in Santangelo and Spence’s (2008) SJ task might have been due to violations in the assumptions of the fitting procedure. Specifically, the maximum likelihood procedure assumes that data are distributed normally, whereas the observed data were clearly skewed. Santangelo and Spence did perform one goodness-of-fit test to help determine whether the data differed significantly from the fitted Gaussians, but this test was insufficient to pick up the positive skew (note that other researchers have employed multiple goodness-of-fit tests before computing PSS; e.g., Stone et al. 2001). Alternatively, the mismatch between the estimated PSS and the mean of the observed data might have arisen because data from the simultaneous-target trials were actually discarded prior to the curve-fitting procedure. This arbitrary step shifted the mode of the distribution 13 ms to the left (uncued target was presented 13 ms before cued target), which happened to be very close to the reported shift in PSS.
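The curve-fitting concern raised above (see footnote †) is easy to demonstrate numerically: when the distribution of "simultaneous" responses is skewed, the mean of a fitted symmetric Gaussian, the weighted mean of the observed data, and the mode need not agree, so the estimated PSS depends on which summary one adopts. The sketch below uses invented proportions (not Santangelo and Spence's data) and a simple least-squares fit rather than their maximum likelihood procedure.

import numpy as np
from scipy.optimize import curve_fit

# SOA convention: negative = uncued target first, positive = cued target first.
soa = np.array([-70.0, -35.0, -13.0, 0.0, 13.0, 35.0, 70.0])
p_simult = np.array([0.10, 0.30, 0.55, 0.62, 0.48, 0.33, 0.20])  # positively skewed, illustrative

def gauss(x, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

(amp, mu, sigma), _ = curve_fit(gauss, soa, p_simult, p0=[0.6, 0.0, 40.0])
weighted_mean = np.sum(soa * p_simult) / np.sum(p_simult)  # mean of observed distribution
mode = soa[np.argmax(p_simult)]                            # SOA with most "simultaneous" reports
print("PSS from fitted Gaussian mean: %.1f ms" % mu)
print("mean of observed distribution: %.1f ms" % weighted_mean)
print("mode of observed distribution: %.1f ms" % mode)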
26.5 INVOLUNTARY CROSS-MODAL SPATIAL ATTENTION ALTERS APPEARANCE The findings of the signal-detection and TOJ studies outlined in previous sections support the view that involuntary cross-modal spatial attention alters the perception of subsequent visual stimuli as well as the gain of neural responses in extrastriate visual cortex 100–150 ms after stimulus onset. These results largely mirrored the effects of visual spatial cues on visual perceptual sensitivity (e.g., Luck et al. 1994; Smith 2000) and temporal perception (e.g., Stelmach and Herdman 1991; Shore et al. 2001). However, none of these studies directly addressed the question of whether attention alters the subjective appearance of objects that reach our senses. Does attention make white objects appear whiter and dark objects appear darker? Does it make the ticking of a clock sound louder? Psychologists have pondered questions like these for well over a century (e.g., Fechner 1882; Helmholtz 1866; James 1890). Recently, Carrasco and colleagues (2004) introduced a psychophysical paradigm to address the question, “does attention alter appearance?” The paradigm is similar to the TOJ paradigm except that, rather than varying the SOA between two visual targets and asking participants to judge which one was first (or last), the relative physical contrast of two targets is varied and participants are asked to judge which one is higher (or lower) in perceived contrast. In the original variant of the task, a small black dot was used to summon attention to the left or right just before the appearance of two Gabor patches at both left and right locations. When the physical contrasts of the two targets were similar or identical, observers tended to report the orientation of the target on the cued side. Based on these results, Carrasco and colleagues (2004) concluded that attention alters the subjective impression of contrast. In subsequent studies, visual cueing was found to alter the subjective impressions of several other stimulus features, including color saturation, spatial frequency, and motion coherence (for a review, see Carrasco 2006). Carrasco and colleagues performed several control experiments to help rule out alternative explanations for their psychophysical findings (Prinzmetal et al. 2008; Schneider and Komlos 2008). The results of these controls argued against low-level sensory factors (Ling and Carrasco 2007) as well as higher-level decision or response biases (Carrasco et al. 2004; Fuller et al. 2008). However, as we have discussed in previous sections, it is difficult to rule out all alternative explanations on the basis of the behavioral data alone. Moreover, results from different paradigms have led to different conclusions about whether attention alters appearance: whereas the results from Carrasco’s paradigm have indicated that attention does alter appearance, the results from an equality-judgment paradigm introduced by Schneider and Komlos (2008) have suggested that attention may alter decision processes rather than contrast appearance. Störmer et al. (2009) recently investigated whether cross-modal spatial attention alters visual appearance. The visual cue was replaced by a spatially nonpredictive auditory cue delivered in stereo so that it appeared to emanate from a peripheral location of a visual display (25° from fixation). After a 150-ms SOA, two Gabors were presented, one at the cued location and one on the opposite side of fixation (Figure 26.3a).
The use of an auditory cue eliminated some potential sensory interactions between visual cue and target that might boost the contrast of the cued target even in the absence of attention (e.g., the contrast of a visual cue could add to the contrast of the cued-location target, thereby making it higher in contrast than the uncued-location target). As in Carrasco et al.’s (2004) high-contrast experiment, the contrast of one (standard) Gabor was set at 22%, whereas the
FIGURE 26.3 Results from Störmer et al.’s (2009) contrast-appearance experiment. (a) Stimulus sequence and grand-average ERPs to equal-contrast Gabors, recorded at occipital electrodes (PO7/PO8) contralateral and ipsilateral to cued side. On a short-SOA trial (depicted), a peripheral auditory cue was presented 150 ms before a bilateral pair of Gabors that varied in contrast (see text for details). Isolated target ERPs revealed an enlarged positivity contralateral to cued target. Statistically significant differences between contralateral and ipsilateral waveforms are denoted in gray on time axis. (b) Mean probability of reporting contrast of test patch to be higher than that of standard patch, as a function of test-patch contrast. Probabilities for cued-test and cued-standard trials are shown separately. (c) Scalp topographies of equal-contrast-Gabor ERPs in time interval of P1 (120–140 ms). Left and right sides of the map show electrodes ipsilateral and contralateral to the cued side, respectively. (d) Localization of distributed cortical current sources underlying contralateral-minus-ipsilateral ERP positivity in 120–140 ms interval, projected onto cortical surface. View of the ventral surface, with occipital lobes at the top. Source activity was estimated using LAURA algorithm and is shown in contralateral hemisphere (right side of brain) only. (e) Correlations between individual participants’ tendencies to report the cued-side target to be higher in contrast and magnitude of enlarged ERP positivities recorded at occipital and parieto-occipital electrodes (PO7/PO8, PO3/PO4) in 120–140 ms interval.
contrast of the other (test) Gabor varied between 6% and 79%. ERPs were recorded on the trials (1/3 of the total) where the two Gabors were equal in contrast. Participants were required to indicate whether the higher-contrast Gabor patch was oriented horizontally or vertically. The psychophysical findings in this auditory cueing paradigm were consistent with those reported by Carrasco and colleagues (2004). When the test and standard Gabors had the same physical contrast, observers reported the orientation of the cued-location Gabor significantly more often than the uncued-location Gabor (55% vs. 45%) (Figure 26.3b). The point of subjective equality (PSE)—the
test contrast at which observers judged the test patch to be higher in contrast on half of the trials—averaged 20% when the test patch was cued and 25% when the standard patch was cued (in comparison with the 22% standard contrast; Figure 26.3b). These results indicate that spatially nonpredictive auditory cues as well as visual cues can influence subjective (visual) contrast judgments. To investigate whether the auditory cue altered visual appearance as opposed to decision or response processes, Störmer and colleagues (2009) examined the ERPs elicited by the equal-contrast Gabors as a function of cue location. The authors reasoned that changes in subjective appearance would likely be linked to modulations of early ERP activity in visual cortex associated with perceptual processing rather than decision- or response-level processing (see also Schneider and Komlos 2008). Moreover, any such effect on early ERP activity should correlate with the observers’ tendencies to report the cued target as being higher in contrast. This is exactly what was found. Starting at approximately 90 ms after presentation of the equal-contrast targets, the waveform recorded contralaterally to the cued side became more positive than the waveform recorded ipsilaterally to the cued side (Figure 26.3a). This contralateral positivity was observed on those trials when observers judged the cued-location target to be higher in contrast but not when observers judged the uncued-location target to be higher in contrast. The tendency to report the cued-location target as being higher in contrast correlated with the contralateral ERP positivity, most strongly in the time interval of the P1 component (120–140 ms), which is generated at early stages of visual cortical processing. Topographical mapping and distributed source modeling indicated that the increased contralateral positivity in the P1 interval reflected modulations of neural activity in or near the fusiform gyrus of the occipital lobe (Figure 26.3c and d). These ERP findings converge with the behavioral evidence that cross-modal spatial attention affects visual appearance through modulations at an early sensory level rather than by affecting a late decision process.
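To make the PSE measure concrete, the sketch below shows one conventional way of estimating it: fit a psychometric function to the proportion of trials on which the test patch is judged higher in contrast and read off the contrast at which that proportion crosses 0.5. This is an illustrative sketch only; the response proportions, the logistic form, and the variable names are our own assumptions and are not taken from Störmer et al.’s (2009) analysis.

```python
# Illustrative sketch only: estimating a point of subjective equality (PSE) from
# contrast-judgment data of the kind plotted in Figure 26.3b. The proportions below
# are hypothetical placeholders, not the published data.
import numpy as np
from scipy.optimize import curve_fit

def logistic(log_c, log_pse, slope):
    """Probability of judging the test patch higher in contrast than the standard."""
    return 1.0 / (1.0 + np.exp(-(log_c - log_pse) / slope))

test_contrast = np.array([6.0, 13.0, 22.0, 37.0, 78.0])    # test-patch contrasts (%)
p_test_higher = np.array([0.05, 0.25, 0.60, 0.92, 0.99])   # hypothetical choice rates

params, _ = curve_fit(logistic, np.log(test_contrast), p_test_higher,
                      p0=[np.log(22.0), 0.3])
print(f"Estimated PSE: {np.exp(params[0]):.1f}% contrast")
# A PSE below the 22% standard (as when the test patch was cued) means the cued test
# patch needed less physical contrast to be judged equal to the standard.
```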
26.6 POSSIBLE MECHANISMS OF CROSS-MODAL CUE EFFECTS The previous sections have focused on the perceptual consequences of cross-modal spatial cueing. To sum up, salient-but-irrelevant sounds were found to enhance visual perceptual sensitivity, accelerate the timing of visual perceptions, and alter the appearance of visual stimuli. Each of these perceptual effects was accompanied by modulation of the early cortical response elicited by the visual stimulus within ventral-stream regions. Such findings are consistent with the hypothesis that auditory and visual stimuli engage a common neural network involved in the control and covert deployment of attention in space (Farah et al. 1989). Converging lines of evidence have pointed to the involvement of several key brain structures in the control and deployment of spatial attention in visual tasks. These brain regions include the superior colliculus, pulvinar nucleus of the thalamus, intraparietal sulcus, and dorsal premotor cortex (for additional details, see Corbetta and Shulman 2002; LaBerge 1995; Posner and Raichle 1994). Importantly, multisensory neurons have been found in each of these areas, which suggests that the neural network responsible for the covert deployment of attention in visual space may well control the deployment of attention in multisensory space (see Macaluso, this volume; Ward et al. 1998; Wright and Ward 2008). At present, however, there is no consensus as to whether a supramodal attention system is responsible for the cross-modal spatial cue effects outlined in the previous sections. Two different controversies have emerged. The first concerns whether a single, supramodal system controls the deployment of attention in multisensory space or whether separate, modality-specific systems direct attention to stimuli of their respective modalities. The latter view can account for cross-modal cueing effects by assuming that the activation of one system triggers coactivation of others. According to this separate-but-linked proposal, a shift of attention to an auditory location would lead to a separate shift of attention to the corresponding location of the visual field. Both the supramodal and separate-but-linked hypotheses can account for cross-modal cueing effects, making it difficult to distinguish between the two views in the absence of more direct measures of the neural activity that underlies attention control.
The second major controversy over the possible mechanisms of cross-modal cue effects is specific to studies utilizing salient-but-irrelevant stimuli to capture attention involuntarily. In these studies, the behavioral and neurophysiological effects of cueing are typically maximal when the cue appears 100–300 ms before the target. Although it is customary to attribute these facilitatory effects to the covert orienting of attention, they might alternatively result from sensory interactions between cue and target (Tassinari et al. 1994). The cross-modal-cueing paradigm eliminates unimodal sensory interactions, such as those taking place at the level of the retina, but the possibility of cross-modal sensory interaction remains because of the existence of multisensory neurons at many levels of the sensory pathways that respond to stimuli in different modalities (Driver and Noesselt 2008; Foxe and Schroeder 2005; Meredith and Stein 1996; Schroeder and Foxe 2005). In fact, the majority of multisensory neurons do not simply respond to stimuli in different modalities, but rather appear to integrate the input signals from different modalities so that their responses to multimodal stimulation differ quantitatively from the simple summation of their unimodal responses (for reviews, see Stein and Meredith 1993; Stein et al. 2009; other chapters in this volume). Such multisensory interactions are typically largest when stimuli from different modalities occur at about the same time, but they are possible over a period of several hundreds of milliseconds (Meredith et al. 1987). In light of these considerations, the cross-modal cueing effects described in previous sections could in principle have been due to the involuntary covert orienting of spatial attention or to the integration of cue and target into a single multisensory event (McDonald et al. 2001; Spence and McDonald 2004; Spence et al. 2004). Although it is often difficult to determine which of these mechanisms are responsible for crossmodal cueing effects, several factors can help to tip the scales in favor of one explanation or the other. One factor is the temporal relationship between the cue and target stimuli. A simple rule of thumb is that increasing the temporal overlap between the cue and target will make multisensory integration more likely and pre-target attentional biasing less likely (McDonald et al. 2001). Thus, it is relatively straightforward to attribute cross-modal cue effects to multisensory integration when cue and target are presented concurrently or to spatial attention when cue and target are separated by a long temporal gap. The likely cause of cross-modal cueing effects is not so clear, however, when there is a short gap between cue and target that is within the temporal window where integration is possible. In such situations, other considerations may help to disambiguate the causes of the cross-modal cueing effects. For example, multisensory integration is largely an automatic and invariant process, whereas stimulus-driven attention effects are dependent on an observer’s goals and intentions (i.e., attentional set; e.g., Folk et al. 1992). Thus, if cross-modal spatial cue effects were found to be contingent upon an observer’s current attentional set, they would be more likely to have been caused by pre-target attentional biasing. To our knowledge, there has been little discussion of the dependency of involuntary cross-modal spatial cueing effects on attentional set and other task-related factors (e.g., Ward et al. 2000). 
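For readers unfamiliar with how such departures from simple summation are quantified, one standard formulation from the single-neuron literature reviewed by Stein and Meredith (1993) and Stein et al. (2009) expresses multisensory enhancement relative to the most effective unimodal response; the notation below is ours, not taken from those sources.

```latex
% Multisensory enhancement (ME), expressed as a percentage change relative to the
% most effective unimodal response. CM is the response to the combined (e.g.,
% audiovisual) stimulus; A and V are the responses to the unimodal stimuli.
\[
  \mathrm{ME} = \frac{CM - \max(A, V)}{\max(A, V)} \times 100\%
\]
% ME > 0 indicates cross-modal enhancement, and responses for which CM exceeds the
% simple sum A + V are commonly described as superadditive.
```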
A second consideration that could help distinguish between alternative mechanisms of cross-modal cueing effects concerns the temporal sequence of control operations (Spence et al. 2004). According to the most prominent multisensory integration account, signals arising from stimuli in different modalities converge onto multimodal brain regions and are integrated therein. The resulting integrated signal is then fed back to the unimodal brain regions to influence processing of subsequent stimuli in modality-specific regions of cortex (Calvert et al. 2000; Macaluso et al. 2000). Critically, such an influence on modality-specific processing would occur only after feedforward convergence and integration of the unimodal signals take place (Figure 26.4a). This contrasts with the supramodal-attention account, according to which the cue’s influence on modality-specific processing may be initiated before the target in another modality has been presented (i.e., before integration is possible). In the context of a peripheral cueing task, a cue in one modality (e.g., audition) would initiate a sequence of attentional control operations (such as disengage, move, reengage; see Posner and Raichle 1994) that would lead to anticipatory biasing of activity in another modality (e.g., vision) before the appearance of the target (Figure 26.4b). In other words, whereas
FIGURE 26.4 Hypothetical neural mechanisms for involuntary cross-modal spatial cueing effects. (a) Integration-based account. Nearly simultaneous auditory and visual stimuli first activate unimodal auditory and visual cortical regions and then converge upon a multisensory region (AV). Audiovisual interaction within multisensory region feeds back to boost activity in visual cortex. (b) Attention-based account. An auditory cue elicits a shift of spatial attention in a multisensory representation, which leads to pre-target biasing of activity in visual cortex and ultimately boosts target-related activity in visual cortex.
multisensory integration occurs only after stimulation in two (or more) modalities, the consequences of spatial attention are theoretically observable after stimulation in the cue modality alone. Thus, a careful examination of neural activity in the cue–target interval would help to ascertain whether pre-target attentional control is responsible for the cross-modal cueing effects on perception. This is a challenging task in the case of involuntary cross-modal cue effects, because the time interval between the cue and target is typically very short. In the future, however, researchers might successfully adapt the electrophysiological methods used to track the voluntary control of spatial attention (e.g., Doesburg et al. 2009; Eimer et al. 2002; Green and McDonald 2008; McDonald and Green 2008; Worden et al. 2000) to look for signs of attentional control in involuntary cross-modal cueing paradigms such as the ones described in this chapter.
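By way of illustration, the sketch below shows the general form such a cue–target interval analysis might take, loosely modeled on the anticipatory alpha-band measures used in the voluntary-attention studies cited above (e.g., Worden et al. 2000). The data are simulated, and the sampling rate, filter settings, and electrode assignments are hypothetical choices of ours rather than details of any published study.

```python
# Illustrative sketch only (simulated data): testing for anticipatory biasing of visual
# cortical activity in the short cue-target interval of a cross-modal cueing paradigm.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500                               # sampling rate (Hz), assumed
t = np.arange(-0.2, 0.15, 1.0 / fs)    # epoch: 200 ms before cue onset (t = 0) to target onset (+150 ms)
rng = np.random.default_rng(0)
n_trials = 200

# Simulated occipital EEG contralateral and ipsilateral to the cued side; the
# ipsilateral channel is given extra 10-Hz (alpha) activity purely for demonstration.
contra = rng.standard_normal((n_trials, t.size))
ipsi = rng.standard_normal((n_trials, t.size)) + 0.4 * np.sin(2 * np.pi * 10 * t)

def mean_alpha_power(x):
    """Mean 8-12 Hz power in the cue-target interval (0 to +150 ms)."""
    b, a = butter(4, [8.0, 12.0], btype="band", fs=fs)
    envelope = np.abs(hilbert(filtfilt(b, a, x, axis=-1), axis=-1)) ** 2
    return envelope[:, t >= 0].mean()

# Anticipatory spatial attention is typically indexed by relatively lower alpha power
# over cortex contralateral to the attended location before the target appears.
print("contra - ipsi alpha power:", mean_alpha_power(contra) - mean_alpha_power(ipsi))
```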
26.7 CONCLUSIONS AND FUTURE DIRECTIONS To date, most of the research on spatial attention has considered how attending to a particular region of space influences the processing of objects within isolated sensory modalities. However, a growing number of studies have demonstrated that orienting attention to the location of a stimulus in one modality can influence the perception of subsequent stimuli in different modalities. As outlined here, recent cross-modal spatial cueing studies have shown that the occurrence of a nonpredictive auditory cue affects the way we see subsequent visual objects in several ways: (1) by improving the perceptual sensitivity for detection of masked visual stimuli appearing at the cued location, (2) by producing earlier perceptual awareness of visual stimuli appearing at the cued location, and (3) by altering the subjective appearance of visual stimuli appearing at the cued location. Each of these cross-modally induced changes in perceptual experience is accompanied by short-latency changes in the neural processing of targets within occipitotemporal cortex in the vicinity of the fusiform gyrus, which is generally considered to represent modality-specific cortex belonging to the ventral stream of visual processing. There is still much to be learned about these cross-modally induced changes in perception. One outstanding question is why spatial cueing appears to alter visual perception in tasks that focus on differences in temporal order or contrast (Carrasco et al. 2004; McDonald et al. 2005; Störmer et al. 2009) but not in tasks that focus on similarities (i.e., “same or not” judgments; Santangelo and Spence 2008; Schneider and Komlos 2008). Future studies could address this question by recording physiological measures (such as ERPs) in the two types of tasks. If an ERP component previously shown to correlate with perception were found to be elicited equally well under the two types of task instructions, it might be concluded that the same-or-not judgment lacks sensitivity to reveal perceptual effects. Another outstanding question is whether the cross-modal cueing effects reviewed in this chapter are caused by the covert orienting of attention or by passive intersensory interactions. Some insight may come from recent ERP studies of the “double flash” illusion produced by the interaction of a single flash with two pulsed sounds (Mishra et al. 2007, 2010). In these studies, an enhanced early ventral stream response at 100–130 ms was observed in association with the perceived extra flash. Importantly, this neural correlate of the illusory flash was sensitive to manipulations of spatial selective attention, suggesting that the illusion is not the result of automatic multisensory integration. Along these lines, it is tempting to conclude that the highly similar enhancement of early ventral-stream activity found in audiovisual cueing studies (McDonald et al. 2005; Störmer et al. 2009) also results from the covert deployment of attention rather than the automatic integration of cue and target stimuli. Future studies could address this issue by looking for electrophysiological signs of attentional control and anticipatory modulation of visual cortical activity before the onset of the target stimulus. A further challenge for future research will be to extend these studies to different combinations of sensory modalities to determine whether cross-modal cueing of spatial attention has analogous effects on the perception of auditory and somatosensory stimuli. 
Such findings would be consistent with the hypothesis that stimuli from the various spatial senses can all engage the same neural system that mediates the covert deployment of attention in multisensory space (Farah et al. 1989).
REFERENCES Boring, E. G. 1929. A history of experimental psychology. New York: Appleton-Century. Broadbent, D. E. 1958. Perception and communication. London: Pergamon Press. Cairney, P. T. 1975. The complication experiment uncomplicated. Perception 4: 255–265. Calvert, G. A., R. Campbell, and M. J. Brammer. 2000. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology 10: 649–657.
Carrasco, M. 2006. Covert attention increases contrast sensitivity: Psychophysical, neurophysiological, and neuroimaging studies. In Progress in Brain Research, Volume 154, Part 1: Visual Perception. Part I. Fundamentals of Vision: Low and Mid-level Processes in Perception, ed. S. Martinez-Conde, S. L. Macknik, L. M. Martinez, J. M. Alonso, and P. U. Tse, 33–70. Amsterdam: Elsevier. Carrasco, M., S. Ling, and S. Read. 2004. Attention alters appearance. Nature Neuroscience 7: 308–313. Carver, R. A., and V. Brown. 1997. Effects of amount of attention allocated to the location of visual stimulus pairs on perception of simultaneity. Perception & Psychophysics 59: 534–542. Cherry, C. E. 1953. Some experiments on the recognition of speech with one and two ears. Journal of the Acoustical Society of America 25: 975–979. Corbetta, M., and G. L. Shulman. 2002. Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience 3: 201–215. Dennett, D. 1991. Consciousness explained. Boston: Little, Brown & Co. Deutsch, J. A., and D. Deutsch. 1963. Attention: Some theoretical considerations. Psychological Review 70: 80–90. Doesburg, S. M., J. J. Green, J. J. McDonald, and L. M. Ward. 2009. From local inhibition to long-range integration: A functional dissociation of alpha-band synchronization across cortical scales in visuospatial attention. Brain Research 1303: 97–110. Driver, J., and T. Noesselt. 2008. Multisensory interplay reveals crossmodal influences on ‘sensory-specific’ brain regions, neural responses, and judgments. Neuron 57: 11–23. Driver, J., and C. Spence. 2004. Crossmodal spatial attention: Evidence from human performance. In Crossmodal space and crossmodal attention, ed. C. Spence and J. Driver, 179–220. Oxford: Oxford Univ. Press. Dufour, A. 1999. Importance of attentional mechanisms in audiovisual links. Experimental Brain Research 126: 215–222. Eimer, M., J. van Velzen, and J. Driver. 2002. Cross-modal interactions between audition, touch, and vision in endogenous spatial attention: ERP evidence on preparatory states and sensory modulations. Journal of Cognitive Neuroscience 14: 254–271. Eimer, M., and E. Schröger. 1998. ERP effects of intermodal attention and cross-modal links in spatial attention. Psychophysiology 35: 313–327. Eriksen, C. W., and J. E. Hoffman. 1972. Temporal and spatial characteristics of selective encoding from visual displays. Perception & Psychophysics 12: 201–204. Farah, M. J., A. B. Wong, M. A. Monheit, and L. A. Morrow. 1989. Parietal lobe mechanisms of spatial attention—modality-specific or supramodal. Neuropsychologia 27: 461–470. Fechner, G. T. 1882. Revision der Hauptpunkte der Psychophysik. Leipzig: Breitkopf & Härtel. Folk, C. L., R. W. Remington, and J. C. Johnston. 1992. Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance 18: 1030–1044. Foxe, J. J., and C. E. Schroeder. 2005. The case for feedforward multisensory convergence during early cortical processing. Neuroreport 16: 419–423. Fukuda, K., and E. K. Vogel, 2009. Human variation in overriding attentional capture. Journal of Neuroscience 29: 8726–8733. Fuller, S., R. Z. Rodriguez, and M. Carrasco. 2008. Apparent contrast differs across the vertical meridian: Visual and attentional factors. Journal of Vision 8: 1–16. Green, D. M., and J. A. Swets. 1966. Signal detection theory and psychophysics. New York: Wiley. Green, J. J., and J. J. McDonald. 2008. 
Electrical neuroimaging reveals timing of attentional control activity in human brain. PLoS Biology 6: e81. Heinze, H. J., G. R. Mangun, and S. A. Hillyard. 1990. Visual event-related potentials index perceptual accuracy during spatial attention to bilateral stimuli. In Psychophysiological Brain Research, ed. C. Brunia et al., 196–202. Tilburg, The Netherlands: Tilburg Univ. Press. Heinze, H. J., G. R. Mangun, W. Burchert et al. 1994. Combined spatial and temporal imaging of brain activity during visual selective attention in humans. Nature 372: 543–546. Helmholtz, H. V. 1866. Treatise on physiological optics, 3rd ed., Vols. 2 & 3. Rochester: Optical Society of America. Hillyard, S. A., G. V. Simpson, D. L. Woods, S. Vanvoorhis, and T. F. Münte. 1984. Event-related brain potentials and selective attention to different modalities. In Cortical Integration, ed. F. Reinoso-Suarez and C. Ajmone-Marsan, 395–414. New York: Raven Press. James, W. 1890. The principles of psychology. New York: Henry Holt. LaBerge, D. 1995. Attentional processing: The brain’s art of mindfulness. Cambridge, MA: Harvard Univ. Press.
Ling, S., and M. Carrasco. 2007. Transient covert attention does alter appearance: A reply to Schneider 2006. Perception & Psychophysics 69: 1051–1058. Lu, Z. L., and B. A. Dosher. 1998. External noise distinguishes attention mechanisms. Vision Research 38: 1183–1198. Luce, P. A. 1986. A computational analysis of uniqueness points in auditory word recognition. Perception & Psychophysics 39: 155–158. Luck, S. J., H. J. Heinze, G. R. Mangun, and S. A. Hillyard. 1990. Visual event-related potentials index focussed attention within bilateral stimulus arrays: II. Functional dissociation of P1 and N1 components. Electroencephalography and Clinical Neurophysiology 75: 528–542. Luck, S. J., S. A. Hillyard, M. Mouloua, and H. L. Hawkins. 1996. Mechanisms of visual–spatial attention: Resource allocation or uncertainty reduction? Journal of Experimental Psychology: Human Perception and Performance 22: 725–737. Luck, S. J., S. A. Hillyard, M. Mouloua, M. G. Woldorff, V. P. Clark, and H. L. Hawkins. 1994. Effects of spatial cuing on luminance detectability: Psychophysical and electrophysiological evidence for early selection. Journal of Experimental Psychology: Human Perception and Performance 20: 887–904. Macaluso, E., C. D. Frith, and J. Driver. 2000. Modulation of human visual cortex by crossmodal spatial attention. Science 289: 1206–1208. McDonald, J. J., and J. J. Green. 2008. Isolating event-related potential components associated with voluntary control of visuo-spatial attention. Brain Research 1227: 96–109. McDonald, J. J., W. A. Teder-Sälejärvi, F. Di Russo, and S. A. Hillyard. 2003. Neural substrates of perceptual enhancement by cross-modal spatial attention. Journal of Cognitive Neuroscience 15: 10–19. McDonald, J. J., W. A. Teder-Sälejärvi, F. Di Russo, and S. A. Hillyard. 2005. Neural basis of auditory-induced shifts in visual time-order perception. Nature Neuroscience 8: 1197–1202. McDonald, J. J., W. A. Teder-Sälejärvi, D. Heraldez, and S. A. Hillyard. 2001. Electrophysiological evidence for the “missing link” in crossmodal attention. Canadian Journal of Experimental Psychology 55: 141–149. McDonald, J. J., W. A. Teder-Sälejärvi, and S. A. Hillyard. 2000. Involuntary orienting to sound improves visual perception. Nature 407: 906–908. McDonald, J. J., and L. M. Ward. 1999. Spatial relevance determines facilitatory and inhibitory effects of auditory covert spatial orienting. Journal of Experimental Psychology: Human Perception and Performance 25: 1234–1252. McDonald, J. J., and L. M. Ward. 2000. Involuntary listening aids seeing: Evidence from human electrophysiology. Psychological Science 11: 167–171. Meredith, M. A., J. W. Nemitz, and B. E. Stein. 1987. Determinants of multisensory integration in superior colliculus neurons: 1. Temporal factors. Journal of Neuroscience 7: 3215–3229. Meredith, M. A., and B. E. Stein. 1996. Spatial determinants of multisensory integration in cat superior colliculus neurons. Journal of Neurophysiology 75: 1843–1857. Mishra, J., A. Martinez, T. J. Sejnowski, and S. A. Hillyard. 2007. Early cross-modal interactions in auditory and visual cortex underlie a sound-induced visual illusion. Journal of Neuroscience 27: 4120–4131. Mishra, J., A. Martinez, and S. A. Hillyard. 2010. Effect of attention on early cortical processes associated with the sound-induced extra flash illusion. Journal of Cognitive Neuroscience 22: 1714–1729. Pashler, H. E. 1998. The psychology of attention. Cambridge, MA: MIT Press. Posner, M. I. 1978. Chronometric explorations of mind. 
Hillsdale, NJ: Lawrence Erlbaum. Posner, M. I., Y. Cohen, and R. D. Rafal. 1982. Neural systems control of spatial orienting. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences 298: 187–198. Posner, M. I., and M. E. Raichle 1994. Images of mind. New York: W. H. Freeman. Posner, M. I., J. A. Walker, F. J. Friedrich, and R. D. Rafal. 1984. Effects of parietal injury on covert orienting of attention. Journal of Neuroscience 4: 1863–1874. Prime, D. J., J. J. McDonald, J. Green, and L. M. Ward. 2008. When cross-modal spatial attention fails. Canadian Journal of Experimental Psychology 62: 192–197. Prinzmetal, W., V. Long, and J. Leonhardt. 2008. Involuntary attention and brightness contrast. Perception & Psychophysics 70: 1139–1150. Prinzmetal, W., C. McCool, and S. Park. 2005. Attention: Reaction time and accuracy reveal different mechanisms. Journal of Experimental Psychology: General 134: 73–92. Rhodes, G. 1987. Auditory attention and the representation of spatial information. Perception & Psychophysics 42: 1–14. Santangelo, V., and C. Spence. 2008. Crossmodal attentional capture in an unspeeded simultaneity judgement task. Visual Cognition 16: 155–165.
Schneider, K. A., and D. Bavelier. 2003. Components of visual prior entry. Cognitive Psychology 47: 333–366. Schneider, K. A., and M. Komlos. 2008. Attention biases decisions but does not alter appearance. Journal of Vision 8: 1–10. Schroeder, C. E., and J. Foxe. 2005. Multisensory contributions to low-level, ‘unisensory’ processing. Current Opinion in Neurobiology 15: 454–458. Shaw, M. L. 1982. Attending to multiple sources of information: I. The integration of information in decision making. Cognitive Psychology 14: 353–409. Shaw, M. L. 1984. Division of attention among spatial locations: A fundamental difference between detection of letters and detection of luminance increments. In Attention and Performance X, ed. H. Bouma and D. G. Bouwhuis, 109–121. Hillsdale, NJ: Erlbaum. Shimojo, S., S. Miyauchi, and O. Hikosaka. 1997. Visual motion sensation yielded by non-visually driven attention. Vision Research 37: 1575–1580. Shiu, L. P., and H. Pashler. 1994. Negligible effect of spatial precueing on identification of single digits. Journal of Experimental Psychology: Human Perception and Performance 20: 1037–1054. Shore, D. I., C. Spence, and R. M. Klein. 2001. Visual prior entry. Psychological Science 12: 205–212. Smith, P. L. 2000. Attention and luminance detection: Effects of cues, masks, and pedestals. Journal of Experimental Psychology: Human Perception and Performance 26: 1401–1420. Smith, P. L., and R. Ratcliff. 2009. An integrated theory of attention and decision making in visual signal detection. Psychological Review 116: 283–317. Soto-Faraco, S., J. McDonald, and A. Kingstone. 2002. Gaze direction: Effects on attentional orienting and crossmodal target responses. Poster presented at the annual meeting of the Cognitive Neuroscience Society, San Francisco, CA. Spence, C. J., and J. Driver. 1994. Covert spatial orienting in audition—exogenous and endogenous mechanisms. Journal of Experimental Psychology: Human Perception and Performance 20: 555–574. Spence, C., and J. Driver. 1997. Audiovisual links in exogenous covert spatial orienting. Perception & Psychophysics 59: 1–22. Spence, C., and J. J. McDonald. 2004. The crossmodal consequences of the exogenous spatial orienting of attention. In The handbook of multisensory processing, ed. G. A. Calvert, C. Spence, and B. E. Stein, 3–25. Cambridge, MA: MIT Press. Spence, C., J. J. McDonald, and J. Driver. 2004. Exogenous spatial cuing studies of human crossmodal attention and multisensory integration. In Crossmodal space and crossmodal attention, ed. C. Spence and J. Driver, 277–320. Oxford: Oxford Univ. Press. Sperling, G., and B. A. Dosher. 1986. Strategy and optimization in human information processing. In Handbook of Perception and Human Performance, ed. K. R. Boff, L. Kaufman, and J. P. Thomas, 1–65. New York: Wiley. Stein, B. E., and M. A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press. Stein, B. E., T. R. Stanford, R. Ramachandran, T. J. Perrault, and B. A. Rowland. 2009. Challenges in quantifying multisensory integration: Alternative criteria, models, and inverse effectiveness. Experimental Brain Research 198: 113–126. Stelmach, L. B., and C. M. Herdman. 1991. Directed attention and perception of temporal-order. Journal of Experimental Psychology: Human Perception and Performance 17: 539–550. Stevens, H. C. 1904. A simple complication pendulum for qualitative work. American Journal of Psychology 15: 581. Stone, J. V., N. M. Hunkin, J. Porrill et al. 2001. When is now? Perception of simultaneity.
Proceedings of the Royal Society of London Series B: Biological Sciences 268: 31–38. Störmer, V. S., J. J. McDonald, and S. A. Hillyard. 2009. Cross-modal cueing of attention alters appearance and early cortical processing of visual stimuli. Proceedings of the National Academy of Sciences USA 106: 22456–22461. Talsma, D., and M. G. Woldorff. 2005. Selective attention and multisensory integration: Multiple phases of effects on the evoked brain activity. Journal of Cognitive Neuroscience 17: 1098–1114. Tassinari, G., S. Aglioti, L. Chelazzi, A. Peru, and G. Berlucchi. 1994. Do peripheral non-informative cues induce early facilitation of target detection? Vision Research 34: 179–189. Teder-Sälejärvi, W. A., T. F. Münte, F. J. Sperlich, and S. A. Hillyard. 1999. Intra-modal and cross-modal spatial attention to auditory and visual stimuli. An event-related brain potential study. Cognitive Brain Research 8: 327–343. Titchener, E. B. 1908. Lectures on the elementary psychology of feeling and attention. New York: The Macmillan Company.
Treisman, A., and G. Geffen. 1967. Selective attention: Perception or response? Quarterly Journal of Experimental Psychology 19: 1–18. Vibell, J., C. Klinge, M. Zampini, C. Spence, and A. C. Nobre. 2007. Temporal order is coded temporally in the brain: Early event-related potential latency shifts underlying prior entry in a cross-modal temporal order judgment task. Journal of Cognitive Neuroscience 19: 109–120. Ward, L. M., J. J. McDonald, and N. Golestani. 1998. Cross-modal control of attention shifts. In Visual attention, ed. R. D. Wright, 232–268. New York: Oxford Univ. Press. Ward, L. M., J. J. McDonald, and D. Lin. 2000. On asymmetries in cross-modal spatial attention orienting. Perception & Psychophysics 62: 1258–1264. Watt, R. J. 1991. Understanding vision. San Diego, CA: Academic Press. Worden, M. S., J. J. Foxe, N. Wang, and G. V. Simpson. 2000. Anticipatory biasing of visuospatial attention indexed by retinotopically specific alpha-band electroencephalography increases over occipital cortex. Journal of Neuroscience 20 (RC63): 1–6. Wright, R. D., and L. M. Ward. 2008. Orienting of attention. New York: Oxford Univ. Press. Wundt, W. 1874. Grundzüge der physiologischen Psychologie [Foundations of physiological psychology]. Leipzig, Germany: Wilhelm Engelmann.
27
The Colavita Visual Dominance Effect Charles Spence, Cesare Parise, and Yi-Chuan Chen
CONTENTS
27.1 Introduction
27.2 Basic Findings on Colavita Visual Dominance Effect
27.2.1 Stimulus Intensity
27.2.2 Stimulus Modality
27.2.3 Stimulus Type
27.2.4 Stimulus Position
27.2.5 Bimodal Stimulus Probability
27.2.6 Response Demands
27.2.7 Attention
27.2.8 Arousal
27.2.9 Practice Effects
27.3 Interim Summary
27.4 Prior Entry and Colavita Visual Dominance Effect
27.5 Explaining the Colavita Visual Dominance Effect
27.5.1 Accessory Stimulus Effects and Colavita Effect
27.5.2 Perceptual and Decisional Contributions to Colavita Visual Dominance Effect
27.5.3 Stimulus, (Perception), and Response?
27.6 Biased (or Integrated) Competition and Colavita Visual Dominance Effect
27.6.1 Putative Neural Underpinnings of Modality-Based Biased Competition
27.6.2 Clinical Extinction and Colavita Visual Dominance Effect
27.7 Conclusions and Questions for Future Research
27.7.1 Modeling the Colavita Visual Dominance Effect
27.7.2 Multisensory Facilitation versus Interference
References
27.1 INTRODUCTION Visually dominant behavior has been observed in many different species, including birds, cows, dogs, and humans (e.g., Partan and Marler 1999; Posner et al. 1976; Uetake and Kudo 1994; Wilcoxin et al. 1971). This has led researchers to suggest that visual stimuli may constitute “prepotent” stimuli for certain classes of behavioral responses (see Colavita 1974; Foree and LoLordo 1973; LoLordo 1979; Meltzer and Masaki 1973; Shapiro et al. 1980). One particularly impressive example of vision’s dominance over audition (and more recently, touch) has come from research on the Colavita visual dominance effect (Colavita 1974). In the basic experimental paradigm, participants have to make speeded responses to a random series of auditory (or tactile), visual, and audiovisual (or visuotactile) targets, all presented at a clearly suprathreshold level. Participants are instructed to make one response whenever an auditory (or tactile) target is presented, another response whenever a visual target is presented, and to make both responses whenever the auditory (or tactile) and visual targets
are presented at the same time (i.e., on the bimodal target trials). Typically, the unimodal targets are presented more frequently than the bimodal targets (the ratio of 40% auditory—or tactile—targets, 40% visual targets, and 20% bimodal targets has often been used; e.g., Koppen and Spence 2007a, 2007b, 2007c). The striking result to have emerged from a number of studies on the Colavita effect is that although participants have no problem in responding rapidly and accurately to the unimodal targets, they often fail to respond to the auditory (or tactile) targets on the bimodal target trials (see Figure 27.1a and b). It is almost as if the simultaneous presentation of the visual target leads to the “extinction” of the participants’ perception of, and/or response to, the nonvisual target on a proportion of the bimodal trials (see Egeth and Sager 1977; Hartcher-O’Brien et al. 2008; Koppen et al. 2009; Koppen and Spence 2007c). Although the majority of research on the Colavita effect has focused on the pattern of errors made by participants in the bimodal target trials, it is worth noting that visual dominance can also show up in reaction time (RT) data. For example, Egeth and Sager (1977) reported that although participants responded more rapidly to unimodal auditory targets than to unimodal visual targets, this pattern of results was reversed on the bimodal target trials—that is, participants responded
FIGURE 27.1 Results of experiments conducted by Elcock and Spence (2009) highlighting a significant Colavita visual dominance effect over both audition (a) and touch (b). Values reported in the graphs refer to the percentage of bimodal target trials in which participants correctly made both responses, or else made either a visual-only or auditory- (tactile-) only response. The order in which the two experiments were performed was counterbalanced across participants. Nine participants (age, 18–22 years) completed 300 experimental trials (40% auditory; 40% visual, and 20% bimodal; plus 30 unimodal practice trials) in each experiment. In audiovisual experiment (a), auditory stimulus consisted of a 4000-Hz pure tone (presented at 63 dB), visual stimulus consisted of illumination of loudspeaker cone by an LED (64.3 cd/m²). In the visuotactile experiment (b), the tactile stimulus was presented to a finger on the participant’s left hand, and the visual target now consisted of illumination of the same finger. Thus, auditory, visual, and tactile stimuli were presented from exactly the same spatial location. Participants were given 2500 ms from the onset of the target in which to respond, and intertrial interval was set at 650 ms. The Colavita effect was significant in both cases, that is, participants in audiovisual experiment made 45% more visual-only than auditory-only responses, whereas participants in visuotactile experiment made 41% more visual-only than tactile-only responses. (c and d) Results from Elcock and Spence’s Experiment 3, in which they investigated the effects of caffeine (c) versus a placebo pill (d) on the audiovisual Colavita visual dominance effect. The results show that participants made significantly more visual-only than auditory-only responses in both conditions (24% and 29% more, respectively), although there was no significant difference between the magnitude of Colavita visual dominance effect reported in two cases.
more rapidly to the visual targets than to the auditory targets. Note that Egeth and Sager made sure that their participants always responded to both the auditory and visual targets on the bimodal trials by presenting each target until the participant had made the relevant behavioral response.* A similar pattern of results in the RT data has also been reported in a number of other studies (e.g., Colavita 1974, 1982; Colavita and Weisberg 1979; Cooper 1998; Koppen and Spence 2007a; Sinnett et al. 2007; Zahn et al. 1994). In this article, we will focus mainly (although not exclusively) on the Colavita effect present in the error data (in line with the majority of published research on this phenomenon). We start by summarizing the basic findings to have emerged from studies of the Colavita visual dominance effect conducted over the past 35 years or so. By now, many different factors have been investigated in order to determine whether they influence the Colavita effect: here, they are grouped into stimulus-related factors (such as stimulus intensity, stimulus modality, stimulus type, stimulus position, and bimodal stimulus probability) and task/participant-related factors (such as attention, arousal, task/response demands, and practice). A range of potential explanations for the Colavita effect are evaluated, and all are shown to be lacking. A new account of the Colavita visual dominance effect is therefore proposed, one that is based on the “biased competition” model put forward by Desimone and Duncan (1995; see also Duncan 1996; Peers et al. 2005). Although this model was initially developed in order to provide an explanation for the intramodal competition taking place between multiple visual object representations in both normal participants and clinical patients (suffering from extinction), here we propose that it can be extended to provide a helpful framework in which to understand what may be going on in the Colavita visual dominance effect. In particular, we argue that a form of cross-modal biased competition can help to explain why participants respond to the visual stimulus while sometimes failing to respond to the nonvisual stimulus on the bimodal target trials in the Colavita paradigm. More generally, it is our hope that explaining the Colavita visual dominance effect may provide an important step toward understanding the mechanisms underlying multisensory interactions. First, though, we review the various factors that have been hypothesized to influence the Colavita visual dominance effect.
* That is, the visual target was only turned off once the participants made a visual response, and the auditory target was only turned off when the participants made an auditory response. This contrasts with Colavita’s (1974) studies, in which a participant’s first response turned off all the stimuli, and with other more recent studies in which the targets were only presented briefly (i.e., for 50 ms; e.g., Koppen and Spence 2007a, 2007b, 2007c, 2007d).
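As a concrete illustration of how this error measure is typically derived, the short sketch below scores a set of bimodal-trial outcomes in the way described above and in the caption to Figure 27.1 (the percentage of visual-only responses minus the percentage of auditory-only responses). The outcome counts are invented for the example and do not come from any of the studies cited.

```python
# Illustrative sketch only: scoring the Colavita visual dominance effect from the
# pattern of responses on bimodal (audiovisual) target trials. Counts are hypothetical.
from collections import Counter

# Each bimodal-trial outcome: "both", "visual_only", "auditory_only", or "neither".
bimodal_trials = (["both"] * 40 + ["visual_only"] * 12 +
                  ["auditory_only"] * 3 + ["neither"] * 5)
counts = Counter(bimodal_trials)
n = len(bimodal_trials)

pct_visual_only = 100.0 * counts["visual_only"] / n
pct_auditory_only = 100.0 * counts["auditory_only"] / n

# The Colavita effect is conventionally expressed as the excess of visual-only over
# auditory-only responses on bimodal trials; positive values indicate visual dominance.
print(f"Visual-only: {pct_visual_only:.1f}%  Auditory-only: {pct_auditory_only:.1f}%")
print(f"Colavita effect: {pct_visual_only - pct_auditory_only:.1f} percentage points")
```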
27.2 BASIC FINDINGS ON COLAVITA VISUAL DOMINANCE EFFECT 27.2.1 Stimulus Intensity The Colavita visual dominance effect occurs regardless of whether the auditory and visual stimuli are presented at the same (subjectively matched) intensity (e.g., Colavita 1974; Koppen et al. 2009; Zahn et al. 1994) or the auditory stimulus is presented at an intensity that is rated subjectively as being twice that of the visual stimulus (see Colavita 1974, Experiment 2). Hartcher-O’Brien et al. (2008; Experiment 4) have also shown that vision dominates over touch under conditions in which the intensity of the tactile stimulus is matched to that of the visual stimulus (presented at the 75% detection threshold). Taken together, these results suggest that the dominance of vision over both audition and touch in the Colavita paradigm cannot simply be attributed to any systematic differences in the relative intensity of the stimuli that have been presented to participants in previous studies (but see also Gregg and Brogden 1952; O’Connor and Hermelin 1963; Smith 1933).
27.2.2 Stimulus Modality Although the majority of the research on the Colavita visual dominance effect has investigated the dominance of vision over audition, researchers have recently shown that vision also dominates over
touch in normal participants (Hartcher-O’Brien et al. 2008, 2010; Hecht and Reiner 2009; see also Gallace et al. 2007). Costantini et al. (2007) have even reported that vision dominates over touch in extinction patients (regardless of whether the two stimuli were presented from the same position, or from different sides; see also Bender 1952). Interestingly, however, no clear pattern of sensory dominance has, as yet, been observed when participants respond to simultaneously presented auditory and tactile stimuli (see Hecht and Reiner 2009; Occelli et al. 2010; but see Bonneh et al. 2008, for a case study of an autistic child who exhibited auditory dominance over both touch and vision). Intriguingly, Hecht and Reiner (2009) have recently reported that vision no longer dominates when targets are presented in all three modalities (i.e., audition, vision, and touch) at the same time. In their study, the participants were given a separate button with which to respond to the targets in each modality, and had to press one, two, or three response keys depending on the combination of target modalities that happened to be presented on each trial. Whereas vision dominated over both audition and touch in the bimodal target trials, no clear pattern of dominance was shown on the trimodal target trials (see also Shapiro et al. 1984, Experiment 3). As yet, there is no obvious explanation for this result.
27.2.3 Stimulus Type The Colavita visual dominance effect has been reported for both onset and offset targets (Colavita and Weisberg 1979; see also Osborn et al. 1963). The effect occurs both with simple stimuli (i.e., tones, flashes of light, and brief taps on the skin) and also with more complex stimuli, including pictures of objects and realistic object sounds, and with auditory and visual speech stimuli (see Koppen et al. 2008; Sinnett et al. 2007, 2008). The Colavita effect not only occurs when the target stimuli are presented in isolation (i.e., in an otherwise dark and silent room), but also when they are embedded within a rapidly presented stream of auditory and visual distractors (Sinnett et al. 2007). Interestingly, however, the magnitude of the Colavita visual dominance effect does not seem to be affected by whether or not the auditory and visual targets on the bimodal trials are semantically congruent (see Koppen et al. 2008).
27.2.4 Stimulus Position Researchers have also considered what effect, if any, varying either the absolute and/or relative location from which the stimuli are presented might have on performance in the Colavita task. The Colavita visual dominance effect occurs both when the auditory stimuli are presented over headphones and when they are presented from an external loudspeaker placed in front of the participant (Colavita 1974, 1982). Researchers have demonstrated that it does not much matter whether the participants look in the direction of the visual or auditory stimulus or else fixate on some other intermediate location (see Colavita et al. 1976). Vision’s dominance over both audition and touch has also been shown to occur regardless of whether the stimuli are presented from the same spatial location or from different positions (one on either side of fixation), although the Colavita effect is somewhat larger in the former case (see Hartcher-O’Brien et al. 2008, 2010; Koppen and Spence 2007c). Taken together, these results therefore show that varying either the absolute position (e.g., presenting the stimuli from the center vs. in the periphery) or relative position (i.e., presenting the various stimuli from the same or different positions) from which the target stimuli are presented has, at most, a relatively modest impact on the magnitude of the Colavita visual dominance effect (see also Johnson and Shapiro 1989).
27.2.5 Bimodal Stimulus Probability As already noted, studies on the Colavita visual dominance effect usually present far fewer bimodal targets than unimodal targets. Nevertheless, researchers have shown that a robust Colavita visual
dominance effect can still be obtained if the probability of each type of target is equalized (i.e., when 33.3% auditory, 33.3% visual, and 33.3% bimodal targets are presented; see Koppen and Spence 2007a). Koppen and Spence (2007d) investigated the effect of varying the probability of bimodal target trials on the Colavita visual dominance effect (while keeping the relative proportion of unimodal auditory and visual target trials matched).* They found that although a significant Colavita effect was demonstrated whenever the bimodal targets were presented on 60% or less of the trials, vision no longer dominated when the bimodal targets were presented on 90% of the trials (see also Egeth and Sager 1974; Manly et al. 1999; Quinlan 2000). This result suggests that the Colavita effect is not caused by stimulus-related (i.e., sensory) factors, since these should not have been affected by any change in the probability of occurrence of bimodal targets (cf. Odgaard et al. 2003, 2004, on this point). Instead, the fact that the Colavita effect disappears if the bimodal targets are presented too frequently (i.e., on too high a proportion of the trials) would appear to suggest that response-related factors (linked to the probability of participants making bimodal target responses) are likely to play an important role in helping to explain the Colavita effect (see also Gorea and Sagi 2000).
* Note that researchers have also manipulated the relative probability of unimodal auditory and visual targets (see Egeth and Sager 1977; Quinlan 2000; Sinnett et al. 2007). However, since such probability manipulations have typically been introduced in the context of trying to shift the focus of a participant’s attention between the auditory and visual modalities, they will be discussed later (see Section 27.2.7).
27.2.6 Response Demands The majority of studies on the Colavita visual dominance effect have been conducted under conditions in which participants were given a separate response key with which to respond to the targets presented in each sensory modality. Normally, participants are instructed to respond to the (relatively infrequent) bimodal targets by pressing both response keys. Similar results have, however, now also been obtained under conditions in which the participants are given a separate response key with which to respond to the bimodal targets (Koppen and Spence 2007a; Sinnett et al. 2007). This result rules out the possibility that the Colavita effect is simply caused by participants having to make two responses at more or less the same time. Surprisingly, Colavita (1974; Experiment 4) showed that participants still made a majority of visual responses after having been explicitly instructed to respond to the bimodal targets by pressing the auditory response key instead. Koppen et al. (2008) have also reported that the Colavita effect occurs when participants are instructed to press one button whenever they either see or hear a dog, another button whenever they see or hear a cat, and to make both responses whenever a cat and a dog are presented at the same time. Under such conditions, the visual presentation of the picture of one of these animals resulted in participants failing to respond to the sound of the other animal (be it the woofing of the dog or the meowing of the cat) on 10% more of the trials than they failed to respond to the identity of the visually presented animal. Taken together, these results therefore confirm the fact that the Colavita visual dominance effect occurs under a variety of different task demands/response requirements (i.e., it occurs no matter whether participants respond to the sensory modality or semantic identity of the target stimuli).
27.2.7 Attention Originally, researchers thought that the Colavita visual dominance effect might simply reflect a predisposition by participants to direct their attention preferentially toward the visual modality (Colavita 1974; Posner et al. 1976). Posner et al.’s idea was that people endogenously (or voluntarily) directed their attention toward the visual modality in order to make up for the fact that visual stimuli are generally less alerting than stimuli presented in the other modalities (but see Spence et al. 2001b, footnote 5). Contrary to this suggestion, however, a number of more recent studies have actually
shown that although the manipulation of a person’s endogenous attention can certainly modulate the extent to which vision dominates over audition, it cannot in and of itself be used to reverse the Colavita effect. That is, even when a participant’s attention is directed toward the auditory modality (i.e., by verbally instructing them to attend to audition or by presenting unimodal auditory targets much more frequently than unimodal visual targets), people still exhibit either visually dominant behavior or else their behavior shows no clear pattern of dominance (see Koppen and Spence 2007a, 2007d; Sinnett et al. 2007). These results therefore demonstrate that any predisposition that participants might have to direct their attention voluntarily (or endogenously) toward the visual modality cannot explain why vision always seems to dominate in the Colavita visual dominance effect. De Reuck and Spence (2009) recently investigated whether varying the modality of a secondary task would have any effect on the magnitude of the Colavita visual dominance effect. To this end, a video game (“Food boy” by T3Software) and a concurrent auditory speech stream (consisting of pairs of auditory words delivered via a central loudspeaker) were presented in the background while participants performed the two-response version of the Colavita task (i.e., pressing one key in response to auditory targets, another key in response to visual targets, and both response keys on the bimodal target trials; the auditory targets in this study consisted of a 4000-Hz pure tone presented from a loudspeaker cone placed in front of the computer screen, whereas the visual target consisted of the illumination of a red light-emitting diode (LED), also mounted in front of the computer screen). In the condition involving the secondary visual task, the participants performed the Colavita task with their right hand while playing the video game with their left hand (note that the auditory distracting speech streams were presented in the background, although they were irrelevant in this condition and so could be ignored). The participants played the video game using a computer mouse to control a character moving across the bottom of the computer screen. The participants had to “swallow” as much of the food dropping from the top of the screen as possible, while avoiding any bombs that happened to fall. In the part of the study involving an auditory secondary task, the video game was run in the demonstration mode to provide equivalent background visual stimulation to the participants, who now had to respond by pressing a button with their left hand whenever they heard an animal name in the auditory stream. The results showed that the modality of the secondary task (auditory or visual) did not modulate the magnitude of the Colavita visual dominance effect significantly, that is, the participants failed to respond to a similar number of the auditory stimuli regardless of whether they were performing a secondary task that primarily involved participants having to attend to the auditory or visual modality. De Reuck and Spence’s (2009) results therefore suggest that the Colavita visual dominance effect may be insensitive to manipulations of participants’ attention toward either the auditory or visual modality that are achieved by varying the requirements of a simultaneously performed secondary task (see Spence and Soto-Faraco 2009).
Finally, Koppen and Spence (2007a) have shown that exogenously directing a participant’s attention toward either the auditory or visual modality via the presentation of a task-irrelevant nonpredictive auditory or visual cue 200 ms before the onset of the target (see Rodway 2005; Spence et al. 2001a; Turatto et al. 2002) has only a marginal effect on the magnitude of vision’s dominance over audition (see also Golob et al. 2001). Taken together, the results reported in this section therefore highlight the fact that although attentional manipulations (be they exogenous or endogenous) can sometimes be used to modulate, or even to eliminate, the Colavita visual dominance effect, they cannot be used to reverse it.
27.2.8 Arousal Early animal research suggested that many examples of visual dominance could be reversed under conditions in which an animal was placed in a highly aroused state (i.e., when, for example, fearful of the imminent presentation of an electric shock; see Foree and LoLordo 1973; LoLordo and
Furrow 1976; Randich et al. 1978). It has been reported that although visual stimuli tend to control appetitive behaviors, auditory stimuli tend to control avoidance behaviors in many species. Shapiro et al. (1984) extended the idea that changes in the level of an organism’s arousal might change the pattern of sensory dominance in the Colavita task to human participants (see also Johnson and Shapiro 1989; Shapiro and Johnson 1987). They demonstrated what looked like auditory dominance (i.e., participants making more auditory-only than visual-only responses in the Colavita task) under conditions in which their participants were aversively motivated (by the occurrence of electric shock, or to a lesser extent by the threat of electric shock, or tactile stimulation, presented after the participants’ response on a random 20% of the trials). It should, however, be noted that no independent measure of the change in a participant’s level of arousal (i.e., such as a change in their galvanic skin response) was provided in this study. What is more, Shapiro et al.’s (1984) participants were explicitly told to respond to the stimulus that they perceived first on the bimodal target trials, that is, the participants effectively had to perform a temporal order judgment (TOJ) task. What this means in practice is that their results (and those from the study of Shapiro and Johnson (1987) and Johnson and Shapiro (1989), in which similar instructions were given) may actually reflect the effects of arousal on “prior entry” (see Spence 2010; Van Damme et al. 2009b), rather than, as the authors argued, the effects of arousal on the Colavita visual dominance effect. Indeed, the latest research has demonstrated that increased arousal can lead to the prior entry of certain classes of stimuli over others (when assessed by means of a participant’s responses on a TOJ task; Van Damme et al. 2009b). In Van Damme et al.’s study, auditory and tactile stimuli delivered from close to one of the participant’s hands were prioritized when an arousing picture showing physical threat to a person’s bodily tissues was briefly flashed beforehand from the same (rather than opposite) location. Meanwhile, Van Damme et al. (2009a) have shown that, when participants are instructed to respond to both of the stimuli in the bimodal trials, rather than just to the stimulus that the participant happens to have perceived first, the effects of arousal on the Colavita visual dominance effect are far less clear-cut (we return later to the question of what role, if any, prior entry plays in the Colavita visual dominance effect). Elcock and Spence (2009) recently investigated the consequences for the Colavita effect of pharmacologically modulating the participants’ level of arousal by administering caffeine. Caffeine is known to increase arousal and hence, given Shapiro et al.’s (1984) research, ingesting caffeine might be expected to modulate the magnitude of the Colavita visual dominance effect (Smith et al. 1992).* To this end, 15 healthy participants were tested in a within-participants, double-blind study, in which a 200-mg caffeine tablet (equivalent to drinking about two cups of coffee) was taken 40 min before one session of the Colavita task and a visually identical placebo pill was taken before the other session (note that the participants were instructed to refrain from consuming any caffeine in the morning before taking part in the study). 
The Colavita visual dominance effect was unaffected by whether the participants had ingested the caffeine tablet or the placebo (see Figure 27.1c and d). Taken together, the results reported in this section would therefore appear to suggest that, contrary to Shapiro et al.’s early claim, the magnitude of the Colavita visual dominance effect is not affected by changes in a participant’s level of arousal.
* Caffeine is a stimulant that accelerates physiological activity, and results in the release of adrenaline and the increased production of the neurotransmitter dopamine. Caffeine also interferes with the operation of another neurotransmitter: adenosine (Smith 2002; Zwyghuizen-Doorenbos et al. 1990).
27.2.9 Practice Effects The largest Colavita visual dominance effects have been reported in studies in which only a small number of bimodal target trials were presented. In fact, by far the largest effects on record were reported by Frank B. Colavita himself in his early research (see Koppen and Spence 2007a, Table 1, for a review).
FIGURE 27.2 Graph highlighting the results of Koppen and Spence's (2007b) study of the Colavita effect, in which auditory and visual targets on bimodal target trials could be presented at any one of 10 SOAs. Although a significant visual dominance effect was observed at a majority of asynchronies around objective simultaneity, a significant auditory dominance effect was only observed at the largest auditory-leading asynchrony. The shaded gray band in the center of the graph represents the temporal window of audiovisual integration. The shaded areas containing the ear and the eye schematically highlight the SOAs at which auditory and visual dominance, respectively, were observed. Note though (see text on this point) that differences between the proportion of auditory-only and visual-only responses only reached statistical significance at certain SOAs (that said, the trend in the data is clear). The error bars represent the standard errors of the means.
In these studies, each participant was only ever presented with a maximum of five or six bimodal targets (see Colavita 1974, 1982; Colavita et al. 1976; Colavita and Weisberg 1979). Contrast this with the smaller Colavita effects that have been reported in more recent research, where as many as 120 bimodal targets were presented to each participant (e.g., Hartcher-O'Brien et al. 2008; Koppen et al. 2008; Koppen and Spence 2007a, 2007c). This observation leads on to the suggestion that the Colavita visual dominance effect may be more pronounced early on in the experimental session (see also Kristofferson 1965).* That said, significant Colavita visual dominance effects have nevertheless still been observed in numerous studies where participants' performance has been averaged over many hundreds of trials. Here, it may also be worth considering whether any reduction in the Colavita effect resulting from increasing the probability of (and/or practice with responding to) bimodal stimuli may also be related to the phenomenon of response coupling (see Ulrich and Miller 2008). That is, the more often two independent target stimuli happen to be presented at exactly the same time, the more likely it is that the participant will start to couple (i.e., program) their responses to the two stimuli together. In the only study (as far as we are aware) to have provided evidence relevant to the question of the consequence of practice on the Colavita visual dominance effect, the vigilance performance of a group of participants was assessed over a 3-h period (Osborn et al. 1963). The participants in this study had to monitor a light and sound source continuously for the occasional (once every 2½ min) brief (i.e., lasting only 41 ms) offset of either or both of the stimuli. The participants were instructed to press one button whenever the light was extinguished and another button whenever the sound was interrupted. The results showed that although participants failed to respond to more of the auditory than visual targets during the first 30-min session (thus showing a typical Colavita visual dominance effect), this pattern of results reversed in the final four 30-min sessions (i.e., participants made
more auditory-only than visual-only responses on the bimodal target trials; see Osborn et al. 1963; Figure 27.2). It is, however, unclear whether these results necessarily reflect the effects of practice on the Colavita visual dominance effect, or whether instead they may simply highlight the effects of fatigue or boredom after the participants had spent several hours on the task (given that auditory events are more likely to be responded to than visual events should the participants temporarily look away or else close their eyes).
* Note that if practice were found to reduce the magnitude of the Colavita visual dominance effect, then this might provide an explanation for why increasing the probability of occurrence of bimodal target trials up to 90% in Koppen and Spence's (2007d) study has been shown to eliminate the Colavita effect (see Section 27.2.5). Alternatively, however, increasing the prevalence (or number) of bimodal targets might also lead to the increased coupling of a participant's responses on the bimodal trials (see main text for further details; Ulrich and Miller 2008).
27.3 INTERIM SUMMARY To summarize, the latest research has confirmed the fact that the Colavita visual dominance effect is a robust empirical phenomenon. The basic Colavita effect—defined here in terms of participants failing to respond to the nonvisual stimulus more often than they fail to respond to the visual stimulus on the bimodal audiovisual or visuotactile target trials—has now been replicated in many different studies, and by a number of different research groups (although it is worth noting that the magnitude of the effect has fluctuated markedly from one study to the next). That said, the Colavita effect appears to be robust to a variety of different experimental manipulations (e.g., of stimulus intensity, stimulus type, stimulus position, response demands, attention, arousal, etc.). Interestingly, though, while many experimental manipulations have been shown to modulate the size of the Colavita visual dominance effect, and a few studies have even been able to eliminate it entirely, only two of the studies discussed thus far have provided suggestive evidence regarding a reversal of the Colavita effect in humans (i.e., evidence that is consistent with, although not necessarily providing strong support for, auditory dominance; see Osborn et al. 1963; Shapiro et al. 1984). Having reviewed the majority of the published research on the Colavita visual dominance effect, and having ruled out accounts of the effect in terms of people having a predisposition to attend endogenously to the visual modality (see Posner et al. 1976), differences in stimulus intensity (Colavita 1974), and/or difficulties associated with participants having to make two responses at the same time on the bimodal target trials (Koppen and Spence 2007a), how should the effect be explained? Well, researchers have recently been investigating whether the Colavita effect can be accounted for, at least in part, by the prior entry of the visual stimulus to participants’ awareness (see Spence 2010; Spence et al. 2001; Titchener 1908). It is to this research that we now turn.
27.4 PRIOR ENTRY AND COLAVITA VISUAL DOMINANCE EFFECT Koppen and Spence (2007b) investigated whether the Colavita effect might result from the prior entry of the visual stimulus into participants’ awareness on some proportion of the bimodal target trials. That is, even though the auditory and visual stimuli were presented simultaneously in the majority of published studies of the Colavita effect, research elsewhere has shown that a visual stimulus may be perceived first under such conditions (see Rutschmann and Link 1964). In order to evaluate the prior entry account of the Colavita visual dominance effect, Koppen and Spence assessed participants’ perception of the temporal order of pairs of auditory and visual stimuli that had been used in another part of the study to demonstrate the typical Colavita visual dominance effect.* Psychophysical analysis of participants’ TOJ performance showed that when the auditory and visual stimuli were presented simultaneously, participants actually judged the auditory stimulus to have been presented slightly, although not significantly, ahead of the visual stimulus (i.e., contrary to what would have been predicted according to the prior entry account; but see Exner 1875 and Hirsh and Sherrick 1961, for similar results; see also Jaśkowski 1996, 1999; Jaśkowski et al. 1990). * Note the importance of using the same stimuli within the same pool of participants, given the large individual differences in the perception of audiovisual simultaneity that have been reported previously (Smith 1933; Spence 2010; Stone et al. 2001).
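To make the logic of such a TOJ analysis concrete, the following minimal sketch (in Python) shows the standard approach of fitting a cumulative Gaussian to the proportion of "vision first" responses across SOAs in order to estimate the point of subjective simultaneity (PSS). The SOAs, response proportions, and starting values used here are invented for illustration and are not Koppen and Spence's (2007b) data.

# Sketch: estimating the point of subjective simultaneity (PSS) from TOJ data
# by fitting a cumulative Gaussian psychometric function. All data are invented.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Negative SOAs: audition leads; positive SOAs: vision leads (in ms).
soas = np.array([-200.0, -100.0, -50.0, 0.0, 50.0, 100.0, 200.0])
# Hypothetical proportion of "vision first" responses at each SOA.
p_vision_first = np.array([0.05, 0.15, 0.30, 0.45, 0.70, 0.85, 0.97])

def cumulative_gaussian(soa, pss, sigma):
    return norm.cdf(soa, loc=pss, scale=sigma)

(pss, sigma), _ = curve_fit(cumulative_gaussian, soas, p_vision_first,
                            p0=[0.0, 80.0])

# A positive PSS means that vision has to lead for the two stimuli to appear
# simultaneous; a PSS near zero (or negative) argues against visual prior entry.
print(f"PSS = {pss:.1f} ms, sigma = {sigma:.1f} ms")

# SOA at which the fitted function predicts 75% "audition first" responses
# (i.e., 25% "vision first" responses).
print(f"75% audition-first SOA = {norm.ppf(0.25, loc=pss, scale=sigma):.1f} ms")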
It is, however, important to note that there is a potential concern here regarding the interpretation of Koppen and Spence's (2007b) findings. Remember that the Colavita visual dominance effect is eliminated when bimodal audiovisual targets are presented too frequently (e.g., see Section 27.2.5). Crucially, Koppen and Spence looked for any evidence of the prior entry of visual stimuli into awareness in their TOJ study under conditions in which a pair of auditory and visual stimuli were presented on each and every trial. The possibility therefore remains that visual stimuli may only be perceived before simultaneously presented auditory stimuli under those conditions in which the occurrence of bimodal stimuli is relatively rare (cf. Miller et al. 2009). Thus, in retrospect, Koppen and Spence's results cannot be taken as providing unequivocal evidence against the possibility that visual stimuli have prior entry into participants' awareness on the bimodal trials in the Colavita paradigm. Ideally, future research will need to look for any evidence of visual prior entry under conditions in which the bimodal targets (in the TOJ task) are actually presented as infrequently as when the Colavita effect is demonstrated behaviorally (i.e., when the bimodal targets requiring a detection/discrimination response are presented on only 20% or so of the trials). Given these concerns over the design (and hence interpretation) of Koppen and Spence's (2007b) TOJ study, it is interesting to note that Lucey and Spence (2009) were recently able to eliminate the Colavita visual dominance effect by delaying the onset of the visual stimulus by a fixed 50 ms with respect to the auditory stimulus on the bimodal target trials. Lucey and Spence used a between-participants experimental design in which one group of participants completed the Colavita task with synchronous auditory and visual targets on the bimodal trials (as in the majority of previous studies), whereas for the other group of participants, the onset of the visual target was always delayed by 50 ms with respect to that of the auditory target. The apparatus and materials were identical to those used by Elcock and Spence (2009; described earlier), although the participants in Lucey and Spence's study performed the three-button version of the audiovisual Colavita task (i.e., in which participants had separate response keys for auditory, visual, and bimodal targets). The results revealed that although participants made significantly more vision-only than auditory-only responses in the synchronous bimodal condition (10.3% vs. 2.4%, respectively), no significant Colavita visual dominance effect was reported when the onset of the visual target was delayed (4.6% vs. 2.9%, respectively; n.s.). These results therefore demonstrate that the Colavita visual dominance effect can be eliminated by presenting the auditory stimulus slightly ahead of the visual stimulus. The critical question here, following on from Lucey and Spence's results, is whether auditory dominance would have been elicited had the auditory stimulus led the visual stimulus by a greater interval. Koppen and Spence (2007b) have provided an answer to this question. In their study of the Colavita effect, the auditory and visual stimuli on the bimodal target trials were presented at one of 10 stimulus onset asynchronies (SOAs; from auditory leading by 600 ms through to vision leading by 600 ms).
Koppen and Spence found that the auditory lead needed in order to eliminate the Colavita visual dominance effect on the bimodal target trials was correlated with the SOA at which participants reliably started to perceive the auditory stimulus as having been presented before the visual stimulus (defined as the SOA at which participants make 75% audition-first responses; see Koppen and Spence 2007b; Figure 27.3). This result therefore suggests that the prior entry of the visual stimulus to awareness plays some role in its dominance over audition in the Colavita effect. That said, however, Koppen and Spence also found that auditory targets had to be presented 600 ms before visual targets in order for participants to make significantly more auditory-only than visual-only responses on the bimodal target trials (although a similar nonsignificant trend toward auditory dominance was also reported at an auditory lead of 300 ms; see Figure 27.2). It is rather unclear, however, what exactly caused the auditorily dominant behavior observed at the 600 ms SOA in Koppen and Spence's (2007b) study. This (physical) asynchrony between the auditory and visual stimuli is far greater than any shift in the perceived timing of visual relative to auditory stimuli that might reasonably be expected due to the prior entry of the visual stimulus to awareness when the targets were actually presented simultaneously (see Spence 2010).
FIGURE 27.3 (a) Schematic illustration of the results of Sinnett et al.'s (2008; Experiment 2) speeded target detection study. The figure shows how the presentation of an accessory sound facilitates visual RTs (RTV(A)), whereas the presentation of an accessory visual stimulus delays auditory RTs (RTA(V)). Note that unimodal auditory (RTA) and visual (RTV) response latencies were serendipitously matched in this study (V, visual target; A, auditory stimulus). (b) Schematic diagrams showing how the asymmetrical cross-modal accessory stimulus effects reported by Sinnett et al. might lead to more (and more rapid) vision-only than auditory-only responses on bimodal trials. The conceptually simple models outlined in panels (b) and (c) account for Sinnett et al.'s asymmetrical RT effect either in terms of changes in the criterion for responding to auditory and visual targets on bimodal as opposed to unimodal trials (b), or in terms of asymmetrical cross-modal changes in the rate of information accrual (c). We plot the putative rate of information accrual (R) as a function of the stimuli presented. However, the results of Koppen et al.'s (2009) recent signal detection study of the Colavita effect have now provided evidence that is inconsistent with both of these simple accounts (see Figure 27.4). Hence, in panel (d), a mixture model is proposed in which the presentation of an accessory stimulus in one modality leads both to a change in the criterion for responding to targets in the other modality (in line with the results of Koppen et al.'s study) and also to an asymmetrical effect on the rate of information accrual in the other modality (see Koppen et al. 2007a; Miller 1986).
In fact, this
SOA is also longer than the mean RT of participants’ responses to the unimodal auditory (440 ms) targets. Given that the mean RT for auditory only responses on the bimodal target trials was only 470 ms (i.e., 30 ms longer, on average, than the correct responses on the bimodal trials; see Koppen and Spence 2007b, Figure 1 and Table 1), one can also rule out the possibility that this failure to report the visual stimulus occurred on trials in which the participants made auditory responses that were particularly slow. Therefore, given that the visual target on the bimodal trials (in the 600 ms SOA vision-lagging condition) was likely being extinguished by an already-responded-to auditory target, one might think that this form of auditory dominance reflects some sort of refractory period effect (i.e., resulting from the execution of the participants’ response to the first target; see Pashler 1994; Spence 2008), rather than the Colavita effect proper. In summary, although Koppen and Spence’s (2007b) results certainly do provide an example of auditory dominance, the mechanism behind this effect is most probably different from the one causing the visual dominance effect that has been reported in the majority of studies (of the Colavita effect), where the auditory and visual stimuli were presented simultaneously (see also Miyake et al. 1986). Thus, although recent research has shown that delaying the presentation of the visual stimulus can be used to eliminate the Colavita visual dominance effect (see Koppen and Spence 2007b; Lucey and Spence 2009), and although the SOA at which participants reliably start to perceive the auditory target as having been presented first correlates with the SOA at which the Colavita visual dominance effect no longer occurs (Koppen and Spence 2007b), we do not, as yet, have any convincing evidence that auditory dominance can be observed in the Colavita paradigm by presenting the auditory stimulus slightly before the visual stimulus on the bimodal target trials (i.e., at SOAs where the visual target is presented before the participants have initiated/executed their response to the already-presented auditory target). That is, to date, no simple relationship has been demonstrated between the SOA on the audiovisual target trials in the Colavita paradigm and modality dominance. Hence, we need to look elsewhere for an explanation of vision’s advantage in the Colavita visual dominance effect. Recent progress in understanding what may be going on here has come from studies looking at the effect of accessory stimuli presented in one modality on participants’ speeded responding to targets presented in another modality (Sinnett et al. 2008), and from studies looking at the sensitivity and criterion of participants’ responses in the Colavita task (Koppen et al. 2009).
27.5 EXPLAINING THE COLAVITA VISUAL DOMINANCE EFFECT 27.5.1 Accessory Stimulus Effects and Colavita Effect One of the most interesting recent developments in the study of the Colavita effect comes from an experiment reported by Sinnett et al. (2008; Experiment 2). The participants in this study had to make speeded target detection responses to either auditory or visual targets. An auditory stimulus was presented on 40% of the trials, a visual stimulus was presented on a further 40% of the trials, and both stimuli were presented simultaneously on the remaining 20% of trials (i.e., just as in a typical study of the Colavita effect; note, however, that this task can also be thought of as a kind of go/ no-go task; see Egeth and Sager 1977; Miller 1986; Quinlan 2000). The participants responded significantly more rapidly to the visual targets when they were accompanied by an accessory auditory stimulus than when they were presented by themselves (see Figure 27.3a). By contrast, participants’ responses to the auditory targets were actually slowed by the simultaneous presentation of an accessory visual stimulus (cf. Egeth and Sager 1977). How might the fact that the presentation of an auditory accessory stimulus speeds participants’ visual detection/discrimination responses, whereas the presentation of a visual stimulus slows their responses to auditory stimuli be used to help explain the Colavita visual dominance effect? Well, let us imagine that participants set one criterion for initiating their responses to the relatively common unimodal visual targets and another criterion for initiating their responses to the equally common
unimodal auditory targets. Note that the argument here is phrased in terms of changes in the criterion for responding set by participants, rather than in terms of changes in the perceptual threshold, given the evidence cited below that behavioral responses can sometimes be elicited under conditions in which participants remain unaware (i.e., they have no conscious access to the inducing stimulus). According to Sinnett et al.’s (2008) results, the criterion for initiating a speeded response to the visual targets should be reached sooner on the relatively infrequent bimodal trials than on the unimodal visual trials, whereas it should be reached more slowly (on the bimodal than on the unimodal trials) for auditory targets. There are at least two conceptually simple means by which such a pattern of behavioral results could be achieved. First, the participants could lower their criterion for responding to the visual targets on the bimodal trials while simultaneously raising their criterion for responding to the auditory target (see Figure 27.3b). Alternatively, however, the criterion for initiating a response might not change but the presentation of the accessory stimulus in one modality might instead have a crossmodal effect on the rate of information accrual (R) within the other modality (see Figure 27.3c). The fact that the process of information accrual (like any other internal process) is likely to be a noisy one might then help to explain why the Colavita effect is only observed on a proportion of the bimodal target trials. Evidence that is seemingly consistent with both of these simple accounts can be found in the literature. In particular, evidence consistent with the claim that bimodal (as compared to unimodal) stimulation can result in a change in the rate of information accrual comes from an older go/no-go study reported by Miller (1986). Unimodal auditory and unimodal visual target stimuli were presented randomly in this experiment together with trials in which both stimuli were presented at one of a range of different SOAs (0–167 ms). The participants had to make a simple speeded detection response whenever a target was presented (regardless of whether it was unimodal or bimodal). Catch trials, in which no stimulus was presented (and no response was required), were also included. Analysis of the results provided tentative evidence that visual stimuli needed less time to reach the criterion for initiating a behavioral response (measured from the putative onset of response-related activity) compared to the auditory stimuli on the redundant bimodal target trials—this despite the fact that the initiation of response-related activation after the presentation of an auditory stimulus started earlier in time than following the presentation of a visual stimulus (see Miller 1986, pp. 340– 341). Taken together, these results therefore suggest that stimulus-related information accrues more slowly for auditory targets in the presence (vs. absence) of concurrent visual stimuli than vice versa, just as highlighted in Figure 27.3c. Similarly, Romei et al.’s (2009) recent results showing that looming auditory signals enhance visual excitability in a preperceptual manner can also be seen as being consistent with the information accrual account. 
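To illustrate how the information accrual account sketched in Figure 27.3c could, in principle, generate the observed asymmetry, consider the following toy simulation in Python. Every rate, criterion, and noise value below is an invented placeholder rather than a fitted parameter: auditory and visual evidence accumulate noisily toward fixed response criteria, and the accessory visual stimulus slows auditory accrual while the accessory sound speeds visual accrual. With these made-up settings the visual accumulator tends to reach its criterion first on bimodal trials even though the auditory accumulator wins on unimodal trials.

# Toy sketch of the information accrual account (cf. Figure 27.3c): noisy
# evidence accumulates toward a fixed criterion, and the accessory stimulus
# in the other modality scales the accrual rate asymmetrically.
# All parameter values below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def time_to_criterion(rate, criterion=100.0, noise=1.5, dt=1.0, max_t=2000.0):
    """Return the time (ms) at which a noisy accumulator reaches the criterion."""
    evidence, t = 0.0, 0.0
    while evidence < criterion and t < max_t:
        evidence += rate * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return t

RATE_A, RATE_V = 0.26, 0.22          # audition accrues faster in isolation
SOUND_SPEEDS_VISION = 1.25           # hypothetical cross-modal scaling factors
LIGHT_SLOWS_AUDITION = 0.75

n = 500
uni_a = np.mean([time_to_criterion(RATE_A) for _ in range(n)])
uni_v = np.mean([time_to_criterion(RATE_V) for _ in range(n)])
vision_first = np.mean([
    time_to_criterion(RATE_V * SOUND_SPEEDS_VISION)
    < time_to_criterion(RATE_A * LIGHT_SLOWS_AUDITION)
    for _ in range(n)
])

print(f"Mean unimodal times: audition {uni_a:.0f} ms, vision {uni_v:.0f} ms")
print(f"Bimodal trials on which vision reaches criterion first: "
      f"{100 * vision_first:.0f}%")

Because the accumulation is noisy, the visual channel does not win on every simulated bimodal trial, which echoes the point made above that the Colavita effect is only observed on a proportion of the bimodal target trials.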
However, results arguing for the inclusion of some component of criterion shifting into one's model of the Colavita visual dominance effect (although note that the results are inconsistent with the simple criterion-shifting model put forward in Figure 27.3b) come from a more recent study reported by Koppen et al. (2009).
27.5.2 Perceptual and Decisional Contributions to Colavita Visual Dominance Effect Koppen et al. (2009) recently explicitly assessed the contributions of perceptual (i.e., threshold) and decisional (i.e., criterion-related) factors to the Colavita visual dominance effect in a study in which the intensities of the auditory and visual stimuli were initially adjusted until participants were only able to detect them on 75% of the trials. Next, a version of the Colavita task was conducted using these near-threshold stimuli (i.e., rather than the clearly suprathreshold stimuli that have been utilized in the majority of previous studies). A unimodal visual target was presented on 25% of the trials, a unimodal auditory target on 25% of trials, a bimodal audiovisual target on 25% of trials (and no target was presented on the remaining 25% of trials). The task of reporting which target modalities (if any) had been presented in each trial was unspeeded and the participants were instructed to refrain from responding on those trials in which no target was presented.
Analysis of Koppen et al.'s (2009) results using signal detection theory (see Green and Swets 1966) revealed that although the presentation of an auditory target had no effect on visual sensitivity, the presentation of a visual target resulted in a significant drop in participants' auditory sensitivity (see Figure 27.4a; see also Golob et al. 2001; Gregg and Brogden 1952; Marks et al. 2003; Odgaard et al. 2003; Stein et al. 1996; Thompson et al. 1958). These results therefore show that the presentation of a visual stimulus can lead to a small, but significant, lowering of sensitivity to a simultaneously presented auditory stimulus, at least when the participants' task involves trying to detect which target modalities (if any) have been presented.* Koppen et al.'s results suggest that only a relatively small component of the Colavita visual dominance effect may be attributable to the asymmetrical cross-modal effect on auditory sensitivity (i.e., on the auditory perceptual threshold) that results from the simultaneous presentation of a visual stimulus. That is, the magnitude of the sensitivity drop hardly seems large enough to account for the behavioral effects observed in the normal speeded version of the Colavita task. The more important result to have emerged from Koppen et al.'s (2009) study in terms of the argument being developed here was the significant drop in participants' criterion for responding on the bimodal (as compared to the unimodal) target trials. Importantly, this drop was significantly larger for visual than for auditory targets (see Figure 27.4b). The fact that the criterion dropped for both auditory and visual targets is inconsistent with the simple criterion-shifting account of the asymmetrical cross-modal effects highlighted by Sinnett et al. (2008) that were put forward in Figure 27.3b. In fact, when the various results now available are taken together, the most plausible model of the Colavita visual dominance effect would appear to be one in which an asymmetrical lowering of the criterion for responding to auditory and visual targets (Koppen et al. 2009) is paired with an asymmetrical cross-modal effect on the rate of information accrual (Miller 1986; see Figure 27.3d). However, although the account outlined in Figure 27.3d may help to explain why it is that a participant will typically respond to the visual stimulus first on the bimodal target trials (despite the fact that the auditory and visual stimuli are actually presented simultaneously), it does not explain why participants do not quickly recognize the error of their ways (after making a vision-only response, say), and then quickly initiate an additional auditory response.† The participants certainly had sufficient time in which to make a response before the next trial started in many of the studies where the Colavita effect has been reported. For example, in Koppen and Spence's (2007a, 2007b, 2007c) studies, the intertarget interval was in the region of 1500–1800 ms, whereas mean vision-only response latencies fell in the 500–700 ms range.
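For readers who want the computations behind these sensitivity and criterion measures, the following is a minimal sketch of the standard signal detection formulas (Green and Swets 1966) on which an analysis of this kind relies. The hit and false alarm rates used here are invented to echo the qualitative pattern described above (lower auditory sensitivity and a more liberal criterion on bimodal trials) and are not Koppen et al.'s (2009) data.

# Sketch: standard signal detection measures, d' = z(H) - z(F) and
# criterion c = -0.5 * (z(H) + z(F)). Hit/false-alarm rates are hypothetical.
from scipy.stats import norm

def d_prime_and_criterion(hit_rate, false_alarm_rate):
    z_hit = norm.ppf(hit_rate)
    z_fa = norm.ppf(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)

# Hypothetical auditory report rates: on bimodal trials both hits and false
# alarms rise, so sensitivity (d') falls and the criterion (c) becomes more
# liberal, mirroring the qualitative pattern described in the text.
for label, hits, false_alarms in [("auditory, unimodal trials", 0.75, 0.10),
                                  ("auditory, bimodal trials", 0.78, 0.30)]:
    dp, c = d_prime_and_criterion(hits, false_alarms)
    print(f"{label}: d' = {dp:.2f}, c = {c:.2f}")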
27.5.3 Stimulus, (Perception), and Response? We believe that in order to answer the question of why participants fail to make any response to the auditory (or tactile) targets on some proportion of the bimodal target trials in the Colavita paradigm, one has to break with the intuitively appealing notion that there is a causal link between (conscious) perception and action. Instead, it needs to be realized that our responses do not always rely on our first becoming aware of the stimuli that have elicited those responses. In fact, according to Neumann (1990), the only causal link that exists is the one between a stimulus and its associated response.
* Note here that a very different result (i.e., the enhancement of perceived auditory intensity by a simultaneously presented visual stimulus) has been reported in other studies in which the participants simply had to detect the presence of an auditory target (see Odgaard et al. 2004). This discrepancy highlights the fact that the precise nature of a participant's task constitutes a critical determinant of the way in which the stimuli presented in different modalities interact to influence human information processing (cf. Gondan and Fisher 2009; Sinnett et al. 2008; Wang et al. 2008, on this point).
† Note here that we are talking about the traditional two-response version of the Colavita task. Remember that in the three-response version, the participant's first response terminates the trial, and hence there is no opportunity to make a second response.
FIGURE 27.4 Summary of mean sensitivity (d') values (panel a) and criterion (c) values (panel b) for unimodal auditory, unimodal visual, bimodal auditory, and bimodal visual targets in Koppen et al.'s (2009) signal detection study of the Colavita visual dominance effect. Error bars indicate the standard errors of the means. The results show that although the simultaneous presentation of auditory and visual stimuli resulted in a reduction of auditory sensitivity (when compared to performance on unimodal auditory target trials), no such effect was reported for visual targets. The results also highlight the fact that the presentation of a bimodal audiovisual target resulted in a significant reduction in the criteria (c) for responding, and that this effect was significantly larger for visual targets than for auditory targets. (Redrawn from Koppen, C. et al., Exp. Brain Res., 196, 353–360, 2009. With permission.)
Neumann has argued that conscious perception should not always be conceptualized as a
necessary stage in the chain of human information processing. Rather, he suggests that conscious perception can, on occasion, be bypassed altogether. Support for Neumann’s view that stimuli can elicit responses in the absence of awareness comes from research showing, for example, that participants can execute rapid and accurate discrimination responses to masked target stimuli that they are subjectively unaware of (e.g., Taylor and McCloskey 1996). The phenomenon of blindsight is also pertinent here (e.g., see Cowey and Stoerig 1991). Furthermore, researchers have shown that people sometimes lose their memory for the second of two stimuli as a result of their having executed a response to the first stimulus (Crowder 1968; Müsseler and Hommel 1997a, 1997b; see also Bridgeman 1990; Ricci and Chatterjee 2004; Rizzolatti and Berti 1990). On the basis of such results, then, our suggestion is that a participant’s awareness (of the target stimuli) in the speeded version of the Colavita paradigm may actually be modulated by the responses that they happen to make (select or initiate) on some proportion of the trials, rather than necessarily always being driven by their conscious perception of the stimuli themselves (see also Hefferline and Perera 1963). To summarize, when participants try to respond rapidly in the Colavita visual dominance task, they may sometimes end up initiating their response before becoming aware of the stimulus (or stimuli) that have elicited that response. Their awareness of which stimuli have, in fact, been presented is then constrained by the response(s) that they actually happen to make. In other words, if (as a participant) I realize that I have made (or am about to make) a vision-only response, it would seem unsurprising that I only then become aware of the visual target, even if an auditory target had also been presented at the same time (although it perhaps reached the threshold for initiating a response
more slowly than the visual stimulus; see above). Here, one might even consider the possibility that participants simply stop processing (or stop responding to) the target stimulus (or stimuli) after they have selected/triggered a response (to the visual target; i.e., perhaps target processing reflects a kind of self-terminating processing). Sinnett et al.'s (2008) research is crucial here in showing that, as a result of the asymmetrical cross-modal effects of auditory and visual stimuli on each other, the first response that a participant makes on a bimodal target trial is likely to be to a visual (rather than an auditory) stimulus. If this hypothesis regarding people's failure to respond to some proportion of the auditory (or tactile) stimuli on the bimodal trials in the Colavita paradigm were to be correct, one would expect the fastest visual responses to occur on those bimodal trials in which participants make a visual-only response. Koppen and Spence's (2007a; Experiment 3) results show just such a pattern in their three-response study of the Colavita effect (i.e., where participants made one response to auditory targets, one to visual targets, and a third to the bimodal targets; note, however, that the participants did not have the opportunity to respond to the visual and auditory stimuli sequentially in this study). In Koppen and Spence's study, the visual-only responses on the bimodal target trials were actually significantly faster, on average (mean RT = 563 ms), than the visual-only responses on unimodal visual trials (mean RT = 582 ms; see Figure 27.5). This result therefore demonstrates that even though participants failed to respond to the auditory target, its presence nevertheless still facilitated their behavioral performance. Finally, the vision-only responses (on the bimodal trials) were also found, on average, to be significantly faster than the participants' correct bimodal responses on the bimodal target trials (mean = 641 ms). Interestingly, however, participants' auditory-only responses on the bimodal target trials in Koppen and Spence's (2007a) study were significantly slower, on average, than on the unimodal auditory target trials (mean RTs of 577 and 539 ms, respectively). This is the opposite pattern of results to that seen for the visual target detection data (i.e., a bimodal slowing of responding for auditory targets paired with a bimodal speeding of responding to the visual targets). This result provides additional evidence for the existence of an asymmetrical cross-modal effect on the rate of information accrual. Indeed, taken together, these results mirror those reported by Sinnett et al. (2008) in their speeded target detection task, but note here that the data come from a version of the Colavita task instead. Thus, it really does seem as though the more frequent occurrence of vision-only as compared to auditory-only responses on the bimodal audiovisual target trials in the Colavita visual dominance paradigm is tightly linked to the speed with which a participant initiates his/her response.
FIGURE 27.5 Schematic timeline showing the mean latency of participants’ responses (both correct and incorrect responses) in Koppen et al.’s (2007a) three-button version of the Colavita effect. Significant differences between particular conditions of interest (p < .05) are highlighted with an asterisk. (See text for details.)
When participants respond rapidly, they are much more likely to make an erroneous visual-only response than to make an erroneous auditory-only response.*
* One final point to note here concerns the fact that when participants made an erroneous response on the bimodal target trials, the erroneous auditory-only responses were somewhat slower than the erroneous vision-only responses, although this difference failed to reach statistical significance.
27.6 BIASED (OR INTEGRATED) COMPETITION AND COLAVITA VISUAL DOMINANCE EFFECT How can the asymmetric cross-modal effects of simultaneously presented auditory and visual targets on each other (that were highlighted in the previous section) be explained? We believe that a fruitful approach may well come from considering them in the light of the biased (or integrated) competition hypothesis (see Desimone and Duncan 1995; Duncan 1996). According to Desimone and Duncan, brain systems (both sensory and motor) are fundamentally competitive in nature. What is more, within each system, a gain in the activation of one object/event representation always occurs at a cost to others. That is, the neural representation of different objects/events is normally mutually inhibitory. An important aspect of Desimone and Duncan’s biased competition model relates to the claim that the dominant neural representation suppresses the neural activity associated with the representation of the weaker stimulus (see Duncan 1996). In light of the discussion in the preceding section (see Section 27.5.2), one might think of biased competition as affecting the rate of information accrual, changing the criterion for responding, and/or changing perceptual sensitivity (but see Gorea and Sagi 2000, 2002). An extreme form of this probabilistic winner-takes-all principle might therefore help to explain why it is that the presentation of a visual stimulus can sometimes have such a profound effect on people’s awareness of the stimuli coded by a different brain area (i.e., modality; see also Hahnloser et al. 1999). Modality-based biased competition can perhaps also provide a mechanistic explanation for the findings of a number of other studies of multisensory information processing. For example, over the years, many researchers have argued that people’s attention is preferentially directed toward the visual modality when pairs of auditory and visual stimuli are presented simultaneously (e.g., see Falkenstein et al. 1991; Golob et al. 2001; Hohnsbein and Falkenstein 1991; Hohnsbein et al. 1991; Oray et al. 2002). As Driver and Vuilleumier (2001, p. 75) describe the biased (or integrated) competition hypothesis: “ . . . multiple concurrent stimuli always compete to drive neurons and dominate the networks (and ultimately to dominate awareness and behavior).” They continue: “various phenomena of ‘attention’ are cast as emergent properties of whichever stimuli happen to win the competition.” In other words, particularly salient stimuli will have a competitive advantage and may thus tend to “attract attention” on purely bottom-up grounds. Visual stimuli might then, for whatever reason (see below), constitute a particularly salient class of stimuli. Such stimulus-driven competition between the neural activation elicited by the auditory (or tactile) and visual targets on bimodal target trials might also help to explain why the attentional manipulations that have been utilized previously have proved so ineffective in terms of reversing the Colavita visual dominance effect (see Koppen and Spence 2007d; Sinnett et al. 2007). That is, although the biasing of a participant’s attention toward one sensory modality (in particular, the nonvisual modality) before stimulus onset may be sufficient to override the competitive advantage resulting from any stimulus-driven biased competition (see McDonald et al. 2005; Spence 2010; Vibell et al. 2007), it cannot reverse it.
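As a purely illustrative aside, the flavor of such stimulus-driven competition can be captured with a toy simulation (Python; the gains, inhibition strength, and noise level are all invented rather than taken from any published model). Two units standing in for the visual and auditory event representations mutually inhibit one another, and a modest advantage in input gain is enough for the visual unit to win, and thereby suppress the auditory unit, on the majority of simultaneous-input trials.

# Toy sketch of modality-based biased competition: two mutually inhibitory
# units receive simultaneous input; the unit with the slightly higher gain
# (here, vision) usually wins the competition. All parameters are invented.
import numpy as np

rng = np.random.default_rng(1)

def run_trial(gain_v=1.1, gain_a=1.0, inhibition=1.2, noise=0.3,
              dt=0.01, steps=500):
    v = a = 0.0
    for _ in range(steps):
        dv = -v + gain_v - inhibition * a
        da = -a + gain_a - inhibition * v
        v = max(0.0, v + dt * dv + noise * np.sqrt(dt) * rng.standard_normal())
        a = max(0.0, a + dt * da + noise * np.sqrt(dt) * rng.standard_normal())
    return "vision" if v > a else "audition"

outcomes = [run_trial() for _ in range(500)]
share = 100 * outcomes.count("vision") / len(outcomes)
print(f"Visual representation dominates on {share:.0f}% of simulated trials")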
27.6.1 Putative Neural Underpinnings of Modality-Based Biased Competition Of course, accounting for the Colavita visual dominance effect in terms of biased competition does not itself explain why it is the visual stimulus that wins the competition more frequently than the nonvisual stimulus. Although a satisfactory neurally inspired answer to this question will need
to await future research, it is worth noting here that recent research has highlighted the importance of feedback activity from higher order to early sensory areas in certain aspects of visual awareness (e.g., Lamme 2001; Lamme et al. 2000; Pascual-Leone and Walsh 2001; but see also Macknik 2009; Macknik and Martinez-Conde 2007, in press). It is also pertinent to note that far more of the brain is given over to the processing of visual stimuli than to the processing of stimuli from the other sensory modalities. For example, Sereno et al. (1995) suggest that nearly half of the cortex is involved in the processing of visual information. Meanwhile, Felleman and van Essen (1991) point out that in the macaque there are fewer than half as many brain areas involved in the processing of tactile information as in the processing of visual information. In fact, in their authoritative literature review, they estimate that 55% of neocortex (by volume) is visual, as compared to 12% somatosensory, 8% motor, 3% auditory, and 0.5% gustatory. Given such statistics, it would seem probable that the visual system might have a better chance of setting up such feedback activity following the presentation of a visual stimulus than would the auditory or tactile systems following the simultaneous presentation of either an auditory or tactile stimulus. Note that this account suggests that visual dominance is natural, at least for humans, in that it may have a hardwired physiological basis (this idea was originally captured by Colavita et al.'s (1976) suggestion that visual stimuli might be "prepotent"). It is interesting to note in this context that the amount of cortex given over to auditory and tactile information processing is far more evenly matched than for the competition between audition and vision, hence perhaps explaining the lack of a clear pattern of dominance when stimuli are presented in these two modalities at the same time (see Hecht and Reiner 2009; Occelli et al. 2010). It is also important to note here that progress in terms of explaining the Colavita effect at a neural level might also come from a more fine-grained study of the temporal dynamics of multisensory integration in various brain regions. In humans, the first wave of activity in primary auditory cortex in response to the presentation of suprathreshold stimuli is usually seen at a latency of about 10–15 ms (e.g., Liegeois-Chauvel et al. 1994; Howard et al. 2000; Godey et al. 2001; Brugge et al. 2003). Activity in primary visual cortex starts about 40–50 ms after stimulus presentation (e.g., Foxe et al. 2008; see also Schroeder et al. 1998), whereas for primary somatosensory cortex the figure is about 8–12 ms (e.g., Inui et al. 2004; see also Schroeder et al. 2001). Meanwhile, Schroeder and Foxe (2002, 2004) have documented the asymmetrical time course of the interactions taking place between auditory and visual cortex. Their research has shown that the visual modulation of activity in auditory cortex occurs several tens of milliseconds after the feedforward sweep of activation associated with the processing of auditory stimuli, under conditions where auditory and visual stimuli happen to be presented simultaneously from a location within peripersonal space (i.e., within arm's reach; see Rizzolatti et al. 1997). This delay is caused by the fact that much of the visual input to auditory cortex is routed through superior temporal polysensory areas (e.g., Foxe and Schroeder 2002; see also Ghazanfar et al.
2005; Kayser et al. 2008; Smiley et al. 2007), and possibly also through prefrontal cortex. It therefore seems plausible to suggest that such delayed visual (inhibitory) input to auditory cortex might play some role in disrupting the setting-up of the feedback activity from higher (auditory) areas.*
* Note here also the fact that visual influences on primary and secondary auditory cortex are greatest when the visual stimulus leads the auditory stimulus by 20–80 ms (see Kayser et al. 2008), the same magnitude of visual leads that have also been shown to give rise to the largest Colavita effect (see Figure 2; Koppen and Spence 2007b).
That said, Falchier et al. (2010) recently reported evidence suggesting the existence of a more direct routing of information from visual to auditory cortex (i.e., from V2 to caudal auditory cortex), hence potentially confusing the story somewhat. By contrast, audition's influence on visual information processing occurs more rapidly, and involves direct projections from early auditory cortical areas to early visual areas. That is, direct projections have now been documented from the primary auditory cortex A1 to the primary visual cortex V1 (e.g., see Wang et al. 2008; note, however, that these direct connections tend to target
peripheral, rather than central, locations in the visual field; that said, other projections may well be more foveally targeted). Interestingly, however, until very recently no direct connections had as yet been observed in the opposite direction (see Falchier et al. 2010). These direct projections from auditory to visual cortex may help to account for the increased visual cortical excitability seen when an auditory stimulus is presented together with a visual stimulus (e.g., Martuzzi et al. 2007; Noesselt et al. 2007; Rockland and Ojima 2003; Romei et al. 2007, 2009; see also Besle et al. 2009; Clavagnier et al. 2004; Falchier et al. 2003). Indeed, Bolognini et al. (2010) have recently shown that transcranial magnetic stimulation (TMS)-elicited phosphenes (presented near threshold) are more visible when a white noise burst is presented approximately 40 ms before the TMS pulse (see also Romei et al. 2009). It is also interesting to note here that when auditory and tactile stimuli are presented simultaneously from a distance of less than 1 m (i.e., in peripersonal space), the response in multisensory convergence regions of auditory association cortex is both rapid and approximately simultaneous for these two input modalities (see Schroeder and Foxe 2002, p. 193; see also Foxe et al. 2000, 2002; Murray et al. 2005; Schroeder et al. 2001). Such neurophysiological timing properties may then also help to explain why no clear Colavita dominance effect has as yet been reported between these two modalities (see also Sperdin et al. 2009).* That said, any neurally inspired account of the Colavita effect will likely also have to incorporate the recent discovery of feedforward multisensory interactions to early cortical areas taking place in the thalamus (i.e., via the thalamocortical loop; Cappe et al. 2009). Although any attempt to link human behavior to single-cell neurophysiological data in either awake or anesthetized primates is clearly speculative at this stage, we are nevertheless convinced that this kind of interdisciplinary approach will be needed if we are to develop a fuller understanding of the Colavita effect in the coming years. It may also prove fruitful, when trying to explain why it is that participants fail to make an auditory (or tactile) response once they have made a visual one, to consider the neuroscience research on the duration (and decay) of sensory memory in the different modalities (e.g., Lu et al. 1992; Harris et al. 2002; Uusitalo et al. 1996; Zylberberg et al. 2009). Here, it would be particularly interesting to know whether there are any systematic modality-specific differences in the decay rate of visual, auditory, and tactile sensory memory.
27.6.2 Clinical Extinction and Colavita Visual Dominance Effect It will most likely also be revealing in future research to explore the relationship between the Colavita visual dominance effect and the clinical phenomenon of extinction that is sometimes seen in clinical patients following lateralized (typically right parietal) brain damage (e.g., Baylis et al. 1993; Bender 1952; Brozzoli et al. 2006; Driver and Vuilleumier 2001; Farnè et al. 2007; Rapp and Hendel 2003; Ricci and Chatterjee 2004). The two phenomena share a number of similarities: Both are sensitive to the relative spatial position from which the stimuli are presented (Costantini et al. 2007; Hartcher-O’Brien et al. 2008, 2010; Koppen and Spence 2007c); both are influenced by the relative timing of the two stimuli (Baylis et al. 2002; Costantini et al. 2007; Koppen and Spence 2007b; Lucey and Spence 2009; Rorden et al. 1997); both affect perceptual sensitivity as well as being influenced by response-related factors (Koppen et al. 2009; Ricci and Chatterjee 2004; * It would be interesting here to determine whether the feedforward projections between primary auditory and tactile cortices are any more symmetrical than those between auditory and visual cortices (see Cappe and Barone 2005; Cappe et al. 2009; Hackett et al. 2007; Schroeder et al. 2001; Smiley et al. 2007, on this topic), since this could provide a neural explanation for why no Colavita effect has, as yet, been reported between the auditory and tactile modalities (Hecht and Reiner 2009; Occelli et al. 2010). That said, it should also be borne in mind that the nature of auditory-somatosensory interactions have recently been shown to differ quite dramatically as a function of the body surface stimulated (e.g., different audio–tactile interactions have been observed for stimuli presented close to the hands in frontal space vs. close to the back of the neck in rear space; see Fu et al. 2003; Tajadura-Jiminez et al. 2009; cf. Critchley 1953, p. 19). The same may, of course, also turn out to be true for the auditory–tactile Colavita effect.
Sinnett et al. 2008; see also Gorea and Sagi 2002). The proportion of experimental trials on which each phenomenon occurs in the laboratory has also been shown to vary greatly between studies. In terms of the biased (or integrated) competition hypothesis (Desimone and Duncan 1995; Duncan 1996), extinction (in patients) is thought to reflect biased competition against stimuli from one side (Driver and Vuilleumier 2001; Rapp and Hendel 2003), whereas here we have argued that the Colavita effect reflects biased competition that favors the processing of visual stimuli. Although extinction has typically been characterized as a spatial phenomenon (i.e., it is the contralesional stimulus that normally extinguishes a simultaneously presented ipsilesional stimulus), it is worth noting that nonspatial extinction effects have also been reported (Costantini et al. 2007; Humphreys et al. 1995; see also Battelli et al. 2007). Future neuroimaging research will hopefully help to determine the extent to which the neural substrates underlying the Colavita visual dominance effect in healthy individuals and the phenomenon of extinction in clinical patients are similar (Sarri et al. 2006). Intriguing data here come from a neuroimaging study of a single patient with visual–tactile extinction reported by Sarri et al. In this patient, awareness of touch on the bimodal visuotactile trials was associated with increased activity in right parietal and frontal regions. Sarri et al. argued that the cross-modal extinction of the tactile stimulus in this patient resulted from increased competition arising from the functional coupling of visual and somatosensory cortex with multisensory parietal cortex. The literature on unimodal and cross-modal extinction suggests that the normal process of biased competition can be interrupted by the kinds of parietal damage that lead to neglect and/or extinction. It would therefore be fascinating to see whether one could elicit the same kinds of biases in neural competition (usually seen in extinction patients) in normal participants, simply by administering TMS over posterior parietal areas (see Driver and Vuilleumier 2001; Duncan 1996; Sarri et al. 2006). Furthermore, following on from the single-cell neurophysiological work conducted by Schroeder and his colleagues (e.g., see Schroeder and Foxe 2002, 2004; Schroeder et al. 2004), it might also be interesting to target superior temporal polysensory areas, and/or the prefrontal cortex in order to try and disrupt the modality-based biased competition seen in the Colavita effect (i.e., rather than the spatial or temporal competition that is more typically reported in extinction patients; see Battelli et al. 2007). There are two principal outcomes that could emerge from such a study, and both seem plausible: (1) TMS over one or more such cortical sites might serve to magnify the Colavita visual dominance effect observed in normal participants, based on the consequences of pathological damage to these areas observed in extinction patients; (2) TMS over these cortical sites might also reduce the magnitude of the Colavita effect, by interfering with the normal processes of biased competition, and/or by interfering with the late-arriving cross-modal feedback activity from visual to auditory cortex (see Section 27.6.1). It would, of course, also be very interesting in future research to investigate whether extinction patients exhibit a larger Colavita effect than normal participants in the traditional version of the Colavita task (cf.
Costantini et al. 2007).
27.7 CONCLUSIONS AND QUESTIONS FOR FUTURE RESEARCH Research conducted over the past 35 years or so has shown the Colavita visual dominance effect to be a robust empirical phenomenon. However, traditional explanations of the effect simply cannot account for the range of experimental data that is currently available. In this article, we argue that the Colavita visual dominance effect may be accounted for in terms of Desimone and Duncan’s (1995; see also Duncan 1996) model of biased (or integrated) competition. According to the explanation outlined here, the Colavita visual dominance effect can be understood in terms of the cross-modal competition between the neural representations of simultaneously presented visual and auditory (or tactile) stimuli. Cognitive neuroscience studies would certainly help to further our understanding of the mechanisms underlying the Colavita effect. It would be particularly interesting, for example, to compare the pattern of brain activation on those trials in which participants fail to respond correctly to the nonvisual stimulus to the activation seen on those trials in which they respond appropriately
(cf. Fink et al. 2000; Golob et al. 2001; Sarri et al. 2006; Schubert et al. 2006). Event-related potential studies could also help to determine just how early (or late, see Falkenstein et al. 1991; Quinlan 2000; Zahn et al. 1994) the processing of ignored and reported auditory (or tactile) stimuli differs (see Hohnsbein et al. 1991).
27.7.1 Modeling the Colavita Visual Dominance Effect There is also a considerable amount of interesting work to be done in terms of modeling the Colavita visual dominance effect. Cooper (1998) made a start on this more than a decade ago. He developed a computational model that was capable of simulating the pattern of participants' RTs in the Colavita task. Cooper's model consisted of separate modality-specific input channels feeding into a single "object representation network" (whose function involved activating specific response schemas—presumably equivalent to a target stimulus reaching the criterion for responding, as discussed earlier) in which the speed of each channel was dependent on the strength (i.e., weight) of the channel itself. By assuming that the visual channel was stronger than the auditory channel, the model was able to successfully account for the fact that although responses to auditory stimuli are faster than responses to visual stimuli in unimodal trials, the reverse pattern is typically found on bimodal target trials. The challenge for researchers in this area will be to try and develop models that are also capable of accounting for participants' failure to respond to the nonvisual stimulus (i.e., the effect that has constituted the focus for the research discussed in this article; cf. Peers et al. 2005); such models might presumably include the assignment of different weights to visual and auditory cues, biases to preferentially respond to either visual or auditory stimuli, different gain/loss functions associated with responding, or failing to respond, to auditory and visual target stimuli, etc. It will be especially interesting here to examine whether the recent models of Bayesian multisensory integration (see Ernst 2005) that have proved so successful in accounting for many aspects of cross-modal perception, sensory dominance, and multisensory information processing, can also be used to account for the Colavita visual dominance effect.
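As a pointer to the formal machinery that such Bayesian models bring with them, the sketch below (Python) implements the standard reliability-weighted, maximum-likelihood cue combination rule on which these accounts are typically built (cf. Ernst 2005). The estimates and variances are invented, and whether a weighting scheme of this kind can be extended to capture outright failures to respond to the nonvisual stimulus is exactly the open modeling question raised above.

# Sketch: reliability-weighted (maximum-likelihood) cue combination, the
# building block of standard Bayesian models of multisensory integration.
# The example estimates and variances are hypothetical.
def combine_cues(est_v, var_v, est_a, var_a):
    w_v = (1.0 / var_v) / (1.0 / var_v + 1.0 / var_a)   # visual weight
    combined = w_v * est_v + (1.0 - w_v) * est_a
    combined_var = 1.0 / (1.0 / var_v + 1.0 / var_a)
    return combined, combined_var, w_v

# Vision is the more reliable cue here, so it dominates the combined estimate.
combined, combined_var, w_v = combine_cues(est_v=10.0, var_v=1.0,
                                           est_a=14.0, var_a=4.0)
print(f"visual weight = {w_v:.2f}, combined estimate = {combined:.2f}, "
      f"combined variance = {combined_var:.2f}")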
27.7.2 Multisensory Facilitation versus Interference
Finally, in closing, it is perhaps worth pausing to consider the Colavita effect in the context of so many other recent studies that have demonstrated the benefits of multisensory over unisensory stimulus presentation (e.g., in terms of speeding simple detection responses; Nickerson 1973; Sinnett et al. 2008, Experiment 1; see also Calvert et al. 2004). To some, the existence of the Colavita effect constitutes a puzzling example of a situation in which multisensory stimulation appears to impair (rather than to facilitate) human performance. It is interesting to note here, though, that whether one observes benefits or costs after multisensory (as compared to unisensory) stimulation seems to depend largely on the specific requirements of the task faced by participants. For example, Sinnett et al. (2008; Experiment 2) reported the facilitation of simple speeded detection latencies on bimodal audiovisual trials (i.e., they observed a violation of the race model; Miller 1982, 1991) when their participants had to make the same simple speeded detection responses to auditory, visual, and audiovisual targets. By contrast, they observed an inhibitory effect when their participants had to respond to the targets in each modality by pressing a separate response key (i.e., the typical Colavita paradigm). However, this latter result is not really so surprising if one stops to consider the fact that in the Colavita task participants can really be thought of as performing two tasks at once: that is, in the traditional two-response version of the Colavita task, the participants perform both a speeded auditory target detection task and a speeded visual target detection task. Although on the majority of (unimodal) trials the participants only have to perform one task, on a minority of (bimodal) trials they have to perform both tasks at the same time (and it is on these
trials that the Colavita effect occurs when the nonvisual stimulus is seemingly ignored).* By contrast, in the redundant target effect paradigm (see earlier), both stimuli are relevant to the same task (i.e., to making a simple speeded target detection response). Researchers have known for more than half a century that people find it difficult to perform two tasks at the same time, regardless of whether the target stimuli relevant to performing those tasks are presented in the same or in different sensory modalities (e.g., Pashler 1994; Spence 2008). One can therefore think of the Colavita paradigm in terms of a form of dual-task interference (resulting from modality-based biased competition at the response-selection level)—interference that appears to be intimately linked to the making of speeded responses to the target stimuli (however, see Koppen et al. 2009). More generally, it is important to stress that although multisensory integration may, under the appropriate conditions, give rise to improved perception/performance, the benefits may necessarily come at the cost of some loss of access to the component unimodal signals (cf. Soto-Faraco and Alsius 2007, 2009). In closing, it is perhaps worth highlighting the fact that the task-dependent nature of the consequences of multisensory integration that shows up in studies related to the Colavita effect has now also been demonstrated in a number of different behavioral paradigms, in both humans (see Cappe et al. in press; Gondan and Fischer 2009; Sinnett et al. 2008; Spence et al. 2003) and monkeys (see Besle et al. 2009; Wang et al. 2008).
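For reference, the race-model test referred to above (Miller 1982, 1991) is conventionally expressed as an inequality over the cumulative reaction time distributions; facilitation that exceeds this bound at any latency t is taken as evidence of coactivation rather than of a race between independent unisensory channels:

\[
P(\mathrm{RT} \le t \mid \mathrm{AV}) \;\le\; P(\mathrm{RT} \le t \mid \mathrm{A}) \;+\; P(\mathrm{RT} \le t \mid \mathrm{V}) \qquad \text{for all } t.
\]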
REFERENCES Battelli, L., A. Pascual-Leone, and P. Cavanagh. 2007. The ‘when’ pathway of the right parietal lobe. Trends in Cognitive Sciences 11: 204–210. Baylis, G. C., J. Driver, and R. D. Rafal. 1993. Visual extinction and stimulus repetition. Journal of Cognitive Neuroscience 5: 453–466. Baylis, G. C., S. L. Simon, L. L. Baylis, and C. Rorden. 2002. Visual extinction with double simultaneous stimulation: What is simultaneous? Neuropsychologia 40: 1027–1034. Bender, M. B. 1952. Disorders in perception. Springfield, IL: Charles Thomas. Besle, J., O. Bertrand, and M. H. Giard. 2009. Electrophysiological (EEG, sEEG, MEG) evidence for multiple audiovisual interactions in the human auditory cortex. Hearing Research 258(1–2): 143–151. Bolognini, N., I. Senna, A. Maravita, A. Pasqual-Leone, and L. B. Merabeth. 2010. Auditory enhancement of visual phosphene perception: The effect of temporal and spatial factors and of stimulus intensity. Neuroscience Letters 477: 109–114. Bonneh, Y. S., M. K. Belmonte, F. Pei, P. E. Iversen, T. Kenet, N. Akshoomoff, Y. Adini, H. J. Simon, C. I. Moore, J. F. Houde, and M. M. Merzenich. 2008. Cross-modal extinction in a boy with severely autistic behavior and high verbal intelligence. Cognitive Neuropsychology 25: 635–652. Bridgeman, B. 1990. The physiological basis of the act of perceiving. In Relationships between perception and action: Current approaches, ed. O. Neumann and W. Prinz, 21–42. Berlin: Springer. Brozzoli, C., M. L. Demattè, F. Pavani, F. Frassinetti, and A. Farnè. 2006. Neglect and extinction: Within and between sensory modalities. Restorative Neurology and Neuroscience 24: 217–232. Brugge, J. F., I. O. Volkov, P. C. Garell, R. A. Reale, and M. A. Howard 3rd. 2003. Functional connections between auditory cortex on Heschl’s gyrus and on the lateral superior temporal gyrus in humans. Journal of Neurophysiology 90: 3750–3763. Calvert, G. A., C. Spence, and B. E. Stein (eds.). 2004. The handbook of multisensory processes. Cambridge, MA: MIT Press. Cappe, C., and P. Barone, P. 2005. Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. European Journal of Neuroscience 22: 2886–2902.
* One slight complication here, though, relates to the fact that people typically start to couple multiple responses to different stimuli into response couplets under the appropriate experimental conditions (see Ulrich and Miller 2008). Thus, one could argue about whether participants’ responses on the bimodal target trials actually count as a third single (rather than dual) task, but one that, in the two-response version of the Colavita task, involves a two-finger, rather than a one-finger, response. When considered in this light, the interference with performance seen in the Colavita task does not seem quite so surprising.
Cappe, C., A. Morel, P. Barone, and E. M. Rouiller. 2009. The thalamocortical projection systems in primates: An anatomical support for multisensory and sensorimotor interplay. Cerebral Cortex 19: 2025–2037. Clavagnier, S., A. Falchier, and H. Kennedy. 2004. Long-distance feedback projections to area V1: Implications for multisensory integration, spatial awareness, and visual consciousness. Cognitive, Affective, and Behavioural Neuroscience 4: 117–126. Colavita, F. B. 1974. Human sensory dominance. Perception and Psychophysics 16: 409–412. Colavita, F. B. 1982. Visual dominance and attention in space. Bulletin of the Psychonomic Society 19: 261–262. Colavita, F. B., R. Tomko, and D. Weisberg. 1976. Visual prepotency and eye orientation. Bulletin of the Psychonomic Society 8: 25–26. Colavita, F. B., and D. Weisberg. 1979. A further investigation of visual dominance. Perception and Psychophysics 25: 345–347. Cooper, R. 1998. Visual dominance and the control of action. In Proceedings of the 20th Annual Conference of the Cognitive Science Society, ed. M. A. Gernsbacher and S. J. Derry, 250–255. Mahwah, NJ: Lawrence Erlbaum Associates. Costantini, M., D. Bueti, M. Pazzaglia, and S. M. Aglioti. 2007. Temporal dynamics of visuo-tactile extinction within and between hemispaces. Neuropsychology 21: 242–250. Cowey, A., and P. Stoerig. 1991. The neurobiology of blindsight. Trends in the Neurosciences 14: 140–145. Critchley, M. 1953. Tactile thought, with special reference to the blind. Brain 76: 19–35. Crowder, R. G. 1968. Repetition effects in immediate memory when there are no repeated elements in the stimuli. Journal of Experimental Psychology 78: 605–609. De Reuck, T., and C. Spence. 2009. Attention and visual dominance. Unpublished manuscript. Desimone, R., and J. Duncan. 1995. Neural mechanisms of selective visual attention. Annual Review of Neuroscience 18: 193–222. Driver, J., and P. Vuilleumier. 2001. Perceptual awareness and its loss in unilateral neglect and extinction. Cognition 79: 39–88. Duncan, J. 1996. Cooperating brain systems in selective perception and action. In Attention and performance XVI: Information integration in perception and communication, ed. T. Inui and J. L. McClelland, 549– 578. Cambridge, MA: MIT Press. Egeth, H. E., and L. C. Sager. 1977. On the locus of visual dominance. Perception and Psychophysics 22: 77–86. Elcock, S., and C. Spence. 2009. Caffeine and the Colavita visual dominance effect. Unpublished manuscript. Ernst, M. 2005. A Bayesian view on multimodal cue integration. In Perception of the human body from the inside out, ed. G. Knoblich, I. Thornton, M. Grosejan, and M. Shiffrar, 105–131. New York: Oxford Univ. Press. Exner, S. 1875. Experimentelle Untersuchung der einfachsten psychischen Processe (Experimental study of the most simple psychological processes). Archiv für die gesammte Physiologie des menschens und der Thiere (Pflüger’s Archive) 11: 403–432. Falchier, A., S. Clavagnier, P. Barone, and H. Kennedy. 2003. Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience 22: 5749–5759. Falchier, A., C. E. Schroeder, T. A. Hackett, P. Lakatos, S. Nascimento-Silva, I. Ulbert, G. Karmos, and J. F. Smiley. 2010. Projection from visual areas V2 and prostriata to caudal auditory cortex in the monkey. Cerebral Cortex 20: 1529–1538. Falkenstein, M., J. Hohnsbein, J. Hoormann, and L. Blanke. 1991. Effects of crossmodal divided attention on late ERP components: II. Error processing in choice reaction tasks. 
Electroencephalography and Clinical Neurophysiology 78: 447–455. Farnè, A., C. Brozzoli, E. Làdavas, and T. Ro. 2007. Investigating multisensory spatial cognition through the phenomenon of extinction. In Attention and performance XXII: Sensorimotor foundations of higher cognition, ed. P. Haggard, Y. Rossetti, and M. Kawato, 183–206. Oxford: Oxford Univ. Press. Felleman, D. J., and D. C. Van Essen. 1991. Distributed hierarchical processing in primate cerebral cortex. Cerebral Cortex 1: 1–47. Fink, G. R., J. Driver, C. Rorden, T. Baldeweg, and R. J. Dolan. 2000. Neural consequences of competing stimuli in both visual hemifields: A physiological basis for visual extinction. Annals of Neurology 47: 440–446. Foree, D. D., and V. M. J. LoLordo. 1973. Attention in the pigeon: Differential effects of food-getting versus shock-avoidance procedures. Journal of Comparative and Physiological Psychology 85: 551–558. Foxe, J. J., I. A. Morocz, M. M. Murray, B. A. Higgins, D. C. Javitt, and C. E. Schroeder. 2000. Multisensory auditory–somatosensory interactions in early cortical processing revealed by high-density electrical mapping. Cognitive Brain Research 10: 77–83.
Foxe, J. J., E. C. Strugstad, P. Sehatpour, S. Molholm, W. Pasieka, C. E., Schroeder, and M. E. McCourt. 2008. Parvocellular and magnocellular contributions to the initial generators of the visual evoked potential: High-density electrical mapping of the “C1” component. Brain Topography 21: 11–21. Foxe, J. J., G. R. Wylie, A. Martinez, C. E. Schroeder, D. C. Javitt, D. Guilfoyle, W. Ritter, and M. M. Murray. 2002. Auditory–somatosensory multisensory processing in auditory association cortex: An fMRI study. Journal of Neurophysiology 88: 540–543. Fu, K.-M. G., T. A. Johnston, A. S. Shah, L. Arnold, J. Smiley, T. A. Hackett, P. E. Garraghty, and C. E. Schroeder. 2003. Auditory cortical neurons respond to somatosensory stimulation. Journal of Neuroscience 23: 7510–7515. Gallace, A., H. Z. Tan, and C. Spence. 2007. Multisensory numerosity judgments for visual and tactile stimuli. Perception and Psychophysics 69: 487–501. Ghazanfar, A. A., J. X. Maier, K. L. Hoffman, and N. K. Logothetis. 2005. Multisensory integration of dynamic faces and voices in Rhesus monkey auditory cortex. Journal of Neuroscience 25: 5004–5012. Godey, B., D. Schwartz, J. B. de Graaf, P. Chauvel, and C. Liegeois-Chauvel. 2001. Neuromagnetic source localization of auditory evoked fields and intracerebral evoked potentials: A comparison of data in the same patients. Clinical Neurophysiology 112: 1850–1859. Golob, E. J., G. G. Miranda, J. K. Johnson, and A. Starr. 2001. Sensory cortical interactions in aging, mild cognitive impairment, and Alzheimer’s disease. Neurobiology of Aging 22: 755–763. Gondan, M., and V. Fischer. 2009. Serial, parallel, and coactive processing of double stimuli presented with onset asynchrony. Perception 38(Suppl.): 16. Gorea, A., and D. Sagi. 2000. Failure to handle more than one internal representation in visual detection tasks. Proceedings of the National Academy of Sciences of the United States of America 97: 12380–12384. Gorea, A., and D. Sagi, D. 2002. Natural extinction: A criterion shift phenomenon. Visual Cognition 9: 913–936. Green, D. M., and J. A. Swets. 1966. Signal detection theory and psychophysics. New York: Wiley. Gregg, L. W., and W. J. Brogden. 1952. The effect of simultaneous visual stimulation on absolute auditory sensitivity. Journal of Experimental Psychology 43: 179–186. Hackett, T. A., L. De La Mothe, I. Ulbert, G. Karmos, J. Smiley, and C. E. Schroeder. 2007. Multisensory convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane. Journal of Comparative Neurology 502: 924–952. Hahnloser, R., R. J. Douglas, M. Mahowald, and K. Hepp. 1999. Feedback interactions between neuronal pointers and maps for attentional processing. Nature Neuroscience 2: 746–752. Harris, J. A., C. Miniussi, I. M. Harris, and M. E. Diamond. 2002. Transient storage of a tactile memory trace in primary somatosensory cortex. Journal of Neuroscience 22: 8720–8725. Hartcher-O’Brien, J., A. Gallace, B. Krings, C. Koppen, and C. Spence. 2008. When vision ‘extinguishes’ touch in neurologically-normal people: Extending the Colavita visual dominance effect. Experimental Brain Research 186: 643–658. Hartcher-O’Brien, J., C. Levitan, and C. Spence. 2010. Out-of-touch: Does vision dominate over touch when it occurs off the body? Brain Research 1362: 48–55. Hecht, D., and M. Reiner. 2009. Sensory dominance in combinations of audio, visual and haptic stimuli. Experimental Brain Research 193: 307–314. Hefferline, R. F., and T. B. Perera. 1963. 
Proprioceptive discrimination of a covert operant without its observation by the subject. Science 139: 834–835. Hirsh, I. J., and C. E. Sherrick Jr. 1961. Perceived order in different sense modalities. Journal of Experimental Psychology 62: 423–432. Hohnsbein, J., and M. Falkenstein. 1991. Visual dominance: Asymmetries in the involuntary processing of visual and auditory distractors. In Channels in the visual nervous system: Neurophysiology, psychophysics and models, ed. B. Blum, 301–313. London: Freund Publishing House. Hohnsbein, J., M. Falkenstein, and J. Hoormann. 1991. Visual dominance is reflected in reaction times and event-related potentials (ERPs). In Channels in the visual nervous system: Neurophysiology, psychophysics and models, ed. B. Blum, 315–333. London: Freund Publishing House. Howard, M. A., I. O. Volkov, R. Mirsky, P. C. Garell, M. D. Noh, M. Granner, H. Damasio, M. Steinschneider, R. A. Reale, J. E. Hind, and J. F. Brugge. 2000. Auditory cortex on the human posterior superior temporal gyrus. Journal of Comparative Neurology 416: 79–92. Humphreys, G. W., C. Romani, A. Olson, M. J. Riddoch, and J. Duncan. 1995. Nonspatial extinction following lesions of the parietal lobe in man. Nature 372: 357–359. Inui, K., X. Wang, Y. Tamura, Y. Kaneoke, and R. Kakigi. 2004. Serial processing in the human somatosensory system. Cerebral Cortex 14: 851–857.
Jaśkowski, P. 1996. Simple reaction time and perception of temporal order: Dissociations and hypotheses. Perceptual and Motor Skills 82: 707–730. Jaśkowski, P. 1999. Reaction time and temporal-order judgment as measures of perceptual latency: The problem of dissociations. In Cognitive contributions to the perception of spatial and temporal events, ed. G. Aschersleben, T. Bachmann, and J. Műsseler, 265–282. North-Holland: Elsevier Science. Jaśkowski, P., F. Jaroszyk, and D. Hojan-Jesierska. 1990. Temporal-order judgments and reaction time for stimuli of different modalities. Psychological Research 52: 35–38. Johnson, T. L., and K. L. Shapiro. 1989. Attention to auditory and peripheral visual stimuli: Effects of arousal and predictability. Acta Psychologica 72: 233–245. Kayser, C., C. I. Petkov, and N. K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18: 1560–1574. Koppen, C., A. Alsius, and C. Spence. 2008. Semantic congruency and the Colavita visual dominance effect. Experimental Brain Research 184: 533–546. Koppen, C., C. Levitan, and C. Spence. 2009. A signal detection study of the Colavita effect. Experimental Brain Research 196: 353–360. Koppen, C., and C. Spence. 2007a. Seeing the light: Exploring the Colavita visual dominance effect. Experimental Brain Research 180: 737–754. Koppen, C., and C. Spence. 2007b. Audiovisual asynchrony modulates the Colavita visual dominance effect. Brain Research 1186: 224–232. Koppen, C., and C. Spence. 2007c. Spatial coincidence modulates the Colavita visual dominance effect. Neuroscience Letters 417: 107–111. Koppen, C., and C. Spence. 2007d. Assessing the role of stimulus probability on the Colavita visual dominance effect. Neuroscience Letters 418: 266–271. Kristofferson, A. B. 1965. Attention in time discrimination and reaction time. NASA Contractors Report 194. Washington, D.C.: Office of Technical Services, U.S. Department of Commerce. Lamme, V. A. F. 2001. Blindsight: The role of feedforward and feedback corticocortical connections. Acta Psychologica 107: 209–228. Lamme, V. A. F., H. Supèr, R. Landman, P. R. Roelfsema, and H. Spekreijse. 2000. The role of primary visual cortex (V1) in visual awareness. Vision Research 40: 1507–1521. Liegeois-Chauvel, C., A. Musolino, J. M. Badier, P. Marquis, and P. Chauvel. 1994. Evoked potentials recorded from the auditory cortex in man: Evaluation and topography of the middle latency components. Electroencephalography and Clinical Neuroscience 92: 204–214. LoLordo, V. M. 1979. Selective associations. In Mechanisms of learning and motivation: A memorial to Jerzy Konorsky, ed. A. Dickinson and R. A. Boakes, 367–399. Hillsdale, NJ: Erlbaum. LoLordo, V. M., and D. R. Furrow. 1976. Control by the auditory or the visual element of a compound discriminative stimulus: Effects of feedback. Journal of the Experimental Analysis of Behavior 25: 251–256. Lu, Z.-L., S. J. Williamson, and L. Kaufman. 1992. Behavioral lifetime of human auditory sensory memory predicted by physiological measures. Science 258: 1669–1670. Lucey, T., and C. Spence. 2009. Visual dominance. Unpublished manuscript. Macaluso, E., and J. Driver. 2005. Multisensory spatial interactions: A window onto functional integration in the human brain. Trends in Neurosciences 28: 264–271. Macknik, S. L. 2009. The role of feedback in visual attention and awareness. Perception 38(Suppl.): 162. Macknik, S., and S. Martinez-Conde. 2007. The role of feedback in visual masking and visual processing. 
Advances in Cognitive Psychology 3: 125–152. Macknik, S., and S. Martinez-Conde. In press. The role of feedback in visual attention and awareness. In The new cognitive neurosciences, ed. M. S. A. Gazzaniga, 1163–1177. Cambridge, MA: MIT Press. Manly, T., I. H. Robertson, M. Galloway, and K. Hawkins. 1999. The absent mind: Further investigations of sustained attention to response. Neuropsychologia 37: 661–670. Marks, L. E., E. Ben-Artzi, and S. Lakatos. 2003. Cross-modal interactions in auditory and visual discrimination. International Journal of Psychophysiology 50: 125–145. Martuzzi, R., M. M. Murray, C. M. Michel, J. P. Thiran, P. P. Maeder, S. Clarke, and R. A. Meuli. 2007. Multisensory interactions within human primary cortices revealed by BOLD dynamics. Cerebral Cortex 17: 1672–1679. McDonald, J. J., W. A. Teder-Sälejärvi, F. Di Russo, and S. A. Hillyard. 2005. Neural basis of auditory-induced shifts in visual time-order perception. Nature Neuroscience 8: 1197–1202. Meltzer, D., and M. A. Masaki. 1973. Measures of stimulus control and stimulus dominance. Bulletin of the Psychonomic Society 1: 28–30.
Miller, J. O. 1982. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology 14: 247–279. Miller, J. O. 1986. Time course of coactivation in bimodal divided attention. Perception and Psychophysics 40: 331–343. Miller, J. O. 1991. Channel interaction and the redundant targets effect in bimodal divided attention. Journal of Experimental Psychology: Human Perception and Performance 17: 160–169. Miller, J., R. Ulrich, and B. Rolke. 2009. On the optimality of serial and parallel processing in the psychological refractory period paradigm: Effects of the distribution of stimulus onset asynchronies. Cognitive Psychology 58: 273–310. Miyake, S., S. Taniguchi, and K. Tsuji. 1986. Effects of light stimulus upon simple reaction time and EP latency to the click presented with different SOA. Japanese Psychological Research 28: 1–10. Murray, M. M., S. Molholm, C. M. Michel, D. J. Heslenfeld, W. Ritter, D. C. Javitt, C. E. Schroeder, C. E., and J. J. Foxe. 2005. Grabbing your ear: Auditory–somatosensory multisensory interactions in early sensory cortices are not constrained by stimulus alignment. Cerebral Cortex 15: 963–974. Müsseler, J., and B. Hommel. 1997a. Blindness to response-compatible stimuli. Journal of Experimental Psychology: Human Perception and Performance 23: 861–872. Müsseler, J., and B. Hommel. 1997b. Detecting and identifying response-compatible stimuli. Psychonomic Bulletin and Review 4: 125–129. Neumann, O. 1990. Direct parameter specification and the concept of perception. Psychological Research 52: 207–215. Nickerson, R. 1973. Intersensory facilitation of reaction time: Energy summation or preparation enhancement? Psychological Review 80: 489–509. Noesselt, T., J. W. Rieger, M. A. Schoenfeld, M. Kanowski, H. Hinrichs, H.-J. Heinze, and J. Driver. 2007. Audiovisual temporal correspondence modulates human multisensory temporal sulcus plus primary sensory cortices. Journal of Neuroscience 27: 11431–11441. Occelli, V., J. Hartcher O’Brien, C. Spence, and M. Zampini. 2010. Assessing the audiotactile Colavita effect in near and rear space. Experimental Brain Research 203: 517–532. O'Connor, N., and B. Hermelin. 1963. Sensory dominance in autistic children and subnormal controls. Perceptual and Motor Skills 16: 920. Odgaard, E. C., Y. Arieh, and L. E. Marks. 2003. Cross-modal enhancement of perceived brightness: Sensory interaction versus response bias. Perception and Psychophysics 65: 123–132. Odgaard, E. C., Y. Arieh, and L. E. Marks. 2004. Brighter noise: Sensory enhancement of perceived loudness by concurrent visual stimulation. Cognitive, Affective, and Behavioral Neuroscience 4: 127–132. Oray, S., Z. L. Lu, and M. E. Dawson. 2002. Modification of sudden onset auditory ERP by involuntary attention to visual stimuli. International Journal of Psychophysiology 43: 213–224. Osborn, W. C., R. W. Sheldon, and R. A. Baker. 1963. Vigilance performance under conditions of redundant and nonredundant signal presentation. Journal of Applied Psychology 47: 130–134. Partan, S., and P. Marler. 1999. Communication goes multimodal. Science 283: 1272–1273. Pascual-Leone, A., and V. Walsh. 2001. Fast backprojections from the motion to the primary visual area necessary for visual awareness. Science 292: 510–512. Pashler, H. 1994. Dual-task interference in simple tasks: Data and theory. Psychological Bulletin 116: 220–244. Peers, P. V., C. J. H. Ludwig, C. Rorden, R. Cusack, C. Bonfiglioli, C. Bundesen, J. Driver, N. Antoun, and J. Duncan. 2005. 
Attentional functions of parietal and frontal cortex. Cerebral Cortex 15: 1469–1484. Posner, M. I., M. J. Nissen, and R. M. Klein. 1976. Visual dominance: An information-processing account of its origins and significance. Psychological Review 83: 157–171. Quinlan, P. 2000. The “late” locus of visual dominance. Abstracts of the Psychonomic Society 5: 64. Randich, A., R. M. Klein, and V. M. LoLordo. 1978. Visual dominance in the pigeon. Journal of the Experimental Analysis of Behavior 30: 129–137. Rapp, B., and S. K. Hendel. 2003. Principles of cross-modal competition: Evidence from deficits of attention. Psychonomic Bulletin and Review 10: 210–219. Ricci, R., and A. Chatterjee. 2004. Sensory and response contributions to visual awareness in extinction. Experimental Brain Research 157: 85–93. Rizzolatti, G., and A. Berti. 1990. Neglect as a neural representation deficit. Revue Neurologique (Paris) 146: 626–634. Rizzolatti, G., L. Fadiga, L. Fogassi, and V. Gallese. 1997. The space around us. Science 277: 190–191. Rockland, K. S., and H. Ojima. 2003. Multisensory convergence in calcarine visual areas in macaque monkey. International Journal of Psychophysiology 50: 19–26.
Rodway, P. 2005. The modality shift effect and the effectiveness of warning signals in different modalities. Acta Psychologica 120: 199–226. Romei, V., M. M. Murray, C. Cappe, and G. Thut. 2009. Preperceptual and stimulus-selective enhancement of low-level human visual cortex excitability by sounds. Current Biology 19: 1799–1805. Romei, V., M. M. Murray, L. B. Merabet, and G. Thut. 2007. Occipital transcranial magnetic stimulation has opposing effects on visual and auditory stimulus detection: Implications for multisensory interactions. Journal of Neuroscience 27: 11465–11472. Rorden, C., J. B. Mattingley, H.-O. Karnath, and J. Driver. 1997. Visual extinction and prior entry: Impaired perception of temporal order with intact motion perception after unilateral parietal damage. Neuropsychologia 35: 421–433. Rutschmann, J., and R. Link. 1964. Perception of temporal order of stimuli differing in sense mode and simple reaction time. Perceptual and Motor Skills 18: 345–352. Sarri, M., F. Blankenburg, and J. Driver. 2006. Neural correlates of crossmodal visual–tactile extinction and of tactile awareness revealed by fMRI in a right-hemisphere stroke patient. Neuropsychologia 44: 2398–2410. Schroeder, C. E., and J. J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Brain Research: Cognitive Brain Research 14: 187–198. Schroeder, C. E., and J. J. Foxe. 2004. Multisensory convergence in early cortical processing. In The handbook of multisensory processes, ed. G. A. Calvert, C. Spence, and B. E. Stein, 295–309. Cambridge, MA: MIT Press. Schroeder, C. E., R. W. Lindsley, C. Specht, A. Marcovici, J. F. Smiley, and D. C. Javitt. 2001. Somatosensory input to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85: 1322–1327. Schroeder, C. E., A. D. Mehta, and S. J. Givre. 1998. A spatiotemporal profile of visual system activation revealed by current source density analysis in the awake macaque. Cerebral Cortex 8: 575–592. Schroeder, C. E., S. Molholm, P. Lakatos, W. Ritter, and J. J. Foxe. 2004. Human simian correspondence in the early cortical processing of multisensory cues. Cognitive Processing 5: 140–151. Schubert, R., F. Blankenberg, S. Lemm, A. Villringer, and G. Curio. 2006. Now you feel it, now you don’t: ERP correlates of somatosensory awareness. Psychophysiology 43: 31–40. Sereno, M. I., A. M. Dale, J. B. Reppas, K. K. Kwong, J. W. Belliveau, T. J. Brady, B. R. Rosen, and R. B. H. Tootell. 1995. Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science 268: 889–893. Shapiro, K. L., B. Egerman, and R. M. Klein. 1984. Effects of arousal on human visual dominance. Perception and Psychophysics 35: 547–552. Shapiro, K. L., W. J. Jacobs, and V. M. LoLordo. 1980. Stimulus–reinforcer interactions in Pavlovian conditioning of pigeons: Implications for selective associations. Animal Learning and Behavior 8: 586–594. Shapiro, K. L., and T. L. Johnson. 1987. Effects of arousal on attention to central and peripheral visual stimuli. Acta Psychologica 66: 157–172. Sinnett, S., S. Soto-Faraco, and C. Spence. 2008. The co-occurrence of multisensory competition and facilitation. Acta Psychologica 128: 153–161. Sinnett, S., C. Spence, and S. Soto-Faraco. 2007. Visual dominance and attention: The Colavita effect revisited. Perception and Psychophysics 69: 673–686. Smiley, J., T. A. Hackett, I. Ulbert, G. Karmos, P. Lakatos, D. C.
Javitt, and C. E. Schroeder. 2007. Multisensory convergence in auditory cortex: I. Cortical connections of the caudal superior temporal plane in Macaque monkey. Journal of Comparative Neurology 502: 894–923. Smith, A. 2002. Effects of caffeine on human behavior. Food Chemistry and Toxicology 40: 1243–1255. Smith, A. P., A. M. Kendrick, and A. L. Maben. 1992. Effects of breakfast and caffeine on performance and mood in the late morning and after lunch. Neuropsychobiology 26: 198–204. Smith, W. F. 1933. The relative quickness of visual and auditory perception. Journal of Experimental Psychology 16: 239–257. Soto-Faraco, S., and A. Alsius. 2007. Conscious access to the unisensory components of a cross-modal illusion. Neuroreport 18: 347–350. Soto-Faraco, S., and A. Alsius. 2009. Deconstructing the McGurk–MacDonald illusion. Journal of Experimental Psychology: Human Perception and Performance 35: 580–587. Spence, C. 2008. Cognitive neuroscience: Searching for the bottleneck in the brain. Current Biology 18: R965–R968. Spence, C. 2010. Prior entry: Attention and temporal perception. In Attention and time, ed. A. C. Nobre and J. T. Coull, 89–104. Oxford: Oxford Univ. Press. Spence, C., R. Baddeley, M. Zampini, R. James, and D. I. Shore. 2003. Crossmodal temporal order judgments: When two locations are better than one. Perception and Psychophysics 65: 318–328.
Spence, C., M. E. R. Nicholls, and J. Driver. 2001a. The cost of expecting events in the wrong sensory modality. Perception and Psychophysics 63: 330–336. Spence, C., D. I. Shore, and R. M. Klein. 2001b. Multisensory prior entry. Journal of Experimental Psychology: General 130: 799–832. Spence, C., and S. Soto-Faraco. 2009. Auditory perception: Interactions with vision. In Auditory perception, ed. C. Plack, 271–296. Oxford: Oxford Univ. Press. Sperdin, H. F., C. Cappe, J. J. Foxe, and M. M. Murray. 2009. Early, low-level auditory–somatosensory multisensory interactions impact reaction time speed. Frontiers in Integrative Neuroscience 3(2): 1–10. Stein, B. E., N. London, L. K. Wilkinson, and D. P. Price. 1996. Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience 8: 497–506. Stone, J. V., N. M. Hunkin, J. Porrill, R. Wood, V. Keeler, M. Beanland, M. Port, and N. R. Porter. 2001. When is now? Perception of simultaneity. Proceedings of the Royal Society (B) 268: 31–38. Tajadura-Jiménez, A., N. Kitagawa, A. Väljamäe, M. Zampini, M. M. Murray, and C. Spence. 2009. Auditory– somatosensory multisensory interactions are spatially modulated by stimulated body surface and acoustic spectra. Neuropsychologia 47: 195–203. Taylor, J. L., and D. I. McCloskey. 1996. Selection of motor responses on the basis of unperceived stimuli. Experimental Brain Research 110: 62–66. Thompson, R. F., J. F. Voss, and W. J. Brogden. 1958. Effect of brightness of simultaneous visual stimulation on absolute auditory sensitivity. Journal of Experimental Psychology 55: 45–50. Titchener, E. B. 1908. Lectures on the elementary psychology of feeling and attention. New York: Macmillan. Turatto, M., F. Benso, G. Galfano, L. Gamberini, and C. Umilta. 2002. Non-spatial attentional shifts between audition and vision. Journal of Experimental Psychology: Human Perception and Performance 28: 628–639. Uetake, K., and Y. Kudo. 1994. Visual dominance over hearing in feed acquisition procedure of cattle. Applied Animal Behaviour Science 42: 1–9. Ulrich, R., and J. Miller. 2008. Response grouping in the psychological refractory period (PRP) paradigm: Models and contamination effects. Cognitive Psychology 57: 75–121. Uusitalo, M. A., S. J. Williamson, and M. T. Seppä. 1996. Dynamical organisation of the human visual system revealed by lifetimes of activation traces. Neuroscience Letters 213: 149–152. Van Damme, S., G. Crombez, and C. Spence. 2009a. Is the visual dominance effect modulated by the threat value of visual and auditory stimuli? Experimental Brain Research 193: 197–204. Van Damme, S., A. Gallace, C. Spence, and G. L. Moseley. 2009b. Does the sight of physical threat induce a tactile processing bias? Modality-specific attentional facilitation induced by viewing threatening pictures. Brain Research 1253: 100–106. Vibell, J., C. Klinge, M. Zampini, C. Spence, and A. C. Nobre. 2007. Temporal order is coded temporally in the brain: Early ERP latency shifts underlying prior entry in a crossmodal temporal order judgment task. Journal of Cognitive Neuroscience 19: 109–120. Wang, Y., S. Celebrini, Y. Trotter, and P. Barone. 2008. Visuo-auditory interactions in the primary visual cortex of the behaving monkey: Electrophysiological evidence. BMC Neuroscience 9: 79. Wilcoxin, H. C., W. B. Dragoin, and P. A. Kral. 1971. Illness-induced aversions in rat and quail: Relative salience of visual and gustatory cues. Science 171: 826–828. Zahn, T. P., D. Pickar, and R. J. Haier. 
1994. Effects of clozapine, fluphenazine, and placebo on reaction time measures of attention and sensory dominance in schizophrenia. Schizophrenia Research 13: 133–144. Zwyghuizen-Doorenbos, A., T. A. Roehrs, L. Lipschutz, V. Timms, and T. Roth. 1990. Effects of caffeine on alertness. Psychopharmacology 100: 36–39. Zylberberg, A., S. Dehaene, G. B. Mindlin, and M. Sigman. 2009. Neurophysiological bases of exponential sensory decay and top-down memory retrieval: A model. Frontiers in Computational Neuroscience 3(4): 1–16.
28 The Body in a Multisensory World
Tobias Heed and Brigitte Röder
CONTENTS
28.1 Introduction 557
28.2 Construction of Body Schema from Multisensory Information 558
28.2.1 Representing Which Parts Make Up the Own Body 558
28.2.2 Multisensory Integration for Limb and Body Ownership 559
28.2.3 Extending the Body: Tool Use 561
28.2.4 Rapid Plasticity of Body Shape 562
28.2.5 Movement and Posture Information in the Brain 563
28.2.6 The Body Schema: A Distributed versus Holistic Representation 564
28.2.7 Interim Summary 565
28.3 The Body as a Modulator for Multisensory Processing 565
28.3.1 Recalibration of Sensory Signals and Optimal Integration 565
28.3.2 Body Schema and Peripersonal Space 566
28.3.3 Peripersonal Space around Different Parts of the Body 568
28.3.4 Across-Limb Effects in Spatial Remapping of Touch 569
28.3.5 Is the External Reference Frame a Visual One? 570
28.3.6 Investigating the Body Schema and Reference Frames with Electrophysiology 572
28.3.7 Summary 574
28.4 Conclusion 574
References 575
28.1 INTRODUCTION
It is our body through which we interact with the environment. We have a very clear sense of who we are, in that we know where our body ends and what body parts we own. Beyond that, we usually are (or can easily become) aware of where each of our body parts is currently located, and most of our movements seem effortless, whether performed under conscious control or not. When we think about ourselves, we normally perceive our body as a stable entity. For example, when we go to bed, we do not expect that our body will be different when we wake up the next morning. Quite contrary to such introspective assessment, the brain has been found to be surprisingly flexible in updating its representation of the body. As an illustration, consider what happens when an arm or leg becomes numb after you have sat or slept in an unsuitable position for too long. Touching the numb foot feels very strange, as if you were touching someone else’s foot. When you lift a numb hand with the other hand, it feels far too heavy. Somehow, it feels as if the limb does not belong to your own body. Neuroscientists have long been fascinated with how the brain represents the body. It is usually assumed that there are several different types of body representations, but there is no consensus about what these representations are, or how many there may be (de Vignemont 2010; Gallagher 1986; Berlucchi and Aglioti 2010; see also Dijkerman and de Haan 2007 and commentaries thereof).
The most common distinction is that between a body schema and a body image. The body schema is usually defined as a continuously updated sensorimotor map of the body that is important in the context of action, informing the brain about what parts belong to the body, and where those parts are currently located (de Vignemont 2010). In contrast, the term body image is usually used to refer to perceptual, emotional, or conceptual knowledge about the body. However, other taxonomies have been proposed (see Berlucchi and Aglioti 2010; de Vignemont 2010), and the use of the terms body schema and body image has been inconsistent. This chapter will not present an exhaustive debate about these definitions, and we refer the interested reader to the articles cited above for detailed discussion; in this chapter, we will use the term body schema with the sensorimotor definition introduced above, referring to both aspects of what parts make up the body, and where they are located. The focus of this chapter will be on the importance of multisensory processing for representing the body, as well as on the role of body representations for multisensory processing. On the one hand, one can investigate how the body schema is constructed and represented in the brain, and Section 28.2 will illustrate that the body schema emerges from the interaction of multiple sensory modalities. For this very reason, one can, on the other hand, ask how multisensory interactions between the senses are influenced by the fact that the brain commands a body. Section 28.3, therefore, will present research on how the body schema is important in multisensory interactions, especially for spatial processing.
28.2 CONSTRUCTION OF BODY SCHEMA FROM MULTISENSORY INFORMATION
28.2.1 Representing Which Parts Make Up the Own Body
There is some evidence suggesting that an inventory of the normally existing body parts is genetically predetermined. Just like amputees, people born without arms and/or legs can have vivid sensations of the missing limbs, including the feeling of using them for gestural movements during conversation and for finger-aided counting (Ramachandran 1993; Ramachandran and Hirstein 1998; Brugger et al. 2000; Saadah and Melzack 1994; see also Lacroix et al. 1992). This phenomenon has therefore been termed phantom limbs. Whereas the existence of a phantom limb in amputees could be explained by the persistence of experience-induced representations of this limb after the amputation, such an explanation does not hold for congenital phantom limbs. In one person with congenital phantom limbs, transcranial magnetic stimulation (TMS) over primary motor, premotor, parietal, and primary sensory cortex evoked sensations and movements of the congenital phantom limbs (Brugger et al. 2000). This suggests that the information about which parts make up one’s own body is distributed across different areas of the brain. There are not many reports of congenital phantoms in the literature, and so the phenomenon may be rare. However, the experience of phantom limbs after the loss of a limb, for example, due to amputation, is very common. It has been reported (Simmel 1962) that the probability of perceiving phantom limbs gradually increases with the age of limb loss from very young (2 of 10 children with amputations below the age of 2) to the age of 9 years and older (all of 60 cases), suggesting that developmental factors within this age interval may be crucial for the construction of the body schema (and, in turn, for the occurrence of phantom limbs). The term “phantom limb” refers to limbs that would normally be present in a healthy person. In contrast, a striking impairment after brain damage, for example, to the basal ganglia (Halligan et al. 1993), the thalamus (Bakheit and Roundhill 2005), or the frontal lobe (McGonigle et al. 2002), is the report of one or more supernumerary limbs in addition to the normal limbs. The occurrence of a supernumerary limb is usually associated with the paralysis of the corresponding real limb, which is also attributable to the brain lesion. The supernumerary limb is vividly felt, and patients confabulate to rationalize why the additional limb is present (e.g., it was attached by the clinical staff during
sleep), and why it is not visible (e.g., it was lost 20 years ago) (Halligan et al. 1993; Sellal et al. 1996; Bakheit and Roundhill 2005). It has therefore been suggested that the subjective presence of a supernumerary limb may result from cognitive conflicts between different pieces of sensory information (e.g., visual vs. proprioceptive) or fluctuations in awareness of the paralysis, which in turn may be resolved by assuming the existence of two (or more) limbs rather than one (Halligan et al. 1993; Ramachandran and Hirstein 1998). Whereas a patient with a phantom or a supernumerary limb perceives more limbs than he actually owns, some brain lesions result in the opposite phenomenon of patients denying the ownership of an existing limb. This impairment, termed somatoparaphrenia, has been reported to occur after temporo-parietal (Halligan et al. 1995) or thalamic-temporo-parietal damage (Daprati et al. 2000)—notably all involving the parietal lobe, which is thought to mediate multisensory integration for motor planning. Somatoparaphrenia is usually observed in conjunction with hemineglect and limb paralysis (Cutting 1978; Halligan et al. 1995; Daprati et al. 2000) and has been suggested to reflect a disorder of body awareness due to the abnormal sensorimotor feedback for the (paralyzed) limb after brain damage (Daprati et al. 2000). Lesions can also affect the representation of the body and self as a whole, rather than just affecting single body parts. These experiences have been categorized into three distinct phenomena (Blanke and Metzinger 2009). During out-of-body experiences, a person feels that she is located outside of her real body, looking at herself, often from above. In contrast, during an autoscopic illusion, the person localizes herself in her real body, but sees an illusory body in extrapersonal space (e.g., in front of herself). Finally, during heautoscopy, a person sees a second body and feels located in both, either at the same time or in (sometimes rapid) alternation. In patients, such illusions have been suggested to be related to damage to the temporo-parietal junction (TPJ) (Blanke et al. 2004), and an out-of-body experience was elicited by stimulation of an electrode implanted over the TPJ for presurgical assessment (Blanke et al. 2002). Interestingly, whole body illusions can coincide with erroneous visual perceptions about body parts, for example, an impression of limb shortening or illusory flexion of an arm. It has therefore been suggested that whole body illusions are directly related to the body schema, resulting from a failure to integrate multisensory (e.g., vestibular and visual) information about the body and its parts, similar to the proposed causes of supernumerary limbs (Blanke et al. 2004). In sum, many brain regions are involved in representing the configuration of the body; some aspects of these representations seem to be innate, and are probably refined during early development. Damage to some of the involved brain regions can lead to striking modifications of the perceived body configuration, as well as to illusions about the whole body.
28.2.2 Multisensory Integration for Limb and Body Ownership
Although the previous section suggests that some aspects of the body schema may be hardwired, the example of the sleeping foot with which we started this chapter suggests that the body schema is a more flexible representation. Such fast changes of the body’s representation have been demonstrated with an ingenious experimental approach: to mislead the brain as to the status of ownership of a new object and to provoke its inclusion into the body schema. This trick can be achieved by using rubber hands: a rubber hand is placed in front of a participant in such a way that it could belong to her own body, and it is then stroked in parallel with the participant’s real, hidden hand. Most participants report that they feel the stroking at the location of the rubber hand, and that they feel as if the rubber hand were their own (Botvinick and Cohen 1998). One of the main determinants for this illusion to arise is the synchrony of the visual and tactile stimulation. In other words, the touches felt at one’s own hand and those seen to be delivered to the rubber hand must match. It might in fact be possible to trick the brain into integrating objects other than hand-like ones into its body schema using this synchronous stroking technique: when the experimenter stroked not a rubber hand but a shoe placed on the table (Ramachandran and Hirstein 1998) or even the table surface (Armel and
Ramachandran 2003), participants reported that they “felt” the touch delivered to their real hand to originate from the shoe and the table. Similarly, early event-related potentials (ERPs) in response to tactile stimuli were enhanced after synchronous stimulation of a rubber hand as well as of a nonhand object (Press et al. 2008). Even more surprisingly, participants in Armel and Ramachandran’s study displayed signs of distress and an increased skin conductance response when the shoe was hit with a hammer, or a band-aid was ripped off the table surface. Similar results, that is, signs of distress, were also observed when the needle of a syringe was stabbed into the rubber hand, and these behavioral responses were associated with brain activity in anxiety-related brain areas (Ehrsson et al. 2007). Thus, the mere synchrony of visual events at an object with the tactile sensations felt at the hand seems to have led to some form of integration of the objects (the rubber hand, the shoe, or the table surface) into the body schema, resulting in physiological and emotional responses usually reserved for the real body. It is important to understand that participants in the rubber hand illusion (RHI) do not feel additional limbs; rather, they feel a displacement of their own limb, which is reflected behaviorally by reaching errors after the illusion has manifested itself (Botvinick and Cohen 1998; Holmes et al. 2006; but see Kammers et al. 2009a, 2009c, and discussion in de Vignemont 2010), and by an adjustment of grip aperture when finger posture has been manipulated during the RHI (Kammers et al. 2009b). Thus, a new object (the rubber hand) is integrated into the body schema, but is interpreted as an already existing part (one’s own but hidden arm). The subjective feeling of ownership of a rubber hand has also been investigated using functional magnetic resonance imaging (fMRI). Activity emerged in the ventral premotor cortex and (although, statistically, only as a trend toward significance) in the superior parietal lobule (SPL) (Ehrsson et al. 2004). In the monkey, both of these areas respond to peripersonal stimuli around the hand and head. Activity related to multisensory integration—synchrony of tactile and visual events, as well as the alignment of visual and proprioceptive information about arm posture—was observed in the SPL, presumably in the human homologue of an area in the monkey concerned with arm reaching [the medial intraparietal (MIP) area]. Before the onset of the illusion, that is, during its buildup, activity was seen in the intraparietal sulcus (IPS), in the dorsal premotor cortex (PMd), and in the supplementary motor area (SMA), which are all thought to be part of an arm-reaching circuit in both monkeys and humans. Because the rubber arm is interpreted as one’s own arm, the illusion may be based on a recalibration of perceived limb position, mediated parietally, according to the visual information about the rubber arm (Ehrsson et al. 2004; Kammers et al. 2009c). As such, current multisensory information about the alleged position of the hand must be integrated with long-term knowledge about body structure (i.e., the fact that there is a hand to be located) (de Vignemont 2010; Tsakiris 2010). Yet, as noted earlier, an integration of a non-body-like object also seems possible in some cases.
Besides the illusory integration of a shoe or the table surface due to synchronous stimulation, an association of objects with the body has been reported in a clinical case of a brain-lesioned patient who denied ownership of her arm and hand; when she wore the wedding ring on that hand, she did not recognize it as her own. When it was taken off the neglected hand, the patient immediately recognized the ring as her own (Aglioti et al. 1996). Such findings might therefore indicate an involvement of higher cognitive processes in the construction of the body schema. It was mentioned in the previous section that brain damage can lead to misinterpretations of single limbs (say, an arm or a leg), but also of the whole body. Similarly, the rubber hand paradigm has been modified to also study the processes involved in the perception of the body as a whole and the feeling of self. Participants viewed a video image of themselves filmed from the back (Ehrsson 2007) or a virtual reality character at some distance in front of them (Lenggenhager et al. 2007). They could see the back of the figure in front of them being stroked in synchrony with feeling their own back being stroked. This manipulation resulted in the feeling of the self being located outside one’s own body and of looking at oneself (Ehrsson 2007). Furthermore, when participants were displaced from their viewing position and asked to walk to the location at which they felt “themselves” during the illusion, they placed themselves in between the real and the virtual body’s
locations (Lenggenhager et al. 2007). Although both rubber hand and whole body illusions use the same kind of multisensory manipulation, the two phenomena have been proposed to tap into different aspects of body processing (Blanke and Metzinger 2009): whereas the rubber hand illusion leads to the attribution of an object to the body schema, the whole body illusion manipulates the location of a global “self” (Blanke and Metzinger 2009; Metzinger 2009), and accordingly the first-person perspective (Ehrsson 2007). This distinction notwithstanding, both illusions convincingly demonstrate how the representation of the body in the brain is determined by the integration of multisensory information. To sum up, our brain uses the synchrony of multisensory (visual and tactile) stimulation to determine body posture. Presumably, because touch is necessarily located on the body, such synchronous visuo-tactile stimulation can lead to illusions that external objects belong to our body, and even to mislocalization of the whole body. However, the illusion is not of a new body part having been added, but rather of a non-body object taking the place of an already existing body part (or, in the case of the whole body illusion, the video image indicating our body’s location).
28.2.3 Extending the Body: Tool Use
At first sight, the flexibility of the body schema demonstrated with the rubber hand illusion and the whole body illusion may seem to be a hindrance rather than useful. However, a very common situation in which such integration may be very useful is the use of tools. Humans, and to some extent also monkeys, use tools to complement and extend the abilities and capacity of their own body parts to act upon their environment. In this situation, visual events at the tip of the tool (or, more generally, at the part of the tool used to manipulate the environment) coincide with tactile information received at the hand—a constellation that is very similar to the synchronous stroking of a non-body object and a person’s hand. Indeed, some neurons in the intraparietal part of area PE (PEip) of monkeys respond to tactile stimuli to the hand, as well as to visual stimuli around the tactile location (see also Section 28.3.2). When the monkey was trained to use a tool to retrieve otherwise unreachable food, the visual receptive fields (RFs), which encompassed only the hand when no tool was used, now encompassed both the hand and the tool (Iriki et al. 1996). In a similar manner, when the monkey learned to observe his hand on a monitor rather than seeing it directly, the visual RFs now encompassed the hand on the monitor (Obayashi et al. 2000). These studies have received some methodological criticism (Holmes and Spence 2004), but their results are often interpreted as some form of integration of the tool into the monkey’s body schema. Neurons with such RF characteristics might therefore be involved in the mediation of rapid body schema modulations illustrated by the rubber hand illusion in humans. Although these monkey findings are an important step toward understanding tool use and its relation to the body schema, it is important to note that the mechanisms discovered in the IPS cannot explain all phenomena involved either in tool use or in ownership illusions. For example, it has been pointed out that a tool does not usually feel like one’s own body part, even when it is frequently used, as is the case, for example, with a fork (Botvinick 2004). Such true ownership feelings may rather be restricted to body part–shaped objects such as a prosthesis or a rubber hand, provided that they are located in an anatomically plausible location (Graziano and Gandhi 2000; Pavani et al. 2000). For the majority of tools, one might rather feel that the sensation of a touch is projected to the action-related part of the tool (usually the tip), just as one may feel the touch of a pen to occur between the paper and the pen tip, and not at the fingers holding the pen (see also Yamamoto and Kitazawa 2001b; Yamamoto et al. 2005). Accordingly, rather than the tool being integrated into the body schema, it may be that tool use results in the directing of attention toward the part of space that is relevant for the currently performed action. Supporting such interpretations, it has recently been shown that visual attention was enhanced at the movement endpoint of the tool as well as at the movement endpoint of the hand when a reach was planned with a tool. Attention was not enhanced, however, in between those locations along the tool (Collins et al. 2008). Similarly, cross-modal (visual–tactile) interactions have been shown to be enhanced at the tool tip and at the hand, but not
in locations along the tool (Holmes et al. 2004; Yue et al. 2009). Finally, in a recent study, participants were asked to make tactile discrimination judgments about stimuli presented to the tip of a tool while visual distractors were presented in parallel to the tactile stimuli. fMRI activity in response to visual distractors near the end of the tool was enhanced in the occipital cortex compared to locations farther away from the tool (Holmes et al. 2008). These findings were also interpreted as indicating an increase of attention at the tool tip due to the use of the tool. Experimental results such as these challenge the idea of an extension of the body schema. Other results, in contrast, do corroborate the hypothesis that tool use extends the body schema. For example, tool use resulted in a change of the perceived distance between two touches to the arm, which was interpreted to indicate an elongated representation of the arm (Cardinali et al. 2009b). It has recently been pointed out that the rubber hand illusion seems to consist of several dissociable aspects (Longo et al. 2008), revealed by a factor-analytic analysis of questionnaires related to the experience of the rubber hand illusion. More specific distinctions may therefore need to be made about the different processes (and, as a consequence, the different effects found in experiments) involved in the construction of the body schema, and different experimental paradigms may tap into only a subset of these processes. In sum, multisensory signals are not only important for determining what parts we perceive our body to be made of. Multisensory mechanisms are also important in mediating the ability to use tools. It is currently under debate whether tools extend the body schema by being integrated as body parts, or whether other multisensory processes, for example, a deployment of attention to the space manipulated by the tool, are at the core of our ability to use tools.
28.2.4 Rapid Plasticity of Body Shape
The rubber hand illusion demonstrates that what the brain interprets as one's own body can be rapidly adjusted to the information received from the senses. Rapid changes of the body schema are, however, not restricted to the inventory of body parts considered to belong to the body, or to their current posture. They also extend to the body's shape. We already mentioned that the representation of the arm may be elongated after tool use (Cardinali et al. 2009b). An experience most of us have had is the feeling of an increased size of an anesthetized body part, for example, the lip during a dentist's appointment (see also Türker et al. 2005; Paqueron et al. 2003). More spectacularly, when a participant holds the tip of his nose with his thumb and index finger while his biceps muscle is vibrated to induce the illusion of the arm moving away from the body, many participants report that they perceive their nose to elongate to a length of up to 30 cm (sometimes referred to as the Pinocchio illusion; Lackner 1988). A related illusion can be evoked when an experimenter guides the finger of a participant to irregularly tap the nose of a second person (seated next to the participant), while the experimenter synchronously taps the participant's own nose (Ramachandran and Hirstein 1997; see also the discussion in Ramachandran and Hirstein 1998). Both illusions are induced by presenting the brain with mismatching information about touch and proprioception. They demonstrate that, although our life experience would seem to preclude sudden elongations of the nose (or of any other body part, for that matter), the body schema is readily adapted when sensory information from different modalities (here, tactile and proprioceptive) calls for an integration of initially mismatching content. The rubber hand illusion has also been used to investigate the effects of perceived body part size. Participants judged the size of a coin to be bigger when the illusion was elicited with a rubber hand bigger than their own, and to be smaller when the rubber hand was smaller (Bruno and Bertamini 2010). The rubber hand illusion thus influenced tactile object perception. This influence was systematic: because the real object held by the participants was always the same size, their finger posture was identical in all conditions. With the illusion of a small hand, this posture would indicate a relatively small distance between the (small) fingers; with the illusion of a big hand, the same posture would indicate a larger distance between the (large) fingers.
Similarly, visually perceived hand size has also been shown to affect grip size, although more so when the visual image of the hand (a projection of an online video recording of the hand) was bigger than normal (Marino et al. 2010). The rubber hand illusion has also been used to create the impression of having an elongated arm by having participants wear a shirt with an elongated sleeve from which the rubber hand protruded (Schaefer et al. 2007). By recording magnetoencephalographic (MEG) responses to tactile stimuli to the illusion hand, this study also demonstrated an involvement of primary somatosensory cortex in the illusion. These experiments demonstrate that perception of the body can be rapidly adjusted by the brain, and that these perceptual changes in body shape affect object perception as well as hand actions.
28.2.5 Movement and Posture Information in the Brain
The rubber hand illusion shows how intimately body part ownership and body posture are related: in this illusion, an object is felt to belong to one's own body, but at the same time the posture of the real arm is felt to be at the location of the rubber arm. Posture is, of course, just as intimately related to movement, as every movement leads to a change in posture. However, different brain areas seem to be responsible for perceiving movement and posture. The perception of limb movement seems to depend on the primary sensory and motor cortex as well as on the premotor and supplementary motor cortex (reviewed by Naito 2004). This is true also for the illusory movement of phantom limbs, which is felt as real movement (Bestmann et al. 2006; Lotze et al. 2001; Roux et al. 2003; Brugger et al. 2000). The primary motor cortex in particular may play a crucial role in movement perception. One can create an illusion of movement by vibrating the muscles responsible for the movement of a body part, for example, the arm or hand. When a movement illusion is created for one hand, this illusion transfers to the other hand if the palms of the two hands touch. For both hands, fMRI activity increased in primary motor cortex, suggesting a primary role of this motor-related structure also for the sensation of movement (Naito et al. 2002). In contrast, the current body posture seems to be represented quite differently from limb movement. Proprioceptive information arrives in the cortex via the somatosensory cortex. Accordingly, neuronal responses in secondary somatosensory cortex (SII) to tactile stimuli to a monkey's hand were shown to be modulated by the monkey's arm posture (Fitzgerald et al. 2004). In humans, the proprioceptive drift associated with the rubber hand illusion (that is, the change of the subjective position of one's own hand toward the location of the rubber hand) was correlated with activity in SII acquired with PET (Tsakiris et al. 2007). SII was also implicated in body schema functions by a study in which participants determined the laterality of an arm seen on a screen by imagining turning their own arm until it matched the seen one, as compared to when they determined the onscreen arm's laterality by imagining its movement toward the appropriate location on a body that was also presented on the screen (Corradi-Dell'Acqua et al. 2009). SII was thus active specifically during the imagination of one's own posture when making a postural judgment. However, many other findings implicate hierarchically higher, more posterior parietal areas in the maintenance of a posture representation. When participants were asked to reach with their hand to another body part, activity increased in the SPL after a posture change, as compared to when participants repeated a movement they had just executed. This posture change effect was observed both when the reaching hand changed its posture and when participants reached with one hand to the other and the target hand, rather than the reaching hand, changed its posture (Pellijeff et al. 2006). Although the authors interpreted their results as reflecting postural updating, the results may instead be attributable to reach planning. However, a patient with an SPL lesion displayed symptoms that corroborate the view that the SPL is involved in the maintenance of a continuous postural model of the body (Harris and Wolpert 1998).
This patient complained that her arm and leg felt like they drifted and then faded, unless she could see them. This subjective feeling was accompanied by an inability to retain grip force as well as a loss of tactile perception of a vibratory
stimulus after it was displayed for several seconds. Because the patient's deficit was not a general inability to detect tactile stimulation or to perform hand actions, these results seem to imply that it was the maintenance of the current postural state of the body that was lost over time unless new visual, tactile, or proprioceptive information forced an update of the model. The importance of the SPL for posture control is also evident from a patient who, after SPL damage, lost her ability to correctly interact with objects in ways requiring whole body coordination, such as sitting on a chair (Kase et al. 1977). Still further evidence for an involvement of the SPL in posture representation comes from experiments in healthy participants. When people are asked to judge the laterality of a hand presented in a picture, these judgments are influenced by the current hand posture adopted by the participant: the more unnatural it would be to align their own hand with the displayed hand, the longer participants take to respond (Parsons 1987; Ionta et al. 2007). A hand posture change during the hand laterality task led to an activation of the SPL in fMRI (de Lange et al. 2006). Hand crossing also led to a change in intraparietal activation during passive tactile stimulation (Lloyd et al. 2003). Finally, recall that fMRI activity during the buildup of the rubber hand illusion, thought to involve postural recalibration due to the visual information about the rubber arm, was also observed in the SPL. These findings are consistent with neurophysiological recordings in monkeys showing that neurons in area 5 of the superior parietal lobe (Sakata et al. 1973), as well as neurons in area PEc (located just at the upper border of the IPS and extending into the sulcus to border MIP; Breveglieri et al. 2008), respond to complex body postures, some involving several limbs. Neurons in these areas respond to tactile, proprioceptive, and visual input (Breveglieri et al. 2008; Graziano et al. 2000). Furthermore, some area 5 neurons fire most when the felt and the seen position of the arm correspond rather than when they do not (Graziano 1999; Graziano et al. 2000). These neurons respond not only to vision of the monkey's own arm, but also to vision of a fake arm, if it is positioned in an anatomically plausible way such that it looks as if it might belong to the animal's own body, reminiscent of the rubber hand illusion in humans. Importantly, some neurons fire most when the visual information about the fake arm matches the posture of the monkey's real, hidden arm, but reduce their firing rate when vision and proprioception do not match. To summarize, body movement and body posture are represented by different brain regions. Movement perception relies on the motor structures of the frontal lobe. The most important brain region for the representation of body posture, in contrast, is probably the SPL. This region is known to integrate signals from different sensory modalities, and damage to it results in dysfunctions of posture perception and of actions requiring postural adaptations. However, other brain regions are involved in posture processing as well.
28.2.6 The Body Schema: A Distributed versus Holistic Representation
The evidence reviewed so far has shown that what has been subsumed under the term body schema is not represented as one single, unitary entity in the brain, even if, from a psychological standpoint, it would seem to constitute an easily graspable and logically coherent concept. However, as has often proved to be the case in psychology and in the neurosciences, what researchers have hypothesized to be functional entities of the brain's organization is not necessarily how nature has actually evolved the brain. The organization of the parietal and frontal areas seems to be modular, with different regions specialized for certain body parts and actions (Rizzolatti et al. 1998; Grefkes and Fink 2005; Andersen and Cui 2009), for example, hand grasping, arm reaching, and eye movements. Similarly, at least in parts of the premotor cortex, RFs for the different sensory modalities are body part–centered (e.g., around the hand; see also Section 28.3.2), suggesting that other body part–specific areas may also feature coordinate frames anchored to those body parts (Holmes and Spence 2004). As a consequence, the holistic body schema that we subjectively experience has been proposed to emerge from the interaction of multiple space-, body-, and action-related brain areas (Holmes and Spence 2004).
28.2.7 Interim Summary
The first part of this chapter has highlighted how important the integration of multisensory information is for body processing. We showed that a representation of our body parts is probably innate, and that lesions to different brain structures, such as the parietal and frontal lobes as well as subcortical structures, can lead to malfunctions of this representation. Patients can perceive lost limbs as still present, report supernumerary limbs in addition to their normal ones, and deny ownership of a limb. We went on to show how the integration of multisensory (usually visual and tactile) information is used in an online modification or "construction" of the body schema. In the rubber hand illusion, synchronous multisensory information leads to the integration of an external object into the body schema in the sense that the location of the real limb is felt to be at the external object. Multisensory information can also lead to adjustments of perceived body shape, as in the Pinocchio illusion. Information about body parts, their movement and their posture, is represented in a widespread network in the brain. Whereas limb movement perception seems to rely on motor structures, multisensory parietal areas are especially important for the maintenance of a postural representation. Finally, we noted that the current concept of the body schema in the brain is that of an interaction between many body part–specific representations.
28.3 THE BODY AS A MODULATOR FOR MULTISENSORY PROCESSING
The first part of this chapter has focused on the multisensory nature of the body schema, with its two aspects of what parts make up the body and where those parts are located in space and in relation to one another. These studies form the basis for an exploration of the specific characteristics of body processing and its relevance for perception, action, and the connection of these two processes. The remainder of this chapter will therefore adopt the opposite perspective to the first part: it will assume the existence of a body schema and explore its influence on multisensory processing. One of the challenges for multisensory processing is that information from the different senses is received by sensors that are arranged very differently from modality to modality. In vision, light originating from neighboring spatial locations falls on neighboring rods and cones on the retina. When the eyes move, light from the same spatial origin falls on different sensors on the retina. Visual information is therefore initially eye-centered. Touch, in contrast, is perceived through sensors all over the skin. Because the body parts constantly move in relation to each other, a touch to the same part of the skin can correspond to very different locations in external, visual space. Similar challenges arise for spatial processing in audition, but we will focus here on vision and touch.
28.3.1 Recalibration of Sensory Signals and Optimal Integration
In some cases, knowledge about body posture and movement is used to interpret sensory information. For example, Lackner and Shenker (1985) attached a light or a sound source to each hand of their participants, who sat in a totally dark room. They then vibrated the biceps muscles of the two arms; recall that muscle vibration induces the illusion of limb movement. In this experimental setup, participants perceived an outward movement of the two arms. Both the lights and the sounds were perceived as moving with the apparent location of the hands, although the sensory information on the retina and in the cochlea remained identical throughout these manipulations. Such experimental findings have led to the proposal that the brain frequently recalibrates the different senses to ensure that the actions carried out with the limbs are in register with the external world (Lackner and DiZio 2000). The brain seems to use different sensory input to do this, depending on the experimental situation. In the rubber hand illusion, visual input about arm position apparently overrules proprioceptive information about the real position of the arm. In other situations, such as the arm vibration illusion, proprioception can overrule vision.
Although winner-take-all schemes for such dominance of one sense over another have been proposed (e.g., Ramachandran and Hirstein 1998), there is ample evidence that inconsistencies in the information from the different senses do not simply lead to an overruling of one sense by another. Rather, the brain seems to combine the different senses to come up with a statistically optimal estimate of the true environmental situation, allowing for statistically optimal movements (Körding and Wolpert 2004; Trommershäuser et al. 2003) as well as perceptual decisions (Ernst and Banks 2002; Alais and Burr 2004). In many cases, one of our senses outperforms the others in a specific sensory ability: spatial acuity, for example, is superior in vision (Alais and Burr 2004), whereas temporal acuity is best in audition (Shams et al. 2002; Hötting and Röder 2004). For this reason, many experimental results have been interpreted in favor of an "overrule" hypothesis. Nevertheless, it has been demonstrated, for example in spatial tasks, that the weight the brain assigns to the information received through a sensory channel is directly related to that channel's acuity, and that audition (Alais and Burr 2004) and touch (Ernst and Banks 2002) will overrule vision when visual acuity is sufficiently degraded. Such integration is probably involved in body processing as well, and in phenomena such as the rubber hand and Pinocchio illusions. In sum, the body schema influences how multisensory information is interpreted by the brain. The weight that a piece of sensory information is given varies with its reliability (see also de Vignemont 2010).
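For the two-cue case, this reliability weighting can be written compactly. The following is a standard formulation of maximum-likelihood cue combination in the spirit of Ernst and Banks (2002), added here for illustration with generic symbols (visual and tactile estimates and their variances) rather than notation taken from any of the cited studies:

    \hat{s}_{VT} = w_V \hat{s}_V + w_T \hat{s}_T, \qquad
    w_V = \frac{\sigma_T^2}{\sigma_V^2 + \sigma_T^2}, \qquad
    w_T = \frac{\sigma_V^2}{\sigma_V^2 + \sigma_T^2}, \qquad
    \sigma_{VT}^2 = \frac{\sigma_V^2 \sigma_T^2}{\sigma_V^2 + \sigma_T^2}

Each cue is weighted by its relative reliability (the inverse of its variance), and the combined estimate is more reliable than either cue alone; when visual noise grows large, as in the degraded-vision experiments cited above, the weights shift toward touch or audition.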
28.3.2 Body Schema and Peripersonal Space
Many neurons in a brain circuit involving the ventral intraparietal (VIP) area and the ventral premotor cortex (PMv) feature tactile RFs, mainly around the monkey's mouth, face, or hand. These tactile RFs are supplemented by visual and sometimes auditory RFs that respond to stimuli up to ~30 cm from the body part (Fogassi et al. 1996; Rizzolatti et al. 1981a, 1981b; Graziano et al. 1994, 1999; Duhamel et al. 1998; Graziano and Cooke 2006). Importantly, when either the body part or the eyes are moved, the visual RF is adjusted online such that the tactile and the visual modality remain aligned within a given neuron (Graziano et al. 1994). When one of these neurons is electrically stimulated, the animal makes defensive movements (Graziano and Cooke 2006). Because of these unique RF properties, the part of space selectively represented by this VIP–PMv circuit has been termed the peripersonal space, and it has been suggested to represent a defense zone around the body. Note that the continuous spatial adjustment of the visual to the tactile RF requires both body posture and eye position to be integrated in a continuous manner. Two points therefore become immediately clear: first, the peripersonal space and the body schema are intimately related (see also Cardinali et al. 2009a); and second, like the body schema, the representation of peripersonal space includes information from several (if not all) sensory modalities. As with the term "body schema," the term "peripersonal space" has been defined in several ways. It is sometimes used to denote the space within arm's reach (see, e.g., Previc 1998). For the purpose of this review, "peripersonal space" will be used to denote the space directly around the body, in accord with the findings in monkey neurophysiology. Different approaches have been taken to investigate whether peripersonal space is represented in humans as it is in monkeys. One of them has been the study of patients suffering from extinction. These patients are usually able to report single stimuli at all spatial locations, but fail to detect contralesional stimuli when these are concurrently presented with ipsilesional stimuli (Ladavas 2002). The two stimuli can be presented in two different modalities (Ladavas et al. 1998), indicating that the process disrupted by extinction is multisensory in nature. More importantly, extinction is modulated in some patients by the distance of the distractor stimulus (i.e., the ipsilesional stimulus that extinguishes the contralesional stimulus) from the hand. For example, in some patients a tactile stimulus to the contralesional hand is extinguished by an ipsilesional visual stimulus to a much higher degree when the visual stimulus is presented in the peripersonal space of the patient's ipsilesional hand than when it is presented far from it (di Pellegrino and Frassinetti 2000). Therefore, extinction is
modulated by two manipulations that are central to neurons representing peripersonal space in monkeys: (1) extinction can be multisensory, and (2) it can dissociate between peripersonal and extrapersonal space. In addition, the locations of lesions associated with extinction coincide (at least coarsely) with the brain regions associated with peripersonal spatial functions in monkeys (Mort et al. 2003; Karnath et al. 2001). The study of extinction patients has therefore suggested that a circuit for peripersonal space exists in humans, analogous to that of the monkey. Peripersonal space has also been investigated in healthy humans. One of the important characteristics of the way the brain represents peripersonal space is the alignment of visual and tactile events. In an fMRI study in which participants had to judge whether a visual stimulus and a tactile stimulus to the hand were presented from the same side of space, hand crossing led to an increase of activation in secondary visual cortex, indicating an influence of body posture on relatively low-level sensory processes (Misaki et al. 2002). In another study, hand posture was manipulated in relation to the eye: rather than changing hand posture itself, gaze was directed such that a tactile stimulus occurred either in the right or the left visual hemifield. The presentation of bimodal visual–tactile stimuli led to higher activation in the visual cortex of the hemisphere contralateral to the visual hemifield of the tactile location, indicating that the tactile location was remapped with respect to visual space and then influenced visual cortex (Macaluso et al. 2002). These influences of posture and eye position on early sensory cortex may be mediated by parietal cortex. For example, visual stimuli were better detected when a tactile stimulus was concurrently presented (Bolognini and Maravita 2007). This facilitatory influence of the tactile stimulus was strongest when the hand was held near the visual stimulus, whether this implied a normal or a crossed hand posture. However, hand crossing had a very different effect when neural processing in the posterior parietal cortex was impaired by repetitive TMS: now a tactile stimulus was most effective when it was delivered to the hand anatomically belonging to the side of the body at which the visual stimulus was presented; when the hands were crossed, a right-hand stimulus, for example, facilitated a right-side visual stimulus, although the hand was located in left visual space (Bolognini and Maravita 2007). This result indicates that, after disruption of parietal processing, body posture was no longer taken into account during the integration of vision and touch, nicely in line with the findings about the role of parietal cortex in posture processing (see Section 28.2.5). A more direct investigation of how the brain determines whether a stimulus is located in the peripersonal space was undertaken in an fMRI study that independently manipulated visual and proprioceptive cues about hand posture to modulate the perceived distance of a small visual object from the participants' hand. Vision of the arm could be occluded, and the occluded arm was then located near the visual object (i.e., peripersonally) or far from it; the distance from the object could then be determined by the brain only by using proprioceptive information. Alternatively, vision could be available to show that the hand was either close to or far from the stimulus.
Ingeniously, the authors manipulated these proprioceptive and visual factors together by using a rubber arm: when the real arm was held far away from the visual object, the rubber hand could be placed near the object so that visually the object was in peripersonal space (Makin et al. 2007). fMRI activity due to these manipulations was found in posterior parietal areas. There was some evidence that for the determination of posture in relation to the visual object, proprioceptive signals were more prominent in the anterior IPS close to the somatosensory cortex, and that vision was more prominent in more posterior IPS areas, closer to visual areas. Importantly, however, all of these activations were located in the SPL and IPS, the areas that have repeatedly been shown to be relevant for the representation of posture and of the body schema. Besides these neuroimaging approaches, behavioral studies have also been successful in investigating the peripersonal space and the body schema. One task that has rendered a multitude of findings is a cross-modal interference paradigm, the cross-modal congruency (CC) task (reviewed by Spence et al. 2004b). In this task, participants receive a tactile stimulus to one of four locations; two of these locations are located “up” and two are located “down” (see Figure 28.1). Participants are asked to judge the elevation of the tactile stimulus in each trial, regardless of its side (left or right).
FIGURE 28.1 Standard cross-modal congruency task. Tactile stimuli are presented to two locations on the hand (often index finger and thumb holding a cube; here, back and palm of the hand). In each trial, one of the tactile stimuli is presented concurrently with one of the visual distractor stimuli. Participants report whether the tactile stimulus came from an upper or a lower location. Although they are to ignore the visual distractors, the tactile judgment is biased toward the location of the light. This influence is biggest when the distractor is presented at the same hand as the tactile stimulus, and reduced when the distractor occurs at the other hand.
However, a to-be-ignored visual distractor stimulus is presented with every tactile target stimulus, also located at one of the four locations at which the tactile stimuli can occur. The visual distractor is independent of the tactile target; it can therefore occur at a congruent location (tactile and visual stimulus have the same elevation) or at an incongruent location (tactile and visual stimulus have opposing elevations). Despite the instruction to ignore the visual distractors, participants’ reaction times and error probabilities are influenced by them. When the visual distractors are congruent, participants perform faster and with higher accuracy than when the distractors are incongruent. The difference of the incongruent minus the congruent conditions (e.g., in RT and in accuracy) is referred to as the CC effect. Importantly, the CC effect is larger when the distractors are located close to the stimulated hands rather than far away (Spence et al. 2004a). Moreover, the CC effect is larger when the distractors are placed near rubber hands, but only if those are positioned in front of the participant in such a way that, visually, they could belong to the participant’s body (Pavani et al. 2000). The CC effect is also modulated by tool use in a similar manner as by rubber hands; when a visual distractor is presented in far space, the CC effect is relatively small, but it increases when a tool is held near the distractor (Maravita et al. 2002; Maravita and Iriki 2004; Holmes et al. 2007). Finally, the CC effect is increased during the whole body illusion (induced by synchronous stroking; see Section 28.2.2) when the distractors are presented on the back of the video image felt to be the own body, compared to when participants see the same video image and distractor stimuli, but without the induction of the whole body illusion (Aspell et al. 2009). These findings indicate that cross-modal interaction, as indexed in the CC effect, is modulated by the distance of the distractors from what is currently represented as the own body (i.e., the body schema) and thus suggest that the CC effect arises in part from the processing of peripersonal space. To summarize, monkey physiology, neuropsychological findings, and behavioral research suggest that the brain specially represents the space close around the body, the peripersonal space. There is a close relationship between the body schema and the representation of peripersonal space, as body posture must be taken into account to remap, from moment to moment, which part of external space is peripersonal.
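To make the measure concrete, the following minimal sketch (in Python, using invented example trials; it is not analysis code from any of the cited studies) computes the CC effect as defined above, that is, incongruent minus congruent performance in reaction time and in error rate:

    from statistics import mean

    # Hypothetical trials of the cross-modal congruency task. Each trial records
    # whether the visual distractor's elevation matched the tactile target's
    # elevation ("congruent"), the reaction time (RT) in ms, and accuracy.
    trials = [
        {"congruent": True,  "rt": 480, "correct": True},
        {"congruent": False, "rt": 545, "correct": True},
        {"congruent": True,  "rt": 470, "correct": True},
        {"congruent": False, "rt": 560, "correct": False},
        # ... more trials
    ]

    def cc_effect(trials):
        """CC effect: incongruent minus congruent condition, for RT (computed
        over correct trials only) and for error rate. Larger values indicate a
        stronger influence of the to-be-ignored visual distractors."""
        con = [t for t in trials if t["congruent"]]
        inc = [t for t in trials if not t["congruent"]]
        rt_cc = (mean(t["rt"] for t in inc if t["correct"])
                 - mean(t["rt"] for t in con if t["correct"]))
        err_cc = (mean(0 if t["correct"] else 1 for t in inc)
                  - mean(0 if t["correct"] else 1 for t in con))
        return rt_cc, err_cc

    print(cc_effect(trials))  # RT CC effect = 70 ms, error-rate CC effect = 0.5 here

The same difference score can then be compared across conditions, for example, distractors near versus far from the hand, near a rubber hand, or at the tip of a tool, as in the studies discussed above.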
28.3.3 Peripersonal Space around Different Parts of the Body
All of this behavioral research, just like most neurophysiological, neuropsychological, and neuroimaging research, has explored peripersonal space and the body schema using stimulation to and near the hands. The hands may be considered special in that they are used for almost any kind of action we perform. Processing principles revealed for the hands may therefore not generalize to other body parts. As an example, hand posture, but not foot posture, has been reported to influence the mental rotation of these limbs (Ionta et al. 2007; Ionta and Blanke 2009; but see Parsons 1987). Moreover, monkey work has demonstrated multisensory neurons with peripersonal spatial characteristics only for the head, hand, and torso, but neurons with equivalent
characteristics for the lower body have so far not been reported (Graziano et al. 2002). The peripersonal space representation may thus be limited to body parts that are important for the manipulation of objects under (mainly) visual control. To test this hypothesis in humans, body schema–related paradigms such as the CC task, which have so far been conducted with the hands, must be extended to other body parts. The aforementioned study of the CC effect during the whole body illusion (Aspell et al. 2009; see also Section 28.2.2) demonstrated a peripersonal spatial effect near the back. The CC effect was also observable when stimuli were delivered to the feet (Schicke et al. 2009), suggesting that a representation of peripersonal space exists for the space around these limbs as well. If the hypothesis is correct that the body schema is created from body part–specific representations, one might expect that the representation of the peripersonal space of the hand and that of the foot do not interact. To test this prediction, tactile stimuli were presented to the hands while visual distractors were flashed either near the participant's real foot, near a fake foot, or far from both the hand and the foot. The cross-modal interference of the visual distractors, indexed by the CC effect, was larger when they were presented in the peripersonal space of the real foot than when they were presented near the fake foot or in extrapersonal space (Schicke et al. 2009). The spatial judgment of tactile stimuli at the hand was thus modulated when a visual distractor appeared in the peripersonal space of another body part. This effect cannot be explained by the current concept of peripersonal space as tactile RFs encompassed by visual RFs. These results rather imply either a holistic body schema representation or, more probably, interactions beyond simple RF overlap between the peripersonal space representations of different body parts (Holmes and Spence 2004; Spence et al. 2004b). In sum, peripersonal space is represented not just for the hands, but also for other body parts. Interactions between the peripersonal spatial representations of different body parts challenge the concept of peripersonal space being represented merely by overlapping RFs.
28.3.4 Across-Limb Effects in Spatial Remapping of Touch
The fact that visual distractors in the CC paradigm have a stronger influence when they are presented in the peripersonal space implies that the brain matches the location of the tactile stimulus with that of the visual one. The tactile stimulus is registered on the skin; matching this skin location to the location of the visual stimulus requires that body posture be taken into account and that the skin location be projected into an external spatial reference frame. Alternatively, the visual location of the distractor could be computed with regard to the current location of the tactile stimulus, that is, with respect to the hand, and thus be viewed as a projection of external space onto somatotopic space (i.e., the skin). This remapping of visual–tactile space has been explored more thoroughly by manipulating hand posture. As in the standard CC task described earlier, stimuli were presented to the two hands and the distractors were placed near the tactile stimuli (Spence et al. 2004a). However, in half of the trials, participants crossed their hands. If spatial remapping occurs in this task, then the CC effect should be high whenever the visual distractor is located near the stimulated hand. In contrast, if tactile stimuli were not remapped into external space, then a tactile stimulus on the right hand should always be influenced most by a right-hemifield visual stimulus, independent of body posture. The results were clear-cut: when the hands were crossed, the distractors that were now near the stimulated hand were most effective. In fact, in this experiment the CC effect pattern of left and right distractor stimuli completely reversed, which the authors interpreted as a "complete remapping of visuotactile space" (p. 162). Spatial remapping could thus be viewed as a means of integrating spatial information from the different senses in multisensory contexts. However, spatial remapping has also been observed in purely tactile tasks that do not involve any distractor stimuli of a second modality. One example is the temporal order judgment (TOJ) task, in which participants judge which of two tactile stimuli occurred first. Performance in this task is impaired when participants cross their hands (Yamamoto
and Kitazawa 2001a; Shore et al. 2002; Röder et al. 2004; Schicke and Röder 2006; Azanon and Soto-Faraco 2007). It is usually assumed that the performance deficit after hand crossing in the TOJ task is due to a conflict between two concurrently active reference frames: one anatomical and one external (Yamamoto and Kitazawa 2001a; Röder et al. 2004; Schicke and Röder 2006). The right–left coordinate axes of these two reference frames are opposed to each other when the hands are crossed; for example, the anatomically right arm is located in the externally left hemispace during hand crossing. This remapping takes place despite the task being purely tactile, and despite the detrimental effect of using the external reference frame in the task. Remapping of stimulus location by accounting for current body posture therefore seems to be an automatically evoked process in the tactile system. In the typical TOJ task, the two stimuli are applied to the two hands. It would therefore be possible that the crossing effect is simply due to a confusion regarding the two homologous limbs, rather than to the spatial location of the stimuli. This may be due to a coactivation of homologous brain areas in the two hemispheres (e.g., in SI or SII), which may make it difficult to assign the two concurrent tactile percepts to their corresponding visual spatial locations. However, a TOJ crossing effect was found for tactile stimuli delivered to the two hands, to the two feet, or to one hand and the contralateral foot (Schicke and Röder 2006). In other words, participants were confused not only about which of the two hands or the two feet was stimulated first, but they were equally impaired in deciding if it was a hand or a foot that received the first stimulus. Therefore, the tactile location originating on the body surface seems to be remapped into a more abstract spatial code for which the original skin location, and the somatotopic coding of primary sensory cortex, is no longer a dominating feature. In fact, it has been suggested that the location of a tactile stimulus on the body may be reconstructed by determining which body part currently occupies the part of space at which the tactile stimulus has been sensed (Kitazawa 2002). The externally anchored reference frame is activated in parallel with a somatotopic one, and their concurrent activation leads to the observed behavioral impairment. To summarize, remapping of stimulus location in a multisensory experiment such as the CC paradigm is a necessity for aligning signals from different modalities. Yet, even when stimuli are purely unimodal, and the task would not require a recoding of tactile location into an external coordinate frame, such a transformation nonetheless seems to take place. Thus, even for purely tactile processing, posture information (e.g., proprioceptive and visual) is automatically integrated.
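As a toy illustration of this reference frame conflict (our own sketch in Python, written for exposition; the function and variable names are invented and it is not a model from the cited studies), consider how the external code of a touched hand flips with posture while the anatomical code stays the same:

    # Two codes thought to be concurrently active in the tactile TOJ task:
    # an anatomical code (which limb was touched) and an external code
    # (where that limb currently lies). They conflict when the hands are crossed.

    def external_side(anatomical_side: str, hands_crossed: bool) -> str:
        """Remap the anatomical side of a touched hand into external space,
        taking posture into account."""
        if not hands_crossed:
            return anatomical_side
        return "left" if anatomical_side == "right" else "right"

    for crossed in (False, True):
        touched = "right"  # the anatomically right hand receives the stimulus
        external = external_side(touched, crossed)
        print(f"hands crossed: {crossed!s:5}  anatomical code: {touched}  "
              f"external code: {external}  codes conflict: {touched != external}")

With uncrossed hands the two codes agree; with crossed hands they point to opposite sides, which is the conflict thought to slow temporal order judgments when the hands are crossed.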
28.3.5 Is the External Reference Frame a Visual One?
The representation of several reference frames is, of course, not unique to the TOJ crossing effect. In monkeys, the parallel existence of multiple reference frames has been demonstrated in the different subareas of the IPS, for example, in VIP (Schlack et al. 2005), which is involved in the representation of peripersonal space, in MIP, which is involved in arm reaching (Batista et al. 1999), and in LIP, which is engaged in saccade planning (Stricanne et al. 1996). Somewhat counterintuitively, many neurons in these areas do not represent space in a reference frame that can be assigned to one of the sensory systems (e.g., a retinotopic one for vision, a head-centered one for audition) or a specific limb (e.g., a hand-centered reference frame for hand reach planning). Rather, there are numerous intermediate coding schemes present in the different neurons (Mullette-Gillman et al. 2005; Schlack et al. 2005). However, such intermediate coding has been shown to enable the transformation of spatial codes between different reference frames, possibly even in different directions, for example, from somatotopic to eye-centered and vice versa (Avillac et al. 2005; Pouget et al. 2002; Cohen and Andersen 2002; Xing and Andersen 2000). Similar intermediate coding has been found in posture-related area 5, which codes hand position in an intermediate manner between eye- and hand-centered coordinates (Buneo et al. 2002). Further downstream, in some parts of MIP, arm reaching coordinates may, in contrast, be represented fully in eye-centered coordinates, independent of whether the sensory target for reaching is visual (Batista et al. 1999; Scherberger et al.
2003; Pesaran et al. 2006) or auditory (Cohen and Andersen 2000). In addition to these results from monkeys, an fMRI experiment in humans has suggested common spatial processing of visual and tactile targets for saccade as well as for reach planning (Macaluso et al. 2007). Still further downstream, in the motor-related PMv, which has been proposed to form the peripersonal space circuit together with VIP, visual RFs are aligned with hand position (Graziano and Cooke 2006). These findings have led to the suggestion that the external reference frame involved in tactile localization is a visual one, and that remapping occurs automatically to aid the fusion of spatial information from the different senses. Such use of visual coordinates may be helpful not only for action planning (e.g., the reach of the hand toward an object), but also for an efficient online correction of motor error with respect to the visual target (Buneo et al. 2002; Batista et al. 1999). A number of variants of the TOJ paradigm have been employed to study the visual origin of the external reference frame in humans. For example, the crossing effect was ameliorated when participants viewed uncrossed rubber hands (with their real hands hidden), indicating that visual (and not just proprioceptive) cues modulate spatial remapping (Azanon and Soto-Faraco 2007). In the same vein, congenitally blind people did not display a TOJ crossing effect, suggesting that they do not by default activate an external reference frame for tactile localization (Röder et al. 2004). Congenitally blind people also outperformed sighted participants when the use of an anatomically anchored reference frame was advantageous for solving a task, whereas they performed worse than the sighted when an external reference frame was better suited to the task (Röder et al. 2007). Importantly, people who became blind later in life were influenced by an external reference frame in the same manner as sighted participants, indicating that spatial remapping develops during ontogeny when the visual system is available, and that the lack of automatic coordinate transformations into an external reference frame is not simply an unspecific effect of long-term visual deprivation (Röder et al. 2004, 2007). In conclusion, the use of an external reference frame seems to be induced by the visual system, and this suggests that the external coordinates used in the remapping of sensory information are visual coordinates. Children did not show a TOJ crossing effect before the age of ~5½ years (Pagel et al. 2009). This late use of external coordinates suggests that spatial remapping requires a great deal of learning and of visual–tactile experience during interaction with the environment. One might therefore expect remapping to take place only in regions of space that are accessible to vision. In the TOJ paradigm, one would thus expect a crossing effect when the hands are held in front of the body, but no such crossing effect when the hands are held behind the back (because, given the lack of tactile–visual experience in that part of space, no visual–tactile remapping should take place there). At odds with these predictions, Kobor and colleagues (2006) observed a TOJ crossing effect (although somewhat reduced) also behind the back.
We conducted the same experiment in our laboratory and found that the size of the crossing effect did not differ between the front and the back [previously unpublished data; n = 11 young, healthy, blindfolded adults; just noticeable difference (JND) for correct stimulus order: uncrossed front: 66 ± 10 ms; uncrossed back: 67 ± 11 ms; crossed front: 143 ± 39 ms; crossed back: 138 ± 25 ms; ANOVA main effect of part of space and interaction of hand crossing with part of space, both F(1,10) < 1]. Because we must assume only minimal visual–tactile experience for the space behind our body, the results of these two experiments do not support the idea that the external coordinate system in tactile remapping is purely visual. It is possible that, rather than simply visual (i.e., eye- or retina-centered) coordinates, the brain uses an action-based reference frame that represents the environment (or the action target location) in external coordinates, which can be used to orient not only the eyes but also the head, trunk, or the whole body. In other words, the external reference frame may be anchored to the eyes for the part of space that is currently accessible to the eyes, but may be related to head, trunk, or body movement parameters for those parts of space currently out of view. Such a coordinate system would benefit from using external coordinates, because eye-, head-, and possibly body or trunk position must all be fused to allow the directing of the eyes (and, with them, usually the focus of attention) onto an externally located target.
Such an action-related reference frame seems plausible for several reasons. At least eye and head orienting (together referred to as gaze orienting) are both mediated by the superior colliculus (SC) (Walton et al. 2007), a brain structure that is important for multisensory processing (Stein et al. 1995; Freedman and Sparks 1997; Stuphorn et al. 2000) and that is connected to the IPS (Pare and Wurtz 1997, 2001). IPS as well as PMv (also connected to IPS) encode reaching (i.e., action) targets also in the dark, that is, in the absence of visual information (Fattori et al. 2005; Graziano et al. 1997). Moreover, a recent fMRI study demonstrated activation of the frontal eye fields—a structure thought to be involved in saccadic eye movements and visual attention—to sounds behind the head (Tark and Curtis 2009), which would suggest either a representation of unseen space (Tark and Curtis 2009) or, alternatively, the representation of a target coordinate in “action space” rather than in eye-centered space. For locations that one can orient toward with an eye–head movement, an action-based reference frame could be identical to a visual reference frame and use eye- or gaze-centered coordinates, in line with eye-centered coding of saccade as well as hand reach targets in LIP and MIP. The monkey’s head is usually fixed during single-cell recording experiments, making it impossible to differentiate between eye-centered and gaze-centered (let alone trunk- or body-centered target) coding. In addition, the spatial coding of reach targets that are out of view (but not in the dark) has, to our knowledge, not been investigated. To sum up, many electrophysiological and behavioral studies have suggested that touch is remapped into visual coordinates, presumably to permit its integration with information from other modalities. Remapping refers to a recoding of the location of a tactile event on the skin onto its external-spatial coordinates; in other words, remapping accounts for body posture when matching visual and tactile spatial locations. Because of the influence of the visual system during ontogeny (and, therefore, not in the congenitally blind), remapping occurs even for unimodal tactile events. Yet, the external reference frame may be “more than” visual, subserving orienting actions also to parts of space outside the current visual field.
28.3.6 Investigating the Body Schema and Reference Frames with Electrophysiology
Most of the evidence for coordinate transformations in humans discussed in this chapter so far has come from behavioral measures. Electrophysiological measures such as event-related potentials (ERPs) offer an additional approach to investigating these processes. ERPs record electrical brain signals with millisecond resolution and therefore allow a very detailed investigation of functional brain activity. One fruitful approach is the manipulation of the attentional processing of sensory stimuli: it is known that the ERP is enhanced when a stimulus is presented at a location the person is currently attending to. Indeed, there have been reports about the effect of hand crossing on the attentional processing of tactile stimuli delivered to the two hands. When a tactile stimulus is delivered to a hand while participants direct their attention to that hand, ERP deflections in the time range of 80–150 ms as well as between 200 and 300 ms are enhanced compared to when the same stimuli are delivered to the same hand while it is not attended. However, when participants crossed their hands, early attentional effects disappeared, and later effects were significantly reduced (Eimer et al. 2001, 2003). These ERP results imply that tactile spatial attentional processes do not rely on an anatomical reference frame alone, as posture should otherwise have had no influence on attention-related ERP effects. A disadvantage of this experimental design is that it differentiates only coarsely between attended and unattended stimulation when determining the influence of reference frames: the lack of a difference between attended and unattended conditions after hand crossing may be due to mere confusion of the two hands, effectively preventing the selective direction of attention to one hand. Alternatively, it may be due to attentional enhancement of one hand in a somatotopic, and of the other hand in an external, reference frame. However, the difference in ERP magnitude between attended and unattended spatial locations is not binary. Rather, ERPs gradually decrease with the distance at which a stimulus is presented from
the attended location, a phenomenon termed the spatial attentional gradient (Mangun and Hillyard 1988). The spatial gradient can be exploited to test more thoroughly if the ERP effects due to hand crossing are attributable to hand confusion, and to investigate if coordinate transformations are calculated for body parts other than the hands. To this end, participants were asked to attend to one of their feet, while tactile stimuli were presented to both hands and feet in random stimulus order. The hands were placed near the feet. Crucially, in some blocks each hand lay near its ipsilateral foot, whereas in some blocks, the hands were crossed so that each hand lay next to its contralateral foot. Thus, each hand could be near to or far from the attended location (one of the feet). The external spatial distance of each hand to the attended foot reversed with hand crossing, whereas of course the anatomical distance from each hand to the attended foot remained identical in both uncrossed and crossed conditions. Investigating the spatial gradient in ERPs to hand and foot stimuli thus made it possible to investigate whether the tactile system defines spatial distance in somatotopic or in external coordinates. In the time interval 100–140 ms after stimulus presentation, ERPs of unattended tactile hand stimuli were more similar to the ERP of an attended hand stimulus when the hands were located close to the attended foot than when they were located far away, demonstrating that tactile attention uses an external reference frame (Heed and Röder 2010; see Figure 28.2). At the same time, ERPs were also influenced by the anatomical distance between the attended and the stimulated locations.
(Plot for Figure 28.2: ERP amplitude in µV as a function of time, −100 to 400 ms relative to stimulus onset.)
FIGURE 28.2 ERP results for hand stimulation. Traces from a fronto-central electrode ipsilateral to stimulation. In the panels depicting the different conditions, the attended foot is indicated by a filled gray dot; the stimulated right hand is indicated by a gray cross. Thin black (lowest) trace (last panel): the hand was attended and stimulated. The signal should be highest in this condition. Note that the direction of the ERP (positive or negative deflection) does not carry meaning in this context. Black traces (first and second panels): stimulation of the hand contralateral to the attended foot. Gray traces (third and fourth panels): stimulation of the hand ipsilateral to the attended foot. Thin traces: close spatial distance (according to an external reference frame) between stimulated and attended limbs. Bold traces: far spatial distance between stimulated and attended limbs. ERPs started to differ from ~100 ms after the stimulus. Differences were largest in the 100- to 140-ms time interval, which is known to be modulated by spatial attention. For hand stimulation both ipsilateral and contralateral to the attended foot, a short spatial distance from the attended foot led to a more positive ERP in this time interval; in other words, the ERP was more similar to the thin black trace (stimulation at the attended location) for near than for far spatial distance (thin vs. bold traces), indicating the use of an external spatial reference frame for the representation of nonattended tactile stimuli. At the same time, anatomical distance (black vs. gray colors) also modulated ERPs, indicating the use of an anatomical reference frame.
ERPs to unattended hand stimuli were more similar to the ERP of an attended hand stimulus when the ipsilateral rather than the contralateral foot was attended. ERPs in this time range are thought to originate in the secondary somatosensory cortex (SII) (Frot and Mauguiere 1999; Eimer and Forster 2003). Recall that SII was implicated in the integration of a rubber hand, as indexed by the perceptual drift of the own hand toward the rubber hand (Tsakiris et al. 2007), as well as in making postural judgments (Corradi-Dell’Acqua et al. 2009). These findings thus converge with the ERP results in emphasizing the importance of relatively lower-level somatosensory areas in the representation of our body schema by coding not only the current position of our hands, but also the current spatial relationship of different body parts to each other, both in anatomical and external coordinates.
28.3.7 Summary
The second part of this chapter focused on the influence of the body and the body schema on multisensory processing. We started by showing that body posture can be used to calibrate the spatial relationship between the senses, and we discussed how the brain weights information from the different senses depending on its reliability. Such statistically optimal integration processes may also be at the heart of the phenomena presented in the first part of the chapter, for example, the rubber hand illusion. The remainder of the chapter focused on multisensory spatial processing, starting with the evidence for a special representation of the space directly around our body, which demonstrates the link between the body schema and multisensory spatial processing. We showed that peripersonal space is represented not only for the hands, but also for other body parts, and that not all experimental results can be explained by the common notion of peripersonal space being represented simply by tactile RFs on a body part with a matching visual RF. We then showed that the body schema matters not only in multisensory processing but also in purely tactile processing, in that tactile locations are automatically remapped into external spatial coordinates. These external coordinates are closely related to the visual modality, but extend beyond the current visual field into space that cannot be seen. Finally, ERPs were shown to be modulated by both anatomical and external coordinate frames. This highlights that, although in some situations tactile locations seem to be fully remapped into purely external coordinates, the original, anatomical location of the touch is never quite forgotten. Such concurrent representations of both anatomical and external location seem useful in the context of action control. For example, to fend off a dangerous object, be it an insect ready to sting or the hand of an adversary who has grabbed one's arm, it is crucial to know not only which limb can be chosen to initiate the defensive action, but also where in space the action must be guided. Thus, when the right arm has been grabbed, one cannot use this arm to strike against the opponent, and this is independent of the current external location of the captured arm. However, once it has been determined which arm is free for use in a counterattack, it becomes crucial to know where in space this arm should strike to fend off the attacker.
28.4 CONCLUSION Our different senses enable us to perceive and act toward the environment. However, they also enable us to perceive ourselves and, foremost, our body. Because we can move in many different ways, our brain must keep track of our current posture at all times to guide actions effectively. However, the brain is also surprisingly flexible with respect to representing what it assumes to belong to the body at any given point in time, and about the body’s current shape. One of the main principles of the brain’s body processing seems to be the attempt to “make sense of all the senses” by integrating all available information. As we saw, this processing principle can lead to surprising illusions, such as the rubber hand illusion, the Pinocchio nose, or the feeling of being located outside the body, displaced toward a video image. As is often the case in psychology, these illusions also enlighten us about the brain processes in normal circumstances.
As much as multisensory information is important for the construction of our body schema, this body representation is in turn important for many instances of multisensory processing. Visual events in the peripersonal space are specially processed to protect our body, and our flexibility to move in many ways requires that the spatial locations of the different sensory modalities are transformed into a common reference system. None of these functions could work without some representation of the body’s current configuration.
REFERENCES Aglioti, S., N. Smania, M. Manfredi, and G. Berlucchi. 1996. Disownership of left hand and objects related to it in a patient with right brain damage. Neuroreport 8: 293–296. Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Curr Biol 14: 257–262. Andersen, R. A., and H. Cui. 2009. Intention, action planning, and decision making in parietal–frontal circuits. Neuron 63: 568–583. Armel, K. C., and V. S. Ramachandran. 2003. Projecting sensations to external objects: Evidence from skin conductance response. Proc R Soc Lond B Biol Sci 270: 1499–1506. Aspell, J. E., B. Lenggenhager, and O. Blanke. 2009. Keeping in touch with one’s self: Multisensory mechanisms of self-consciousness. PLoS ONE 4: e6488. Avillac, M., S. Deneve, E. Olivier, A. Pouget, and J. R. Duhamel. 2005. Reference frames for representing visual and tactile locations in parietal cortex. Nat Neurosci 8: 941–949. Azanon, E., and S. Soto-Faraco. 2007. Alleviating the ‘crossed-hands’ deficit by seeing uncrossed rubber hands. Exp Brain Res 182: 537–548. Bakheit, A. M., and S. Roundhill. 2005. Supernumerary phantom limb after stroke. Postgrad Med J 81: e2. Batista, A. P., C. A. Buneo, L. H. Snyder, and R. A. Andersen. 1999. Reach plans in eye-centered coordinates. Science 285: 257–260. Berlucchi, G., and S. M. Aglioti. 2010. The body in the brain revisited. Exp Brain Res 200: 25–35. Bestmann, S., A. Oliviero, M. Voss, P. Dechent, E. Lopez-Dolado, J. Driver, and J. Baudewig. 2006. Cortical correlates of TMS-induced phantom hand movements revealed with concurrent TMS-fMRI. Neuropsychologia 44: 2959–2971. Blanke, O., T. Landis, L. Spinelli, and M. Seeck. 2004. Out-of-body experience and autoscopy of neurological origin. Brain 127: 243–258. Blanke, O., and T. Metzinger. 2009. Full-body illusions and minimal phenomenal selfhood. Trends Cogn Sci 13: 7–13. Blanke, O., S. Ortigue, T. Landis, and M. Seeck. 2002. Stimulating illusory own-body perceptions. Nature 419: 269–270. Bolognini, N., and A. Maravita. 2007. Proprioceptive alignment of visual and somatosensory maps in the posterior parietal cortex. Curr Biol 17: 1890–1895. Botvinick, M. 2004. Neuroscience. Probing the neural basis of body ownership. Science 305: 782–783. Botvinick, M., and J. Cohen. 1998. Rubber hands ‘feel’ touch that eyes see. Nature 391: 756. Breveglieri, R., C. Galletti, S. Monaco, and P. Fattori. 2008. Visual, somatosensory, and bimodal activities in the macaque parietal area PEc. Cereb Cortex 18: 806–816. Brugger, P., S. S. Kollias, R. M. Muri, G. Crelier, M. C. Hepp-Reymond, and M. Regard. 2000. Beyond remembering: Phantom sensations of congenitally absent limbs. Proc Natl Acad Sci U S A 97: 6167–6172. Bruno, N., and M. Bertamini. 2010. Haptic perception after a change in hand size. Neuropsychologia 48: 1853–1856. Buneo, C. A., M. R. Jarvis, A. P. Batista, and R. A. Andersen. 2002. Direct visuomotor transformations for reaching. Nature 416: 632–636. Cardinali, L., C. Brozzoli, and A. Farne. 2009a. Peripersonal space and body schema: Two labels for the same concept? Brain Topogr 21: 252–260. Cardinali, L., F. Frassinetti, C. Brozzoli, C. Urquizar, A. C. Roy, and A. Farne. 2009b. Tool-use induces morphological updating of the body schema. Curr Biol 19: R478–R479. Cohen, Y. E., and R. A. Andersen. 2000. Reaches to sounds encoded in an eye-centered reference frame. Neuron 27: 647–652. Cohen, Y. E., and R. A. Andersen. 2002. A common reference frame for movement plans in the posterior parietal cortex. 
Nat Rev Neurosci 3: 553–562. Collins, T., T. Schicke, and B. Röder. 2008. Action goal selection and motor planning can be dissociated by tool use. Cognition 109: 363–371.
Corradi-Dell’Acqua, C., B. Tomasino, and G. R. Fink. 2009. What is the position of an arm relative to the body? Neural correlates of body schema and body structural description. J Neurosci 29: 4162–4171. Cutting, J. 1978. Study of anosognosia. J Neurol Neurosurg Psychiatry 41: 548–555. Daprati, E., A. Sirigu, P. Pradat-Diehl, N. Franck, and M. Jeannerod. 2000. Recognition of self-produced movement in a case of severe neglect. Neurocase 6: 477–486. de Lange, F. P., R. C. Helmich, and I. Toni. 2006. Posture influences motor imagery: An fMRI study. Neuroimage 33: 609–617. de Vignemont, F. 2010. Body schema and body image—pros and cons. Neuropsychologia 48: 669–680. di Pellegrino, G., and F. Frassinetti. 2000. Direct evidence from parietal extinction of enhancement of visual attention near a visible hand. Curr Biol 10: 1475–1477. Dijkerman, H. C., and E. H. de Haan. 2007. Somatosensory processes subserving perception and action. Behav Brain Sci 30: 189–201; discussion 201–239. Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1998. Ventral intraparietal area of the macaque: Congruent visual and somatic response properties. J Neurophysiol 79: 126–136. Ehrsson, H. H. 2007. The experimental induction of out-of-body experiences. Science 317: 1048. Ehrsson, H. H., C. Spence, and R. E. Passingham. 2004. That’s my hand! Activity in premotor cortex reflects feeling of ownership of a limb. Science 305: 875–877. Ehrsson, H. H., K. Wiech, N. Weiskopf, R. J. Dolan, and R. E. Passingham. 2007. Threatening a rubber hand that you feel is yours elicits a cortical anxiety response. Proc Natl Acad Sci U S A 104: 9828–9833. Eimer, M., D. Cockburn, B. Smedley, and J. Driver. 2001. Cross-modal links in endogenous spatial attention are mediated by common external locations: Evidence from event-related brain potentials. Exp Brain Res 139: 398–411. Eimer, M., and B. Forster. 2003. Modulations of early somatosensory ERP components by transient and sustained spatial attention. Exp Brain Res 151: 24–31. Eimer, M., B. Forster, and J. Van Velzen. 2003. Anterior and posterior attentional control systems use different spatial reference frames: ERP evidence from covert tactile–spatial orienting. Psychophysiology 40: 924–933. Ernst, M. O., and M. S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415: 429–433. Fattori, P., D. F. Kutz, R. Breveglieri, N. Marzocchi, and C. Galletti. 2005. Spatial tuning of reaching activity in the medial parieto-occipital cortex (area V6A) of macaque monkey. Eur J Neurosci 22: 956–972. Fitzgerald, P. J., J. W. Lane, P. H. Thakur, and S. S. Hsiao. 2004. Receptive field properties of the macaque second somatosensory cortex: Evidence for multiple functional representations. J Neurosci 24: 11193–11204. Fogassi, L., V. Gallese, L. Fadiga, G. Luppino, M. Matelli, and G. Rizzolatti. 1996. Coding of peripersonal space in inferior premotor cortex (area F4). J Neurophysiol 76: 141–157. Freedman, E. G., and D. L. Sparks. 1997. Activity of cells in the deeper layers of the superior colliculus of the rhesus monkey: Evidence for a gaze displacement command. J Neurophysiol 78: 1669–1690. Frot, M., and F. Mauguiere. 1999. Timing and spatial distribution of somatosensory responses recorded in the upper bank of the sylvian fissure (SII area) in humans. Cereb Cortex 9: 854–863. Gallagher, S. 1986. Body image and body schema: A conceptual clarification. J Mind Behav 7: 541–554. Graziano, M. S. 1999. Where is my arm? 
The relative role of vision and proprioception in the neuronal representation of limb position. Proc Natl Acad Sci U S A 96: 10418–10421. Graziano, M. S., and D. F. Cooke. 2006. Parieto-frontal interactions, personal space, and defensive behavior. Neuropsychologia 44: 845–859. Graziano, M. S., D. F. Cooke, and C. S. Taylor. 2000. Coding the location of the arm by sight. Science 290: 1782–1786. Graziano, M. S., and S. Gandhi. 2000. Location of the polysensory zone in the precentral gyrus of anesthetized monkeys. Exp Brain Res 135: 259–266. Graziano, M. S., X. T. Hu, and C. G. Gross. 1997. Visuospatial properties of ventral premotor cortex. J Neurophysiol 77: 2268–2292. Graziano, M. S., L. A. Reiss, and C. G. Gross. 1999. A neuronal representation of the location of nearby sounds. Nature 397: 428–430. Graziano, M. S., C. S. Taylor, and T. Moore. 2002. Complex movements evoked by microstimulation of precentral cortex. Neuron 34: 841–851. Graziano, M. S., G. S. Yap, and C. G. Gross. 1994. Coding of visual space by premotor neurons. Science 266: 1054–1057. Grefkes, C., and G. R. Fink. 2005. The functional organization of the intraparietal sulcus in humans and monkeys. J Anat 207: 3–17.
Halligan, P. W., J. C. Marshall, and D. T. Wade. 1993. Three arms: A case study of supernumerary phantom limb after right hemisphere stroke. J Neurol Neurosurg Psychiatry 56: 159–166. Halligan, P. W., J. C. Marshall, and D. T. Wade. 1995. Unilateral somatoparaphrenia after right hemisphere stroke: A case description. Cortex 31: 173–182. Harris, C. M., and D. M. Wolpert. 1998. Signal-dependent noise determines motor planning. Nature 394: 780–784. Heed, T., and B. Röder. 2010. Common anatomical and external coding for hands and feet in tactile attention: Evidence from event-related potentials. J Cogn Neurosci 22: 184–202. Holmes, N. P., G. A. Calvert, and C. Spence. 2004. Extending or projecting peripersonal space with tools? Multisensory interactions highlight only the distal and proximal ends of tools. Neurosci Lett 372: 62–67. Holmes, N. P., G. A. Calvert, and C. Spence. 2007. Tool use changes multisensory interactions in seconds: Evidence from the crossmodal congruency task. Exp Brain Res 183: 465–476. Holmes, N. P., H. J. Snijders, and C. Spence. 2006. Reaching with alien limbs: Visual exposure to prosthetic hands in a mirror biases proprioception without accompanying illusions of ownership. Percept Psychophys 68: 685–701. Holmes, N. P., and C. Spence. 2004. The body schema and multisensory representation(s) of peripersonal space. Cogn Process 5: 94–105. Holmes, N. P., C. Spence, P. C. Hansen, C. E. Mackay, and G. A. Calvert. 2008. The multisensory attentional consequences of tool use: A functional magnetic resonance imaging study. PLoS ONE 3: e3502. Hötting, K., and B. Röder. 2004. Hearing cheats touch, but less in congenitally blind than in sighted individuals. Psychol Sci 15: 60–64. Ionta, S., and O. Blanke. 2009. Differential influence of hands posture on mental rotation of hands and feet in left and right handers. Exp Brain Res 195: 207–217. Ionta, S., A. D. Fourkas, M. Fiorio, and S. M. Aglioti,. 2007. The influence of hands posture on mental rotation of hands and feet. Exp Brain Res 183: 1–7. Iriki, A., M. Tanaka, and Y. Iwamura. 1996. Coding of modified body schema during tool use by macaque postcentral neurones. Neuroreport 7: 2325–2330. Kammers, M. P., F. de Vignemont, L. Verhagen, and H. C. Dijkerman. 2009a. The rubber hand illusion in action. Neuropsychologia 47: 204–211. Kammers, M. P., J. A. Kootker, H. Hogendoorn, and H. C. Dijkerman. 2009b. How many motoric body representations can we grasp? Exp Brain Res 202: 203–212. Kammers, M. P., L. O. Verhagen, L. H. C. Dijkerman, H. Hogendoorn, F. De Vignemont, and D. J. Schutter. 2009c. Is this hand for real? Attenuation of the rubber hand illusion by transcranial magnetic stimulation over the inferior parietal lobule. J Cogn Neurosci 21: 1311–1320. Karnath, H. O., S. Ferber, and M. Himmelbach. 2001. Spatial awareness is a function of the temporal not the posterior parietal lobe. Nature 411: 950–953. Kase, C. S., J. F. Troncoso, J. E. Court, J. F. Tapia, and J. P. Mohr. 1977. Global spatial disorientation. Clinicopathologic correlations. J Neurol Sci 34: 267–278. Kitazawa, S. 2002. Where conscious sensation takes place. Conscious Cogn 11: 475–477. Kobor, I., L. Furedi, G. Kovacs, C. Spence, and Z. Vidnyanszky. 2006. Back-to-front: Improved tactile discrimination performance in the space you cannot see. Neurosci Lett 400: 163–167. Körding, K. P., and D. M. Wolpert, 2004. Bayesian integration in sensorimotor learning. Nature 427: 244–247. Lackner, J. R. 1988. 
Some proprioceptive influences on the perceptual representation of body shape and orientation. Brain 111(Pt 2): 281–297. Lackner, J. R., and P. A. DiZio. 2000. Aspects of body self-calibration. Trends Cogn Sci 4: 279–288. Lackner, J. R., and B. Shenker. 1985. Proprioceptive influences on auditory and visual spatial localization. J Neurosci 5: 579–583. Lacroix, R., R. Melzack, D. Smith, and N. Mitchell. 1992. Multiple phantom limbs in a child. Cortex 28: 503–507. Ladavas, E. 2002. Functional and dynamic properties of visual peripersonal space. Trends Cogn Sci 6: 17–22. Ladavas, E., G. di Pellegrino, A. Farne, and G. Zeloni. 1998. Neuropsychological evidence of an integrated visuotactile representation of peripersonal space in humans. J Cogn Neurosci 10: 581–589. Lenggenhager, B., T. Tadi, T. Metzinger, and O. Blanke. 2007. Video ergo sum: manipulating bodily selfconsciousness. Science 317: 1096–1099. Lloyd, D. M., D. I. Shore, C. Spence, and G. A. Calvert. 2003. Multisensory representation of limb position in human premotor cortex. Nat Neurosci 6: 17–18.
Longo, M. R., F. Schuur, M. P. Kammers, M. Tsakiris, and P. Haggard. 2008. What is embodiment? A psychometric approach. Cognition 107: 978–998. Lotze, M., H. Flor, W. Grodd, W. Larbig, and N. Birbaumer. 2001. Phantom movements and pain. An fMRI study in upper limb amputees. Brain 124: 2268–2277. Macaluso, E., C. D. Frith, and J. Driver. 2002. Crossmodal spatial influences of touch on extrastriate visual areas take current gaze direction into account. Neuron 34: 647–658. Macaluso, E., C. D. Frith, and J. Driver. 2007. Delay activity and sensory–motor translation during planned eye or hand movements to visual or tactile targets. J Neurophysiol 98: 3081–3094. Makin, T. R., N. P. Holmes, and E. Zohary. 2007. Is that near my hand? Multisensory representation of peri personal space in human intraparietal sulcus. J Neurosci 27: 731–740. Mangun, G. R., and S. A. Hillyard. 1988. Spatial gradients of visual attention: Behavioral and electrophysiological evidence. Electroencephalogr Clin Neurophysiol 70: 417–428. Maravita, A., and A. Iriki. 2004. Tools for the body (schema). Trends Cog Sci 8: 79–86. Maravita, A., C. Spence, S. Kennett, and J. Driver. 2002. Tool-use changes multimodal spatial interactions between vision and touch in normal humans. Cognition 83: B25–B34. Marino, B. F., N. Stucchi, E. Nava, P. Haggard, and A. Maravita. 2010. Distorting the visual size of the hand affects hand pre-shaping during grasping. Exp Brain Res 202: 499–505. McGonigle, D. J., R. Hanninen, S. Salenius, R. Hari, R. S. Frackowiak, and C. D. Frith. 2002. Whose arm is it anyway? An fMRI case study of supernumerary phantom limb. Brain 125: 1265–1274. Metzinger, T. 2009. Why are out-of-body experiences interesting for philosophers? The theoretical relevance of OBE research. Cortex 45: 256–258. Misaki, M., E. Matsumoto, and S. Miyauchi. 2002. Dorsal visual cortex activity elicited by posture change in a visuo- tactile matching task. Neuroreport 13: 1797–1800. Mort, D. J., P. Malhotra, S. K. Mannan, C. Rorden, A. Pambakian, C. Kennard, and M. Husain. 2003. The anatomy of visual neglect. Brain 126: 1986–1997. Mullette-Gillman, O. A., Y. E. Cohen, and J. M. Groh. 2005. Eye-centered, head-centered, and complex coding of visual and auditory targets in the intraparietal sulcus. J Neurophysiol 94: 2331–2352. Naito, E. 2004. Sensing limb movements in the motor cortex: How humans sense limb movement. Neuroscientist 10: 73–82. Naito, E., P. E. Roland, and H. H. Ehrsson. 2002. I feel my hand moving: A new role of the primary motor cortex in somatic perception of limb movement. Neuron 36: 979–988. Obayashi, S., M. Tanaka, and A. Iriki. 2000. Subjective image of invisible hand coded by monkey intraparietal neurons. Neuroreport 11: 3499–3505. Pagel, B., T. Heed, and B. Röder. 2009. Change of reference frame for tactile localization during child development. Dev Sci 12: 929–937. Paqueron, X., M. Leguen, D. Rosenthal, P. Coriat, P. J. C. Willer, and N. Danziger. 2003. The phenomenology of body image distortions induced by regional anaesthesia. Brain 126: 702–712. Pare, M., and R. H. Wurtz. 1997. Monkey posterior parietal cortex neurons antidromically activated from superior colliculus. J Neurophysiol 78: 3493–3497. Pare, M., and R. H. Wurtz. 2001. Progression in neuronal processing for saccadic eye movements from parietal cortex area LIP to superior colliculus. J Neurophysiol 85: 2545–2562. Parsons, L. M. 1987. Imagined spatial transformations of one’s hands and feet. Cogn Psychol 19: 178–241. Pavani, F., C. Spence, and J. Driver. 2000. 
Visual capture of touch: Out-of-the-body experiences with rubber gloves. Psychol Sci 11: 353–359. Pellijeff, A., L. Bonilha, P. S. Morgan, K. McKenzie, and S. R. Jackson. 2006. Parietal updating of limb posture: An event-related fMRI study. Neuropsychologia 44: 2685–2690. Pesaran, B., M. J. Nelson, and R. A. Andersen. 2006. Dorsal premotor neurons encode the relative position of the hand, eye, and goal during reach planning. Neuron 51: 125–134. Pouget, A., S. Deneve, and J. R. Duhamel. 2002. A computational perspective on the neural basis of multi sensory spatial representations. Nat Rev Neurosci 3: 741–747. Press, C., C. Heyes, P. Haggard, and M. Eimer. 2008. Visuotactile learning and body representation: An ERP study with rubber hands and rubber objects. J Cogn Neurosci 20: 312–323. Previc, F. H. 1998. The neuropsychology of 3-D space. Psychol Bull 124: 123–164. Ramachandran, V. S. 1993. Behavioral and magnetoencephalographic correlates of plasticity in the adult human brain. Proc Natl Acad Sci U S A 90: 10413–10420. Ramachandran, V. S., and W. Hirstein. 1997. Three laws of qualia—what neurology tells us about the biological functions of consciousness, qualia and the self. J Consciousness Stud 4: 429–458.
Ramachandran, V. S., and W. Hirstein. 1998. The perception of phantom limbs. The D. O. Hebb lecture. Brain 121(Pt 9): 1603–1630. Rizzolatti, G., G. Luppino, and M. Matelli. 1998. The organization of the cortical motor system: New concepts. Electroencephalogr Clin Neurophysiol 106: 283–296. Rizzolatti, G., C. Scandolara, M. Matelli, and M. Gentilucci. 1981a. Afferent properties of periarcuate neurons in macaque monkeys. I. Somatosensory responses. Behav Brain Res 2: 125–146. Rizzolatti, G., C. Scandolara, M. Matelli, and M. Gentilucci. 1981b. Afferent properties of periarcuate neurons in macaque monkeys: II. Visual responses. Behav Brain Res 2: 147–163. Röder, B., A. Kusmierek, C. Spence, and T. Schicke. 2007. Developmental vision determines the reference frame for the multisensory control of action. Proc Natl Acad Sci U S A 104: 4753–4758. Röder, B., F. Rösler, and C. Spence. 2004. Early vision impairs tactile perception in the blind. Curr Biol 14: 121–124. Roux, F. E., J. A. Lotterie, E. Cassol, Y. Lazorthes, J. C. Sol, and I. Berry. 2003. Cortical areas involved in virtual movement of phantom limbs: comparison with normal subjects. Neurosurgery 53: 1342–1352. Saadah, E. S., and R. Melzack. 1994. Phantom limb experiences in congenital limb-deficient adults. Cortex 30: 479–485. Sakata, H., Y. Takaoka, A. Kawarasaki, and H. Shibutani. 1973. Somatosensory properties of neurons in the superior parietal cortex (area 5) of the rhesus monkey. Brain Res 64: 85–102. Schaefer, M., H. Flor, H. J. Heinze, and M. Rotte. 2007. Morphing the body: Illusory feeling of an elongated arm affects somatosensory homunculus. Neuroimage 36: 700–705. Scherberger, H., M. A. Goodale, and R. A. Andersen. 2003. Target selection for reaching and saccades share a similar behavioral reference frame in the macaque. J Neurophysiol 89: 1456–1466. Schicke, T., F. Bauer, and B. Röder. 2009. Interactions of different body parts in peripersonal space: how vision of the foot influences tactile perception at the hand. Exp Brain Res 192: 703–715. Schicke, T., and B. Röder. 2006. Spatial remapping of touch: Confusion of perceived stimulus order across hand and foot. Proc Natl Acad Sci U S A 103: 11808–11813. Schlack, A., S. J. Sterbing-D’Angelo, K. Hartung, K. P. Hoffmann, and F. Bremmer. 2005. Multisensory space representations in the macaque ventral intraparietal area. J Neurosci 25: 4616–4625. Sellal, F., C. Renaseau-Leclerc, and R. Labrecque. 1996. The man with 6 arms. An analysis of supernumerary phantom limbs after right hemisphere stroke. Rev Neurol (Paris) 152: 190–195. Shams, L., Y. Kamitani, and S. Shimojo. 2002. Visual illusion induced by sound. Brain Res Cogn Brain Res 14: 147–152. Shore, D. I., E. Spry, and C. Spence. 2002. Confusing the mind by crossing the hands. Brain Res Cogn Brain Res 14: 153–163. Simmel, M. L. 1962. The reality of phantom sensations. Soc Res 29: 337–356. Spence, C., F. Pavani, and J. Driver. 2004a. Spatial constraints on visual–tactile cross-modal distractor congruency effects. Cogn Affect Behav Neurosci 4: 148–169. Spence, C., F. Pavani, A. Maravita, and N. Holmes, N. 2004b. Multisensory contributions to the 3-D representation of visuotactile peripersonal space in humans: Evidence from the crossmodal congruency task. J Physiol Paris 98: 171–189. Stein, B. E., M. T. Wallace, and M. A. Meredith. 1995. Neural mechanisms mediating attention and orientation to multisensory cues. In The Cognitive Neurosciences, ed. M. S. Gazzaniga, 683–702. Cambridge, MA: MIT Press, Bradford Book. Stricanne, B., R. A. 
Andersen, and P. Mazzoni. 1996. Eye-centered, head-centered, and intermediate coding of remembered sound locations in area LIP. J Neurophysiol 76: 2071–2076. Stuphorn, V., E. Bauswein, and K. P. Hoffmann. 2000. Neurons in the primate superior colliculus coding for arm movements in gaze-related coordinates. J Neurophysiol 83: 1283–1299. Tark, K. J., and C. E. Curtis. 2009. Persistent neural activity in the human frontal cortex when maintaining space that is off the map. Nat Neurosci 12: 1463–1468. Trommershäuser, J., L. T. Maloney, and M. S. Landy. 2003. Statistical decision theory and trade-offs in the control of motor response. Spat Vis 16: 255–275. Tsakiris, M. 2010. My body in the brain: a neurocognitive model of body-ownership. Neuropsychologia 48: 703–712. Tsakiris, M., M. D. Hesse, C. Boy, P. Haggard, and G. R. Fink. 2007. Neural signatures of body ownership: A sensory network for bodily self-consciousness. Cereb Cortex 17: 2235–2244. Türker, K. S., P. L. Yeo, and S. C. Gandevia. 2005. Perceptual distortion of face deletion by local anaesthesia of the human lips and teeth. Exp Brain Res 165: 37–43.
Walton, M. M., B. Bechara, and N. J. Gandhi. 2007. Role of the primate superior colliculus in the control of head movements. J Neurophysiol 98: 2022–2037. Xing, J., and R. A. Andersen. 2000. Models of the posterior parietal cortex which perform multimodal integration and represent space in several coordinate frames. J Cogn Neurosci 12: 601–614. Yamamoto, S., and S. Kitazawa. 2001a. Reversal of subjective temporal order due to arm crossing. Nat Neurosci 4: 759–765. Yamamoto, S., and S. Kitazawa. 2001b. Sensation at the tips of invisible tools. Nat Neurosci 4: 979–980. Yamamoto, S., S. Moizumi, and S. Kitazawa. 2005. Referral of tactile sensation to the tips of L-shaped sticks. J Neurophysiol 93: 2856–2863. Yue, Z., G. N. Bischof, X. Zhou, C. Spence, and B. Röder. 2009. Spatial attention affects the processing of tactile and visual stimuli presented at the tip of a tool: An event-related potential study. Exp Brain Res 193: 119–128.
Section VII Naturalistic Multisensory Processes: Motion Signals
29
Multisensory Interactions during Motion Perception: From Basic Principles to Media Applications
Salvador Soto-Faraco and Aleksander Väljamäe
CONTENTS
29.1 Introduction
29.2 Basic Phenomenology of Multisensory Interactions in Motion Perception
29.3 Some Behavioral Principles
29.3.1 What Is the Processing Level at Which Cross-Modal Interactions in Motion Processing Originate?
29.3.2 Are These Interactions Specific to Motion Processing?
29.3.3 Pattern of Modality Dominance
29.3.4 Multisensory Integration of Motion Speed
29.4 Neural Correlates of Multisensory Integration of Motion
29.4.1 Multisensory Motion Processing Areas in the Brain
29.4.2 Evidence for Cross-Modal Integration of Motion Information in the Human Brain
29.5 Motion Integration in Multisensory Contexts beyond the Laboratory
29.5.1 Sound Compensating for Reduced Visual Frame Rate
29.5.2 Filling in Visual Motion with Sound
29.5.3 Perceptually Optimized Media Applications
29.6 Conclusions
Acknowledgments
References
29.1 INTRODUCTION
Hearing the blare of an ambulance siren often impels us to trace the location of the emergency vehicle with our gaze so we can quickly decide which way to pull the car over. In doing so, we must combine motion information from the somewhat imprecise but omnidirectional auditory system with the far more precise, albeit spatially bounded, visual system. This type of multisensory interplay, so pervasive in the everyday perception of moving objects, has been largely ignored in the scientific study of motion perception until recently. Here, we provide an overview of recent research on the behavioral and neural mechanisms that support the binding of different sensory modalities during the perception of motion, and discuss some potential extensions of this research into the applied context of audiovisual media.
29.2 BASIC PHENOMENOLOGY OF MULTISENSORY INTERACTIONS IN MOTION PERCEPTION
Early research addressing the perception of motion when more than one sensory modality is involved mainly focused on psychophysical tasks in humans (for a review of early research, see Soto-Faraco and Kingstone 2004). The most popular approach has been the use of intersensory conflict situations, where incongruent information in two sensory modalities is presented to the observer (as illustrated, for spatially static events, in the famous ventriloquist illusion; de Gelder and Bertelson 2003; Howard and Templeton 1966). Some of the early observations arising from intersensory conflict were already indicative of the consequences of cross-modality interactions in the perception of motion direction (Anstis 1973; Zapparoli and Reatto 1969). Zapparoli and Reatto (1969), for instance, noted that when their observers were presented with combinations of directionally incongruent visual and auditory apparent motion streams,* some of them reported that they experienced the two modalities moving in a unified trajectory (see also Anstis 1973, for a similar kind of introspective report regarding subjective perception of motion coherence upon auditory–visual motion conflict). More recent research sought to confirm these introspective data via more sophisticated experimental tasks (e.g., Allen and Kohlers 1981; Staal and Donderi 1983). For example, Allen and Kohlers estimated the interstimulus interval that leads to the perception of apparent motion between two discrete stimuli in one sensory modality (vision or audition), as a function of the timing and directional congruency of two discrete events in another sensory modality (audition or vision, respectively) presented concurrently. In their study, Allen and Kohlers (see also Staal and Donderi 1983) found significant cross-modal effects of vision on the likelihood of perceiving motion in auditory apparent motion displays, whereas the effects in the reverse direction were much weaker, if present at all. Although these studies did not specifically measure perceived direction of motion, their results often revealed cross-modal influences that were independent of the directional congruency between modalities. Given the results of later studies (described below), this failure to find direction congruency effects could have been more related to methodological peculiarities of the setups used in these studies than to the actual absence of cross-modal interaction in terms of motion direction (see Soto-Faraco and Kingstone 2004 and Soto-Faraco et al. 2004a, for further discussion of these confounds). Along these lines, Mateeff et al. (1985) reported an elegant study with more tightly controlled conditions where the speed and direction of a moving sound were adjusted psychophysically for subjective steadiness. When visual motion was presented concurrently but in the opposite direction of the moving sound, the sound needed to move at a velocity of 25% to 50% of that of the distractor light for subjective sound steadiness to be achieved. This finding suggests, again, that the perception of motion direction reflects the outcome of a combination process involving directional information available to both sensory modalities. Some of the past work in our laboratory has attempted to apply the logic of the seminal introspective studies described earlier, from a psychophysical viewpoint.
In particular, we developed a procedure based on intersensory conflict whereby directionally incongruent motion signals were presented in audition and vision (Soto-Faraco et al. 2002, 2004a). In a typical task, participants are asked to report the direction of an auditory apparent motion stream while an (irrelevant) visual apparent motion stream is presented concurrently in the same or different direction (Figure 29.1a and b). The results in this type of experiment, replicated now many times and in different laboratories, are clear-cut. Responses to the sound direction are very accurate when a concurrent visual motion stream is presented in the same direction as the sounds (about 100% correct), whereas sound motion discrimination performance drops dramatically (by 50%) when the lights are presented synchronously but move in the opposite direction (see Figure 29.1c and d). This effect of directional congruency, termed cross-modal dynamic capture (CDC), occurs with equivalent strength when using continuous (rather than apparent) motion displays, but is eliminated if the visual and auditory signals are desynchronized in time by as little as half a second. One interesting aspect is that the frequent errors made by observers under directionally incongruent audiovisual motion are better explained by a phenomenological reversal in the direction of sounds, rather than by mere confusion. This latter inference is supported by the finding that the same pattern of directional congruency effects is seen even after filtering out low confidence responses (self-rated by the observer, after every trial) from the data (Soto-Faraco et al. 2004a).
FIGURE 29.1 Cross-modal dynamic capture effect. (a) Observer is presented with auditory motion together with visual motion along the horizontal plane, and is asked to determine the direction of sounds and ignore the visual event. (b) Examples of different kinds of trials used in the task, combining directional congruency and synchrony between sound and light. (c) Typical outcome in this task, where accuracy in sound direction task is strongly influenced by congruency of visual distractor (CDC effect), but only in synchronous trials. (d) Histogram regarding the size of congruency effect across a sample of 384 participants who performed this task across a variety of experiments, but under comparable conditions.
* The phenomenon of apparent motion (namely, experiencing a connected trajectory across two discrete events presented successively at alternate locations) has been described in different sensory modalities, including the classic example of vision (Exner 1875; Wertheimer 1912) but also in audition and touch (Burt 1917a, 1917b; Hulin 1927; Kirman 1974). Moreover, there is evidence suggesting that the principles governing apparent motion are similar for the different senses (Lakatos and Shepard 1997).
Another relevant finding supporting the existence of multisensory integration between motion signals comes from an adaptation paradigm developed by Kitagawa and Ichihara (2002). In their study, Kitagawa and Ichihara adapted observers with visual motion either receding from or looming toward them, and found adaptation aftereffects not only on the perceived direction of subsequent visual stimuli, but also on auditory stimuli. For example, after adapting observers to looming visual motion, a steady sound would appear to move away from them (i.e., its intensity would seem to fade off slightly over time). This result supports the early nature of multisensory interactions between auditory and visual motion detectors. Interestingly, Kitagawa and Ichihara also tested adaptation aftereffects when adapting observers with combined auditory and visual motion stimuli, and found that the magnitude of the adaptation effect depended on the directional congruency between the two adaptor motion signals (for related findings, see Väljamäe and Soto-Faraco 2008; Vroomen and de Gelder 2003).
In summary, the findings discussed above seem to point to the existence of robust interactions between sensory modalities during the extraction of motion information, and in particular of its direction. However, they are still far from providing a full characterization of these interactions at a behavioral and at a neural level. We provide some of the main findings regarding these two aspects in the following sections.
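To make the size of the congruency effect concrete, the sketch below shows one simple way a capture index in the spirit of Figure 29.1d could be computed from a single participant's accuracies; the function name and the trial counts are hypothetical and are not taken from the original studies.

```python
def capture_index(correct_congruent: list, correct_conflicting: list) -> float:
    """Difference in proportion correct between congruent and conflicting trials.

    A value near 0 indicates no capture; a value near 1 indicates that reports of
    sound direction were driven almost entirely by the irrelevant visual direction.
    """
    p_congruent = sum(correct_congruent) / len(correct_congruent)
    p_conflicting = sum(correct_conflicting) / len(correct_conflicting)
    return p_congruent - p_conflicting

# Hypothetical participant: near-perfect on congruent trials, about 50% on
# conflicting trials, roughly matching the average pattern described in the text.
congruent_trials = [True] * 38 + [False] * 2      # 95% correct
conflicting_trials = [True] * 20 + [False] * 20   # 50% correct
print(capture_index(congruent_trials, conflicting_trials))  # 0.45
```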
29.3 SOME BEHAVIORAL PRINCIPLES
Despite the existence of strong phenomenological correlates suggesting cross-modal interactions in motion perception, there are a number of important questions that need to be addressed for a complete characterization of these interactions. We discuss some of them below.
29.3.1 What Is the Processing Level at Which Cross-Modal Interactions in Motion Processing Originate?
One critical question for any cross-modal interaction arising from intersensory conflict concerns its level of processing (e.g., Bertelson and Aschersleben 1998; de Gelder and Bertelson 2003). That is, given intersensory conflict between the stimuli present in the display, there are a number of levels of information processing at which interactions could potentially occur, ranging from early, sensory stages up to late, decisional ones. Whereas the former stages of information processing are relevant in terms of characterizing multisensory interactions during perception, the latter ones are relatively uninformative in this respect, as they inform us about cognitive mechanisms that apply to response selection in general and are not necessarily specific to multisensory interactions. The general problem in the interpretation of intersensory conflict paradigms has been well described by a number of authors (e.g., Bertelson and Aschersleben 1998; Choe et al. 1975; de Gelder and Bertelson 2003; Welch 1999, about this issue) and has produced a long historical debate in the case of the ventriloquist illusion. The problem can be stated as follows: irrelevant (to-be-ignored) information present in the display maps onto the response set available to the participant in a way that, for incongruent (conflict) trials, this irrelevant modality primes the erroneous response to the target, whereas in the congruent (no conflict) trials the distractor favors the appropriate response. This creates at least two types of confound, one based on stimulus–response compatibility effects (first reported by Fitts and Seeger 1953; see also Fitts and Deininger 1954; Simon 1969; see Hommel 2000, for a review) and the other based on the potential (conscious or unconscious) strategies adopted by participants based on their awareness about the conflicting nature of some trials, also referred to as cognitive bias (e.g., Bertelson and Aschersleben 1998). Either type of bias, in isolation or combined, can provide a sufficient explanation of many intersensory conflict results based on known empirical facts, without the need to resort to cross-modal interactions at the perceptual level. As in other domains, however, in the specific case of multisensory interactions during motion perception, a series of results have convincingly shown that interactions at a perceptual level are relevant. Results from adaptation aftereffects, such as those shown in Kitagawa and Ichihara's (2002) study, favor an interpretation in terms of early stages of processing, given that aftereffects are often attributed to fatigue of motion detectors in sensory brain areas. However, note that Kitagawa and Ichihara's study still contains the elements for a post-perceptual interpretation (i.e., presence of intersensory conflict, from adaptation to test phase, and awareness of the conflict). Soto-Faraco et al. (2005) used a variation of the original CDC task described above, but replaced the unimodal "left vs. right" discrimination task with a "same vs. different" task in which participants were asked to compare motion direction across the two modalities presented (auditory and visual). Soto-Faraco et al. found that participants were unable to distinguish between same- and
different-direction audiovisual apparent motion streams unless the interstimulus interval between the two discrete flashes/beeps was larger than 300 ms. Yet, the same observers were able to accurately discriminate the direction of apparent motion streams in each sensory modality individually for interstimulus intervals below 75 ms. Given that conflict between the stimulus (left–right) and response (same–different) was not possible in this paradigm, stimulus–response compatibility could be ruled out as the source of the behavioral effect. Moreover, in this experiment the interstimulus interval was adjusted using interleaved adaptive staircases to reach the point of perceptual uncertainty. At this point, by definition participants are not aware of whether they are being presented with a conflicting or a congruent trial, and thus cannot adopt strategies based on stimulus congruence, thereby also ruling out cognitive biases. After eliminating stimulus–response compatibility and cognitive biases as possible explanations, the participants' failure to individuate the direction of each sensory modality in multisensory displays can only be attributed to interference at a perceptual level (Soto-Faraco et al. 2005). Other approaches that have been used to disentangle the contribution of perceptual versus post-perceptual mechanisms in cross-modal motion effects include the use of analytic tools such as signal detection theory (see MacMillan and Creelman 1991), where an independent estimation of the sensitivity (associated with perceptual sources) and the decision bias (associated with post-perceptual sources) can be obtained (e.g., Sanabria et al. 2007; Soto-Faraco et al. 2006). In the Sanabria et al. (2007) and Soto-Faraco et al. (2006) studies, for example, participants were asked to discriminate left-moving sounds (signal) from right-moving sounds (noise) in the context of visual stimuli that moved in a constant direction throughout the whole experimental block (always left or always right). The findings were clear: the presence of visual motion lowered sensitivity (d′) to sound direction as compared to a no-vision baseline, regardless of whether sound direction was consistent or inconsistent with the visual distractor motion. That is, visual motion made signal and noise motion signals in the auditory modality more similar to each other, and thus, discrimination was more difficult. However, response criterion (c) in this task shifted consistently with the direction of the visual distractor. In sum, this experiment was able to dissociate the effects of perceptual interactions from the effects at the response level. Other authors have used a similar strategy to disentangle the contribution of perceptual versus post-perceptual processes in somewhat different types of displays (see Meyer and Wuerger 2001; Meyer et al. 2005). For instance, Meyer and Wuerger presented their participants with a visual direction discrimination task (using random dot kinematograms) in the context of auditory distractor motion. They used a mathematical model that included a sensitivity parameter and a bias parameter, and found that most of the influence that auditory motion had on the detection responses to visual random dot displays was explained by a decision bias (for a similar strategy, see Alais and Burr 2004a). This result highlights the importance that post-perceptual biases can have in experiments using cross-modal distractors, and, in part, contrasts with the result presented above using the CDC task.
Although it is difficult to compare across these methodologically very different studies, part of the discrepancy might be rooted in the use of vision as the target modality (as opposed to sound), and in the target stimulus being near the threshold for direction discrimination (in Meyer et al.'s case, random dot displays with low directional coherence). More recent applications of this type of approach have revealed, however, that one can obtain a shift in sensitivity over and above any bias effects (Meyer et al. 2005; Wuerger et al. 2003). In sum, the presence of decision and cognitive influences is very likely to have an effect in most of the tasks used to address multisensory contributions to motion processing. Yet, the data of several independent studies seem to show rather conclusively that influences at the level of perception do also occur during multisensory motion processing. It must be noted, however, that there are limits on how early in the processing hierarchy this cross-modal interaction can occur. For example, there is evidence that cross-modal motion integration takes place only after certain unisensory perceptual processes have been completed, such as visual perceptual grouping (e.g., Sanabria et al. 2004a, 2004b) or the computation of visual speed (López-Moliner and Soto-Faraco 2007; see Section 29.3.4).
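For readers unfamiliar with the signal detection measures mentioned above, the following minimal sketch computes sensitivity (d′) and criterion (c) from hit and false-alarm rates; the numerical values are invented for illustration and do not reproduce any of the cited data sets.

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate: float, fa_rate: float) -> tuple:
    """Standard yes/no signal detection measures.

    d' = z(H) - z(FA) indexes sensitivity (how discriminable signal and noise are);
    c  = -0.5 * (z(H) + z(FA)) indexes response bias (criterion placement).
    """
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Invented rates for a "left-moving sound?" judgment: a no-vision baseline versus
# blocks with a constant rightward-moving visual distractor.
print(dprime_and_criterion(hit_rate=0.85, fa_rate=0.20))  # higher d', near-neutral criterion
print(dprime_and_criterion(hit_rate=0.60, fa_rate=0.15))  # lower d', criterion shifted away from "left"
```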
29.3.2 Are These Interactions Specific to Motion Processing?
Although the studies described above have used moving stimuli for their tasks, one potential concern is about the specificity of the conclusions. That is, the processes whereby these multisensory interactions arise might not be based on motion information (i.e., direction of motion), but on other features that are unrelated to motion per se but present in the displays, such as the location and timing of the onsets or offsets of the stimuli (for a discussion of this problem, see Meyer and Wuerger 2001; Soto-Faraco et al. 2002, 2004a). This is especially relevant, but not unique, for experiments using apparent motion displays. If an explanation based on nonmotion features were possible, then the results of most of the studies discussed up to now would not be particularly informative as to whether and how multisensory integration of motion information occurs. A number of findings suggest, however, that multisensory integration does indeed take place specifically during motion processing. For instance, Soto-Faraco et al. (2004a) estimated that the magnitude of the CDC (see Figure 29.1c) for apparent motion displays was about 50%, that is, sound direction was judged erroneously in about half the conflicting trials, whereas it was nearly 100% accurate in congruent trials (thus, for an effect of 50%). Using the same setup, spatial disparity, and timings, they assessed the chances that either of the two individual components of the auditory apparent motion stream was ventriloquized toward a concurrent light flash, which resulted in an effect magnitude of only 17%. From this result, it would appear that the consequences of CDC go beyond what can be expected on the basis of simple static capture (see also Soto-Faraco et al. 2002, Experiment 3, for a similar result). The exploration of split-brain patient JW afforded another independent test of the critical role of motion for CDC (Soto-Faraco et al. 2002). JW had his corpus callosum (all the fibers connecting the left and right cerebral hemispheres) surgically sectioned, so that there is no direct communication between his two hemispheres at a cortical level. Whereas auditory apparent motion is experienced normally by JW (given that the auditory pathways do not cross at the corpus callosum), he does not experience the typical impression of visual apparent motion when presented with two sequential flashes at different hemifields. This is so despite the fact that JW's ability to localize individual flashes presented laterally is spared (Gazzaniga 1987; Soto-Faraco et al. 2002). Interestingly, when tested in the CDC task, JW barely made any errors regarding the direction of sounds in incongruent trials (thus, in stark contrast with the healthy controls, who experienced the typical 50% CDC effect). This result suggests that JW's inability to experience visual apparent motion across the midline spared him from the CDC effect, and reveals that the experience of motion is essential for the cross-modal interactions observed in previous studies.
29.3.3 Pattern of Modality Dominance
The studies discussed up to this point have touched on the case of audiovisual interactions in motion perception, and in particular on the effects of visual motion on the perception of sound motion. This bias reflects the imbalance of modality combinations being represented in current cross-modal literature addressing motion perception. Yet, the cross-modal literature should make us aware that many multisensory phenomena present asymmetries, so that visual effects on audition are substantially different from auditory effects on vision (i.e., the ventriloquist illusion; e.g., Alais and Burr 2004b; Radeau and Bertelson 1976; the McGurk effect, McGurk and MacDonald 1976). What is more, since motion can be extracted from tactile signals, as well as from visual and acoustic ones, different modality combinations may involve different sets of constraints. Previous research in multisensory integration of motion already suggests a clear asymmetry whereby effects of audition on vision are smaller, if present at all, than those typically seen for vision on audition (e.g., Allen and Kohlers 1981; Kitagawa and Ichihara 2002; Meyer and Wuerger 2001; Ohmura 1987). For example, the CDC effect does not occur at all, or occurs only very weakly, when visual apparent motion is the target and sounds are the to-be-ignored modality (e.g., Soto-Faraco et al. 2004a). There are indeed some reports of acoustic influences on visual motion, but
these are invariably obtained when the visual signal is at or near threshold (e.g., Alais and Burr 2004b; Meyer and Wuerger 2001). We have incorporated touch into the CDC paradigm in several studies (e.g., Oruç et al. 2008; Sanabria et al. 2005a, 2005b; Soto-Faraco et al. 2004b). In these cases, participants were asked to wear vibrotactile stimulators at their index finger pads and rest their hands on the table, near the LEDs and/or loudspeakers. Tactile apparent motion was generated by presenting a brief (50 ms) sine wave (200 Hz) vibration in alternation to each index finger. In this way, auditory, tactile, and visual apparent motion streams could be presented using equivalent stimulus parameters in terms of onset time, duration, and spatial location. All possible combinations of distractor and target modality using tactile, visual, and acoustic stimuli were tested (Oruç et al. 2008; Soto-Faraco et al. 2000; Soto-Faraco and Kingstone 2004). When considered as a whole, the results of these experiments reveal a hierarchy of sensory modalities with respect to their contribution to the perception of motion direction. Vision has a strong influence on auditory motion, yet acoustic distractors did not modulate the perception of visual motion (along the lines of other recent results, such as those of Meyer and Wuerger 2001; Kitagawa and Ichihara 2002). A similar pattern applies to visuo-tactile interactions, whereby vision captures tactile motion direction but touch hardly exerts any influence on the perception of motion in vision. The combination of auditory and tactile motion stimuli, however, showed reciprocal influence between both modalities, albeit with a stronger effect of touch on sound than the reverse. This particular hierarchy, however, must be considered with some caution, given that factors such as stimulus saliency, reliability, and even cognitive aspects such as attention, may indeed exert an important influence on the relative strength of the modalities. For example, it has been shown that directing the focus of attention to one or another modality can modulate CDC in the case of audio-tactile interactions, although not in modality pairings where vision was involved (Oruç et al. 2008). According to the findings described above, vision would be the most dominant sense in terms of its contribution to computing the direction of motion, then touch, and lastly audition. Within this framework, multisensory integration would not consist of a process in which one modality overrides the information in another modality (as the results of the audiovisual case, when considered in isolation, might sometimes suggest; for a similar example based on the dominance of vision over touch in shape/size perception, see, e.g., Rock and Harris 1967). Instead, the results support the proposal that multisensory integration of motion would abide by some kind of weighted combination between different information sources (see López-Moliner and Soto-Faraco 2007). If this is so, then the degree to which each modality is weighted during the unisensory perception of motion becomes a particularly relevant issue. This is clearly a matter for further research, but based on the success in explaining cross-modal results regarding other perceptual domains, one could borrow the ideas of modality appropriateness or the so-called optimal integration model (Ernst and Banks 2002; Ernst and Bulthoff 2004) to answer this question.
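A minimal sketch of the reliability-weighted ("optimal integration") scheme referred to above, in the spirit of Ernst and Banks (2002): each modality's estimate is weighted by the inverse of its variance. The estimates and variances below are invented purely for illustration.

```python
def optimal_combination(estimates, variances):
    """Maximum-likelihood (inverse-variance weighted) combination of single-cue estimates.

    The combined variance is never larger than that of the most reliable single cue,
    which is the sense in which the combination is statistically optimal.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * x for w, x in zip(weights, estimates)) / total
    return combined, 1.0 / total

# Invented single-cue velocity estimates (deg/s): vision precise, audition coarse.
estimate, variance = optimal_combination(estimates=[12.0, 4.0], variances=[1.0, 9.0])
print(estimate, variance)  # 11.2 0.9 -> the combined estimate sits close to the visual one
```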
29.3.4 Multisensory Integration of Motion Speed
Most current knowledge regarding multisensory contributions to motion perception has been based on tasks using discrimination of motion direction, or detection of motion signals. However, another fundamental property of motion, velocity, has been largely neglected in these past studies (for exceptions, see López-Moliner and Soto-Faraco 2007; Manabe and Riquimaroux 2000). López-Moliner and Soto-Faraco (2007) addressed whether, and how, different sensory sources of velocity information influence each other during the perception of motion. Participants were presented with moving sound images (left to right or right to left) at varying speeds in a 2IFC task, with the velocity of the comparison sounds determined by QUEST staircases (see Figure 29.2a) so as to match a standard sound moving at 30° s−1. The comparison sound, however, could be presented concurrently with a variety of visual sinusoidal gratings moving at different velocities, ranging from 15 to 45° s−1. The results clearly showed a shift in perceived sound velocity as a function of the velocity of the concurrent visual stimulus, so that slow visual motion made participants underestimate the velocity of concurrent sounds, and rapid visual motion made people perceive the sounds moving faster than they really were.
FIGURE 29.2 A summary of results about cross-modal effects in velocity perception, from López-Moliner and Soto-Faraco's study. (a) 2IFC paradigm involved a velocity discrimination judgment regarding sounds (Was the second sound faster or slower than the first?), where second interval could contain either no visual motion or else moving gratings. (b) Graphical description of different gratings used in the task in space defined by temporal and spatial frequency. Note that two exemplars of each motion velocity formed by different combinations of spatial and temporal frequencies were used. Each of three velocities is denoted by a different symbol (see labels next to symbols). (c–e) Point of subjective equality for sound velocity with reference to a 30° s−1 standard, when combined with different kinds of moving gratings. Same data are depicted as a function of (c) spatial frequency, (d) temporal frequency, and (e) velocity of gratings. It can be seen that results pattern best when depicted along the velocity axis.
In this study, gratings composed of different combinations of spatial and temporal frequencies could represent visual motion of a given velocity (see Figure 29.2b). This was done because sinusoidal moving gratings can be conveniently separated into spatial frequency (sf) and temporal frequency (tf) (Watson and Ahumada 1983), and the velocity (v) of a grating can be expressed by the ratio between its tf (in Hz) and its sf (number of cycles per degree of visual angle). This spatiotemporal definition of stimulus space has been previously used to characterize the spectral receptive fields of neurons at various levels of the visual system in the monkey (e.g., Perrone and Thiele 2001), and it has received confirmation from human psychophysics (Reisbeck and Gegenfurtner 1999). For instance, many neurons in the middle temporal cortex (MT) encode velocity, in the sense that the set of stimuli that best drives these neurons lies along an isovelocity continuum in the space defined
by sf and tf. Unlike MT neurons, many of the motion-sensitive neurons found at earlier stages of the visual system, such as V1, fail to display an invariant response across different stimuli moving at equivalent velocities; rather, they often display a response profile tuned to particular temporal frequencies (however, see Priebe et al. 2006, for velocity responses in some V1 neurons). Given that the velocity of a grating can be decomposed in terms of spatial and temporal frequency (v = tf/sf), one can attempt to isolate the influence of varying the spatial and temporal frequencies of the visual stimulus on the perceived velocity of sounds. What López-Moliner and Soto-Faraco found is that neither spatial frequency per se nor temporal frequency produced any systematic effect on sound velocity perception above and beyond that explained by velocity. One could then infer that the binding of multisensory velocity information might be based on motion information that is already available at late stages of processing within the visual system. Note that this inference resonates with the finding that multisensory motion integration occurs only after perceptual grouping has been completed within vision (Sanabria et al. 2004a, 2004b).
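To make the logic of this stimulus space concrete, the following minimal sketch (in Python) builds isovelocity gratings from the relation v = tf/sf and estimates a point of subjective equality with a simple adaptive staircase. It is illustrative only: the spatial frequencies, step sizes, and the simulated observer are assumptions, and the original study used QUEST staircases rather than the simplified 1-up/1-down rule shown here.

import random

def grating(tf_hz, sf_cpd):
    # A drifting grating's velocity (deg/s) is its temporal frequency divided
    # by its spatial frequency: v = tf / sf.
    return {"tf": tf_hz, "sf": sf_cpd, "velocity": tf_hz / sf_cpd}

# Two exemplars per velocity (different sf/tf combinations), as in Figure 29.2b.
# The particular sf values (0.25 and 0.40 cycles/deg) are illustrative assumptions.
isovelocity_set = [grating(v * sf, sf) for v in (15.0, 30.0, 45.0) for sf in (0.25, 0.40)]
print(sorted({round(g["velocity"], 1) for g in isovelocity_set}))  # -> [15.0, 30.0, 45.0]

def judges_comparison_faster(comparison_speed, perceived_standard=30.0, noise_sd=2.0):
    # Toy observer: responds "faster" when the noisy comparison speed exceeds the
    # (possibly visually biased) perceived speed of the 30 deg/s standard sound.
    return comparison_speed + random.gauss(0.0, noise_sd) > perceived_standard

def staircase_pse(start=40.0, step=2.0, n_trials=60, **observer_kwargs):
    # Simple 1-up/1-down staircase: lower the comparison after "faster" responses,
    # raise it after "slower" responses, so it converges on the 50% point (the PSE).
    speed, levels = start, []
    for _ in range(n_trials):
        speed += -step if judges_comparison_faster(speed, **observer_kwargs) else step
        levels.append(speed)
    return sum(levels[-20:]) / 20.0  # crude PSE estimate: mean of the last 20 levels

# If concurrent gratings bias the perceived standard (e.g., toward 36 deg/s),
# the estimated PSE for the comparison sounds shifts accordingly.
print("PSE, unbiased standard:  %.1f deg/s" % staircase_pse(perceived_standard=30.0))
print("PSE, 'fast' visual bias: %.1f deg/s" % staircase_pse(perceived_standard=36.0))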
29.4 NEURAL CORRELATES OF MULTISENSORY INTEGRATION OF MOTION

29.4.1 Multisensory Motion Processing Areas in the Brain

One particularly important way to constrain explanations of multisensory contributions to the perception of motion is to understand their neural underpinnings. Although there are many structures in the visual system that contain motion-responsive neurons, the processing of global visual motion has repeatedly been shown to strongly involve visual area V5/MT, in the occipitotemporal cortex. Together with its surrounding regions (the V5/MT+ complex), V5/MT has been characterized as a motion processing area in both human (e.g., Watson et al. 1993; Zihl et al. 1983, 1991) and nonhuman primate studies (e.g., Ungerleider and Desimone 1986). Area V5/MT+ is well connected with other visual areas such as the V1/V2 complex and V3, as well as with higher-order areas in the posterior parietal cortex, in particular the ventral intraparietal area (VIP) within the intraparietal sulcus (IPS; e.g., Maunsell and Van Essen 1983), which contains directionally sensitive neurons [e.g., Colby et al. 1993; Duhamel et al. 1991, 1998; see also Bremmer et al. 2001b, for evidence regarding a homologous region in the human cortex using functional magnetic resonance imaging (fMRI)]. Given the strong connectivity between VIP and the ventral premotor cortex (PMv; e.g., Luppino et al. 1999), it is conceivable that this frontal area is also involved in some aspects of visual motion processing.

Although our knowledge of the network of brain areas supporting motion perception in the auditory and somatosensory systems is not nearly as detailed as that of the visual system, there have been some important recent advances. For example, some researchers have demonstrated the selective involvement of certain brain areas (such as the planum temporale, the inferior and superior parietal cortices, and the right insula) in response to auditory motion (e.g., Baumgart et al. 1999; Pavani et al. 2002). The primary and secondary somatosensory areas (SI and SII), located in the postcentral gyrus, seem to play a major role in the perception of tactile motion (e.g., Hagen et al. 2002).

Beyond these brain areas associated with the processing of motion in individual sensory modalities, the relevant question, however, is "Which brain areas may serve motion processing in more than one sensory modality?" Animal electrophysiology has already made some important advances in detailing several structures containing neurons that are responsive to dynamic changes and/or moving stimuli in several sensory modalities. Among the most relevant, we find several of the higher-order areas pointed out earlier as subserving visual motion processing, such as VIP (Bremmer et al. 2001b; Colby et al. 1993; Duhamel et al. 1998; Schlack et al. 2005) and PMv (e.g., Graziano et al. 1994, 1997). Some parts of the temporal lobe (in particular, the superior temporal sulcus and the middle temporal gyrus; Bremmer et al. 2001a, 2001b; Bruce et al. 1981; Desimone and Gross 1979) have also been linked to the processing of dynamic stimuli in multiple sensory modalities.
In humans, a particularly compelling demonstration of brain areas relevant for multisensory motion processing would be the association of motion processing deficits across sensory modalities after focal brain lesions. Yet, to our knowledge, this kind of neuropsychological evidence is still lacking (Griffiths et al. 1997b; Zihl et al. 1991). Despite this lack of neuropsychological association, a number of neuroimaging studies in humans have helped to map multisensory brain areas that receive converging motion information from several sensory systems (e.g., Bremmer et al. 2001b; Hagen et al. 2002; Lewis et al. 2000). In Bremmer et al.'s fMRI study, human observers were scanned while being presented with visual, acoustic, or somatosensory motion stimuli in alternating blocks of trials. Regions commonly active for motion in all three modalities were found in both the right and left posterior parietal cortices (with local maxima suggesting a potential involvement of VIP), as well as in PMv. This pattern thus appears to match well with the data obtained in monkey electrophysiology (e.g., Graziano et al. 1994, 1997, 2004). In another fMRI study, Lewis et al. found a region of common activation for visual and auditory motion processing in the lateral parietal cortex, again suggesting the involvement of some areas along the IPS, as well as some frontal structures such as the lateral frontal cortex and the anterior cingulate. The results of these fMRI studies clearly show that some particular brain structures are responsive to motion signals in more than one sensory modality. The combination of evidence from human neuroimaging and monkey neurophysiology suggests that the multisensory responses in these motion-responsive areas may be due to the presence of multisensory neurons that are sensitive to motion in at least one of their sensory modalities. This is an important step toward disentangling the mechanisms responsible for processing multisensory motion, but it is perhaps not sufficient to enable us to claim that these areas play any particular role in the process of binding multisensory information about motion.
29.4.2 Evidence for Cross-Modal Integration of Motion Information in the Human Brain

A couple of recent human brain imaging studies have shed light precisely on the question of which brain areas may respond preferentially to the combined presence of motion in several sensory modalities (Alink et al. 2008; Baumann and Greenlee 2007). Baumann and Greenlee used random dot kinematograms in which 16% of the dots moved in a predetermined direction to create their visual motion stimulus (a schematic sketch of this kind of stimulus is given at the end of this section). In the critical comparison, they contrasted brain activity arising in a condition where sounds moved in phase with the direction of this visual motion stimulus versus a condition in which they moved in antiphase. They used a stationary sound condition as their baseline and found that, with respect to this baseline, sounds in phase with visual motion produced a pattern of activity involving extensive regions of the superior temporal gyrus (STG), the supramarginal gyrus (SMG), the IPS, and the superior parietal lobule (SPL), in addition to some regions of the primary and secondary visual cortex. Sounds in antiphase with the visual stimulus produced a similar pattern of activity, but with a weaker blood oxygen level-dependent (BOLD) increase throughout, thus leading the authors to suggest that these areas are responsive to the integration of correlated motion information across modalities.

In another recent study, Alink et al. (2008) used an adaptation of the CDC task (e.g., Soto-Faraco et al. 2002, 2004a) to compare brain activity in response to audiovisual motion presented in directional congruency or incongruency. Alink et al. focused their analysis on unimodal motion processing regions, which were localized a priori on an individual basis. These areas included the classical visual motion areas V5/MT+ as well as the auditory motion complex (AMC), an area in the posterior part of the planum temporale that had already been associated with auditory motion processing (e.g., Baumgart et al. 1999). One of Alink et al.'s relevant findings was that these unimodally defined motion processing regions responded with decreased activity when presented with directionally conflicting auditory and visual motion signals, as compared to congruent ones. This result would
lend support to the idea that the consequences of multisensory integration of motion information can be traced back to relatively early (sensory) stages of motion processing (for behavioral support of this hypothesis, see Soto-Faraco et al. 2004a, 2005; Sanabria et al. 2007). A second interesting aspect of Alink et al.'s study is that they reproduced the CDC effect while measuring brain activity. Thus, they were able to contrast the BOLD changes resulting from trials in which the CDC illusion presumably occurred (incorrectly responded trials) with those evoked by otherwise identical trials in which the sounds were perceived to move in the physically correct direction. This contrast revealed that in trials where the CDC illusion was experienced, activity in the auditory motion areas (AMC) was reduced with respect to veridically perceived trials, whereas the reverse pattern occurred for visual motion areas (i.e., enhanced activity in MT/V5+ in illusion trials with respect to veridical perception). This result parallels the visual dominance pattern typically observed in the behavioral manifestation of CDC. Finally, when extending the scope of their analysis to the whole brain, Alink et al. found that conflicting motion led to the activation of an extensive network of frontoparietal areas, including the IPS and the supplementary motor area (SMA). Remarkably, in this analysis Alink et al. also found that VIP was modulated by the occurrence of illusory motion percepts (as indexed by the CDC task). In particular, not only was VIP more strongly activated in trials leading to illusory percepts, but this activation seemed to precede in time the activity evoked by the stimulus itself, an indication that the prior state of the motion processing network might be a critical determinant of whether CDC, and hence cross-modal motion integration, occurs.
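As a concrete illustration of the kind of visual stimulus used by Baumann and Greenlee (the sketch promised above), the following Python fragment advances a random dot kinematogram in which a fixed proportion of dots (16% here) carries a common signal direction while the remainder are re-plotted at random. The dot count, speed, and field size are arbitrary assumptions made for illustration and are not the parameters of the published study.

import math, random

def rdk_step(dots, coherence=0.16, direction_deg=0.0, speed=0.01, field=1.0):
    # Advance dot positions by one frame of a random dot kinematogram (RDK).
    # dots: list of [x, y] positions inside a square field of side `field`.
    # coherence: proportion of dots moving coherently in `direction_deg`.
    dx = speed * math.cos(math.radians(direction_deg))
    dy = speed * math.sin(math.radians(direction_deg))
    n_signal = int(round(coherence * len(dots)))
    for i, dot in enumerate(dots):
        if i < n_signal:                      # signal dots: coherent displacement (wraps around)
            dot[0] = (dot[0] + dx) % field
            dot[1] = (dot[1] + dy) % field
        else:                                 # noise dots: re-plotted at a random location
            dot[0], dot[1] = random.random() * field, random.random() * field
    return dots

dots = [[random.random(), random.random()] for _ in range(200)]
for _ in range(100):                          # 100 frames of rightward motion at 16% coherence
    rdk_step(dots, coherence=0.16, direction_deg=0.0)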
29.5 MOTION INTEGRATION IN MULTISENSORY CONTEXTS BEYOND THE LABORATORY

Although the behavioral and neural principles described above represent an important advance in our understanding of how the perception of multisensory motion signals works, we are still far from a satisfactory characterization. Yet this has not prevented the development of media applications, such as cinema, that recreate a real-world illusion by exploiting (multisensory) motion perception principles. Most of the time, these applications have been developed independently of, and even antedate, the scientific principles to which they relate. Here we discuss a few relevant cases.
29.5.1 Sound Compensating for Reduced Visual Frame Rate

Visual "flicker" fusion occurs at rates of roughly 50 to 100 Hz (Landis 1954; van der Zee and van der Meulen 1982). Because of this perceptual characteristic of the visual system, standard cinema and television applications in the twentieth century used frame rates of 50 and 60 Hz. Compared to the visual system, however, the temporal resolution of the auditory system is considerably higher: humans are able to detect amplitude modulations of high-frequency tonal carriers of up to 600 Hz (Kohlrausch et al. 2000). Given this sensory imbalance, together with the multisensory nature of our perception, it is likely that sound can significantly alter our perception of visual-world dynamics. In fact, several well-studied effects provide evidence for this, such as the "auditory driving" effect (e.g., Welch et al. 1986), temporal ventriloquism (Morein-Zamir et al. 2003), and the freezing effect (e.g., Vroomen and de Gelder 2000). The audiovisual media industry has extensively exploited this feature of multisensory perception (auditory influence on vision in the time domain) in applications where sound has traditionally been used to highlight the temporal structure of rapid visual events. Consider, for example, sound transients articulating hits in kung fu fighting scenes (Chion 1994) or Walt Disney's "Mickey Mousing" technique, whereby a specific theme of a soundtrack is tightly synchronized with a cartoon character's movements (Thomas and Johnston 1981). Interestingly, sound has also been used in cinema to create an illusion of visual event continuity, "greasing the cut" as editors would put it, when
an accompanying sound effect masks rough editing between two shots (Eidsvik 2005). In a classic example from George Lucas' film The Empire Strikes Back (1980), the visual illusion of a spaceship door sliding open is created using two successive still shots of a closed door and an open door combined with a "whoosh" sound effect (Chion 1994). These auditory modulations of the perception of visual dynamics give rise to the practical question of whether sound could compensate for a reduced frame rate in films. A recent study by Mastoropoulou et al. (2005) investigated the influence of sound in a forced-choice discrimination task between pairs of 3-s video sequences displayed at temporal resolutions of 10, 12, 15, 20, or 24 frames per second (fps). Participants judged the motion smoothness of the videos being presented. In visual-only conditions, naïve participants could discriminate between displays differing by as little as 4 fps. In contrast, in audiovisual presentations participants could reliably discriminate between displays only when they differed by 14 fps. It is perhaps surprising that Mastoropoulou et al. (2005) hypothesized that divided attention might be the cause of the reported effects, without considering the alternative explanation that audiovisual integration might have produced the sensation of smoother visual displays altogether, thereby making it more difficult to spot discontinuities.
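For readers who want a concrete sense of what "reduced frame rate" means in such studies, the short sketch below computes the display schedule obtained when a clip is re-sampled to the rates tested by Mastoropoulou et al. (10–24 fps). The 24-fps source rate and the nearest-frame rule are assumptions made for illustration, not details of their implementation.

def frame_schedule(duration_s=3.0, source_fps=24, target_fps=12):
    # Return (display_times_s, source_frame_indices) for a clip re-sampled to a
    # reduced frame rate; each displayed frame is held for 1/target_fps seconds.
    n_shown = int(duration_s * target_fps)
    n_source = int(duration_s * source_fps)
    times = [i / target_fps for i in range(n_shown)]
    indices = [min(int(round(t * source_fps)), n_source - 1) for t in times]
    return times, indices

for fps in (10, 12, 15, 20, 24):
    times, idx = frame_schedule(target_fps=fps)
    print("%2d fps: %2d frames shown, each held for %.1f ms" % (fps, len(times), 1000.0 / fps))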
29.5.2 Filling in Visual Motion with Sound

Recent studies focusing on cross-modal interactions have found that sound can induce the illusion of seeing a visual event when there is none. For example, Shams et al. (2000) created an illusion of multiple visual flashes by coupling a single brief visual stimulus with multiple auditory beeps. In these experiments, participants were asked to count the number of times a flickering white disk had flashed in displays containing one or more concurrent, task-irrelevant brief sounds. The number of flashes reported by observers increased with the number of beeps presented. In a follow-up ERP study, Shams et al. (2001) reported that sound modulated early visual evoked potentials originating from the occipital cortex. Interestingly, the electrophysiological activity corresponding to the illusory flashes was found to be very similar to the activity produced when a flash was physically presented. Other research groups have demonstrated that this illusory-flash effect occurs at a perceptual level with psychophysically assessable characteristics (e.g., McCormick and Mamassian 2008) and that it does not subjectively differ from a real flash when used in orientation-discrimination tasks (Berger et al. 2003). The sound-induced flash illusion has also been studied using apparent motion, where visual bars were flashed in succession from one side of the screen to the other (Kamitani and Shimojo 2001). In this case, increasing the number of beeps produces additional illusory bars, leading to a subjective experience of smoother visual object motion.

In order to quantify the perceptual effects of the illusory flash in time-sampled motion contexts, such as those often seen in animated cartoons, Väljamäe and Soto-Faraco (2008) applied a methodology similar to the motion adaptation aftereffect paradigm of Kitagawa and Ichihara (2002), discussed earlier (Section 29.1). Väljamäe and Soto-Faraco exposed participants to time-sampled approaching/receding visual, auditory, or audiovisual motion in depth, simulated by changing the size of the visual stimulus (0° to 9° of visual angle) and the intensity of the sound (40–80 dB sound pressure level). Both unisensory adaptors and audiovisual combinations were used, with the audiovisual adaptors being either directionally congruent or conflicting (see Figure 29.3a; a schematic parameterization of such adaptors is sketched at the end of this section). Visual and auditory adaptors had two temporal rates: a high-rate train of flashes (flicker) or beeps (flutter) at 12.5 Hz, and low-rate flicker or flutter at 6.25 Hz. An adaptive staircase procedure was used to measure the amount of auditory motion aftereffect (point of subjective steadiness) induced by the different adaptor stimuli. In addition, in one of the experiments, participants also had to judge the subjective smoothness of the visual adaptors. The results showed visual adaptation aftereffects on sounds: high-rate flashes produced a stronger auditory motion aftereffect than flashes at a lower rate, which were largely ineffective.
FIGURE 29.3 A subset of experimental conditions and results. (a) Examples of motion adaptors representing low-rate approaching visual stimuli, high-rate receding sounds, and direction-conflicting stimuli combining high-rate sounds with low-rate visual events (+/−: approaching/receding stimuli; Ah/Al: 12.5/6.25 Hz auditory flutter; Vh/Vl: 12.5/6.25 Hz visual flicker). (Reprinted from Väljamäe, A., and Soto-Faraco, S., Acta Psychol., 129, 249–254, Copyright 2008, with permission from Elsevier.) (b) Magnitude of the auditory aftereffect (in dB/s) after adaptation to time-sampled approaching (+) or receding (−) audiovisual motion in depth. The left subpanel shows results for directionally congruent adaptors (high-rate visual combined with high-rate sounds, and low-rate visual combined with high-rate sounds), and the right subpanel shows results for directionally incongruent audiovisual adaptors, in which the visual and auditory streams moved in opposite directions.

Importantly, when the visual adaptors were combined with high-rate flutter, not only the size of
the adaptation aftereffect increased overall, but interestingly, both the fast and the slow flicker rates turned out to be equally effective in producing auditory aftereffects (see Figure 29.3b, left subpanel). This result strongly suggests that high-rate flutter can fill in sparsely sampled visual object motion. This filling-in effect could be related to the sound-induced flash phenomenon, whereby the combination of low-rate flicker with a rapid train of beeps leads to illusory flashes (e.g., Shams et al. 2002). In fact, the judgments of subjective smoothness of the visual flicker stimuli supported the psychophysical data: low-rate flicker was rated as smoother when combined with high-rate beeps than when combined with low-rate flutter. However, the results of these experiments did not speak directly to whether the observed effects are specific to motion per se, or whether they simply result from the high-frequency temporal structure of the sound signal. In a separate experiment, Väljamäe and Soto-Faraco (2008) tested the relevance of the motion direction congruency of the adaptors by using direction-incongruent multisensory adaptors. If the effect of the audiovisual adaptor lacks direction specificity, then the audiovisual adapting stimulus should work equally well despite the cross-modal incongruence in motion direction. However, the results showed that incongruent combinations of audiovisual adaptors produced weaker aftereffects (Figure 29.3b, right subpanel). In fact, the aftereffects of these adaptors did not differ in size or direction from the auditory motion aftereffects induced by unimodal acoustic adaptors.

The findings of Väljamäe and Soto-Faraco's (2008) study could potentially be attributed to the sound-induced visual flash illusion, given that the timing parameters of the discrete stimuli were similar to those used in the original experiments by Shams et al. (2000) (cf. Lewkowicz 1999 for a discussion of the intermodal temporal contiguity window for the integration of discrete multisensory events). Thus, the aftereffects of multisensory adaptors might be explained by perceptual "upgrading" of low-rate flicker by a high-rate train of beeps. In this case, illusory visual flashes might have filled in the sparsely sampled real visual flicker and increased motion aftereffects. Importantly, the observed effects did not depend solely on the flutter rate, but also on the directional congruency between the auditory and
visual adaptors. This means that the potential of sounds to fill in a visual series critically depends on some kind of compatibility, or congruence, between the motion signals being processed by hearing and sight. Thus, above and beyond the potential contribution of the auditory driving phenomenon (e.g., Welch et al. 1986), the effect described above seems to reflect interactions between motion cues provided by a moving object. These results might support the idea that sound can compensate for a reduced visual frame rate in media applications, as described in Section 29.5.1. A better understanding of the mechanisms underlying such cross-modal filling-in effects may facilitate new types of perceptually optimized media applications, in which sound and visuals are tightly synchronized on a frame-by-frame basis. Classic examples include Walt Disney's animated films (e.g., Fantasia), in which music was used directly as a reference for the animators' work, and the abstract films of Len Lye (e.g., Colour Flight, Rhythm, Free Radicals; see Len Lye filmography 2005), in which he visualized musical rhythm by painting or scratching directly on celluloid.
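Returning to the adaptor stimuli described above (the parameterization sketch promised earlier), the fragment below generates schematic parameter trajectories for time-sampled approaching or receding audiovisual adaptors: visual size ramping between 0° and 9°, and sound level between 40 and 80 dB, sampled at a high (12.5 Hz) or low (6.25 Hz) rate. The linear ramps and one-second sweep duration are assumptions made for illustration; they are not taken from Väljamäe and Soto-Faraco's stimulus code.

def adaptor_samples(direction="approaching", rate_hz=12.5, duration_s=1.0):
    # One time-sampled adaptation sweep: each sample gives (time_s, visual_size_deg,
    # sound_level_db). In practice the visual stream would use only the size values
    # and the auditory stream only the level values.
    n = int(rate_hz * duration_s)
    samples = []
    for i in range(n):
        p = i / max(n - 1, 1)                 # ramp position from 0 to 1
        if direction == "receding":
            p = 1.0 - p                       # receding: large/loud toward small/quiet
        samples.append((i / rate_hz, 9.0 * p, 40.0 + 40.0 * p))
    return samples

# Directionally congruent adaptor: low-rate visual flicker with high-rate flutter,
# both approaching; conflicting adaptor: the auditory stream recedes instead.
congruent = (adaptor_samples("approaching", 6.25), adaptor_samples("approaching", 12.5))
conflicting = (adaptor_samples("approaching", 6.25), adaptor_samples("receding", 12.5))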
29.5.3 Perceptually Optimized Media Applications

Modern times challenge us with rapidly evolving technologies that mediate our perception of physical space and time. In many situations, cinema, television, and virtual reality aim to represent realistic or fictional worlds using the most immersive technologies available on the market (e.g., Sanchez-Vives and Slater 2005). The history of cinema serves as an illustration of how technical development gradually equipped the "Great Mute" with sound, color, large and curved projection screens, and stereoscopy. Developers of new immersive broadcasting solutions often include vibrotactile and even olfactory stimulation in their prototypes (Isono et al. 1996). From this perspective, it is important to answer the question of which sensory information is needed to create a coherent perceptual representation of real-world motion in the viewer. For example, limited animation techniques with reduced frame rates and minimalist graphics (e.g., Japanese animation or "anime") have been widely used in media since the 1950s and represent an alternative to a photorealistic approach (Furniss 1998). Representing a scene by a series of still images instead of a continuous video stream has become an increasingly common technique in music video clips and advertisements, where fast-editing techniques are typically used to catch the user's attention (Fahlenbrach 2002). However, presenting an audiovisual stream as a sequence of still images can also be seen as a way of regulating the sensory load for the end-user of multisensory displays. In a way, such slideshow-like presentation of a visual stream resembles the common technique in visual and multimodal information search and retrieval in which only key video frames are used (Snoek and Worring 2002).

An interesting question is to what extent visual information can be compensated for by other modalities in new, perceptually optimized media applications (Väljamäe et al. 2008). For example, a video stream can be divided into a sequence of still images combined with a continuous soundtrack, either monophonic or spatialized. Several research projects have used single still photographs combined with spatial sound to recreate a coherent audiovisual experience (Bitton and Agamanolis 2004; Hoisko 2003). A classic cinematographic example in which successive still images are used as the visual representation is the photo-novel La Jetée by Marker (1962), where the accompanying voice of a narrator guides the viewer through the story. Comparing the subjective experiences produced by La Jetée and conventional film shows that reduced temporal resolution of the visual stream does not degrade emotional impact or the evaluation of spatial content (Väljamäe and Tajadura-Jiménez 2007). In the spirit of Chris Marker's pioneering films, current movie and music video makers make increasing use of trains of photographs instead of a continuous visual stream. An illustrative example of this approach is the music video for Jem's "It's Amazing," directed by Saam Gabbay (Gabbay 2009; http://www.youtube.com/watch?v=8XDxhDbtDak), in which more than 25,000 still photographs were used to create a 4-min music video and, on average, 4–5 fps were used in the sections that required lip sync to the music (Saam Gabbay, personal communication). Effective reduction of visual information in audiovisual content, as shown in the examples above, critically depends on a better understanding of multisensory motion processing. Such perceptual
optimization may have important implications for audio and video compression and rendering technologies, especially in wireless communication, which at present are developed rather independently of one another. In these technologies, a critical problem is to find a compromise between the limited information transmission rate available to current technology and the realism of the content being displayed (e.g., Sanchez-Vives and Slater 2005). Future audiovisual media content synthesis, delivery, and reproduction may switch from such a unisensory approach to amodal categories of end-user percepts (such as objects, events, or even affective states of the user). In this new multisensory design, amodal categories may then define the sensory modalities to be reproduced and their rendering quality (cf. the "quality of service" approach in media delivery applications).
29.6 CONCLUSIONS

We started by providing an overview of past and recent developments revealing the phenomenological interactions that can be observed during the perception of motion in multisensory contexts. Over and above early findings based on introspective reports (e.g., Anstis 1973; Zapparoli and Reatto 1969), which already pointed to the existence of strong multisensory interactions in motion processing, more recent psychophysical studies in humans have often reported that the perception of sound motion can be influenced by several properties of a concurrently presented visual motion signal (direction, smoothness, speed). For example, in the CDC effect, sounds can appear to move in the same direction as a synchronized visual moving object, despite the fact that they actually travel in opposite directions (e.g., Soto-Faraco et al. 2002, 2004a). Although these findings were frequently observed under artificially induced intersensory conflict, they speak directly to the strong tendency for multisensory binding that governs motion perception in naturalistic, everyday environments. Some of the characteristics that define this multisensory binding of motion signals are: (1) these multisensory combination processes occur, at least in part, at early perceptual stages, before other potential effects related to decisional stages take place; (2) motion information is subject to multisensory binding over and above any other binding phenomena that can take place between spatially static stimuli; and (3) when other sensory modality combinations are taken into account, a hierarchy of modalities arises, in which vision dominates touch which, in turn, dominates audition (e.g., Soto-Faraco et al. 2003). This hierarchy, however, can be modulated by factors such as the focus of attention (Oruç et al. 2008) and, most probably, by the relative reliabilities of the sensory signals, in line with recent findings in other domains.

We have also touched upon the potential underlying brain mechanisms that support multisensory binding of motion information. Both animal and human studies reveal that among the brain structures that are responsive to visual motion, the higher the processing stage at which one looks, the greater the chance that an area will reveal multisensory properties. In fact, some past studies have provided evidence of overlap between the brain regions that are active during the presentation of motion in audition, touch, and vision (Bremmer et al. 2001b; Lewis et al. 2000). Two of the structures consistently found in this type of study are the PMv and parts of the IPS, possibly the human homologue of the monkey ventral intraparietal (VIP) region. According to animal electrophysiology data, these two areas are strongly interconnected, display a similar tuning to spatial representations of moving objects, and contain multisensory neurons. Two recent studies have provided further insight into the functional organization of multisensory motion processing in the human brain (Alink et al. 2008; Baumann and Greenlee 2007). In both cases, the involvement of posterior parietal (VIP) and frontal (PMv) areas in binding multisensory motion information seems clear. In addition, Alink et al.'s results were suggestive of cross-modal modulation of early sensory areas usually considered to be involved in unisensory motion processing (MT/V5 in vision, and the PT in audition).
One additional recent finding is also suggestive of the responsiveness of early visual areas to acoustic motion, in this case as a consequence of brain plasticity in the blind (Saenz et al. 2008). Finally, we have discussed some of the potential connections between basic and applied research with regard to the use of dynamic displays in audiovisual media. Film editing techniques that have
been developed empirically over the years reflect some of the principles that have been independently discovered in the laboratory. For example, sound is often used in the cinema to support the visual continuity of a highly dynamic scene, capitalizing on the superior temporal resolution of audition over vision. Väljamäe and Soto-Faraco (2008) attempted to bridge the gap between basic research on motion perception and the application of multisensory principles by showing that sounds with a high-rate dynamic structure could help compensate for the poor visual continuity of moving stimuli displayed at low sampling rates. These examples show that a better understanding of the underlying principles of multisensory integration might help to optimize the synthesis, transmission, and presentation of multimedia content. Future research on multisensory motion perception might make use of the principles that are being discovered in the laboratory in order to achieve more realistic, ecological stimuli using virtual or augmented reality setups. It will also be interesting to study situations in which the user or observer can experience either illusory or real self-motion (see Hettinger 2002 for a recent review). Although current multisensory motion research has concentrated solely on situations in which the user is static, viewers are often moving about in real-life situations, which implies that the perception of moving objects in the surrounding environment is modulated by experienced self-motion (Probst et al. 1984; see Calabro, Soto-Faraco, and Vaina 2011, for a multisensory approach). Such investigations can shed light on interactions between the neural mechanisms involved in self-motion and object motion perception (cf. Bremmer 2005) and, in addition, may further contribute to the optimization of media applications for training and entertainment.
ACKNOWLEDGMENTS S.S.-F. received support from the Spanish Ministry of Science and Innovation (PSI2010-15426 and Consolider INGENIO CSD2007-00012) and the Comissionat per a Universitats i Recerca del DIUE-Generalitat de Catalunya (SRG2009-092). A.V. was supported by Fundació La Marató de TV3 through grant no. 071932.
REFERENCES Alais, D., and D. Burr. 2004a. The ventriloquist effect results from near-optimal bimodal integration. Current Biology 14: 257–262. Alais, D., and D. Burr. 2004b. No direction-specific bimodal facilitation for audiovisual motion detection. Cognitive Brain Research 19: 185–194. Alink, A. W., and L. Muckli. 2008. Capture of auditory motion by vision is represented by an activation shift from auditory to visual motion cortex. Journal of Neuroscience 28: 2690–2697. Allen, P. G., and P. A. Kolers. 1981. Sensory specificity of apparent motion. Journal of Experimental Psychology: Human Perception and Performance 7: 1318–1326. Anstis, S. M. 1973. Hearing with the hands. Perception, 2, 337–341. Baumann, O., and M. W. Greenlee. 2007. Neural correlates of coherent audiovisual motion perception. Cerebral Cortex 17:1433–1443. Berger, T. D., M. Martelli, and D. G. Pelli. 2003. Flicker flutter: Is an illusory event as good as the real thing? Journal of Vision 3(6): 406–412. Bertelson, P., and G. Aschersleben. 1998. Automatic visual bias of perceived auditory location. Psychonomic Bulletin and Review 5: 482–489. Bitton, J., and S. Agamanolis. 2004. RAW: Conveying minimally-mediated impressions of everyday life with an audio-photographic tool. In Proceedings of CHI 2004, 495–502. ACM Press. Bremmer, F., A. Schlack, J. R. Duhamel, W. Graf, and G. R. Fink. 2001a. Space coding in primate parietal cortex. Neuroimage 14: S46–S51. Bremmer, F., A. Schlack, N. J. Shah et al. 2001b. Polymodal motion processing in posterior parietal and premotor cortex: A human fMRI study strongly implies equivalencies between humans and monkeys. Neuron 29: 287–296. Bremmer, F. 2005. Navigation in space: The role of the macaque ventral intraparietal area. Journal of Physiology 566: 29–35.
Burtt, H. E. 1917a. Auditory illusions of movement — A preliminary study. Journal of Experimental Psychology 2: 63–75. Burtt, H. E. 1917b. Tactile illusions of movement. Journal of Experimental Psychology 2: 371–385. Calabro, F., S. Soto-Faraco, and L. M. Vaina. 2011. Acoustic facilitation of object movement detection during self-motion. Proceedings of the Royal Academy of Sciences B. doi:10.1098/rspb.2010.2757. In press. Calvert, G., C. Spence, and E. Barry (eds). 2004. The handbook of multisensory processes. Cambridge, MA: MIT Press. Chion, M. 1994. Audio-vision: Sound on screen. New York: Columbia Univ. Press. Choe, C. S., R. B. Welch, R. M. Gilford, and J. F. Juola. 1975. The ‘ventriloquist effect’: Visual dominance or response bias? Perception and Psychophysics 18: 55–60. Colby, C. L., J. R. Duhamel, and M. E. Goldberg. 1993. Ventral intra-parietal area of the macaque: Anatomical location and visual response properties. Journal of Neurophysiology 69: 902–914. Connor, S. 2000. Dumbstruck: A cultural history of ventriloquism. Oxford: Oxford Univ. Press. de Gelder, B., and P. Bertelson. 2003. Multisensory integration, perception and ecological validity. Trends in Cognitive Sciences 7: 460–467. Dong, C., N. V. Swindale, and M. S. Cynader. 1999. A contingent aftereffect in the auditory system. Nature Neuroscience 2: 863–865. Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1991. Congruent representations of visual and somatosensory space in single neurons of monkey ventral intra-parietal cortex (area VIP). In Brain and space, ed. J. Palliard, 223–236. Oxford: Oxford Univ. Press. Duhamel, J. R., C. L. Colby, and M. E. Goldberg. 1998. Ventral intra-parietal area of the macaque: Congruent visual and somatic response properties. Journal of Neurophysiology 79: 126–136. Eidsvik, C. 2005. Background tracks in recent cinema. In Moving image theory: Ecological condisderations, ed. J. D. Anderson and B. F. Anderson, 70–78. Carbondale, IL: Southern Illinois Univ. Press. Ernst, M. O., and M. S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415: 429–433. Ernst, M. O., and H. H. Bulthoff. 2004. Merging the senses into a robust percept. Trends in Cognitive Sciences 8: 162–169. Exner, S. 1875. Experimentelle Untersuchung der einfachsten psychischen Processe. Pfluger’s Arch Physiol 11: 403–432. Fahlenbrach, K. 2002. Feeling sounds: Emotional aspects of music videos. In Proceedings of IGEL 2002 conference, Pécs, Hungary. Fitts, P. M., and R. L. Deininger. 1954. S–R compatibility: Correspondence among paired elements within stimulus and response codes. Journal of Experimental Psychology 48: 483–492. Fitts, P. M., and C. M. Seeger. 1953. S–R compatibility: Spatial characteristics of stimulus and response codes. Journal of Experimental Psychology 46: 199–210. Furniss, M. 1998. Art in motion: Animation aesthetics. London: John Libbey. Gazzaniga, M. S. 1987. Perceptual and attentional processes following callosal section in humans. Neuro psychologia 25: 119–133. Gepshtein, S., and K. Kubovy. 2000. The emergence of visual objects in space-time. Proceedings of the National Academy of Sciences 97: 8186–8191. Gilbert, G. M. 1939. Dynamic psychophysics and the phi phenomenon. Archives of Psychology 237: 5–43. Graziano, M. S. A., C. G. Gross, C. S. R. Taylor, and T. Moore. 2004. A system of multimodal areas in the primate brain. In Crossmodal space and crossmodal attention, ed. C. Spence and J. Driver, 51–68. Oxford: Oxford Univ. Press. Graziano, M. S. A., X. 
Hu, and C. G. Gross. 1997. Visuo-spatial properties of ventral premotor cortex. Journal of Neurophysiology 77: 2268–2292. Graziano, M. S. A., G. S. Yap, and C. G. Gross. 1994. Coding of visual space by premotor neurons. Science 266: 1054–1057. Hagen, M. C., O. Franzen, F. McGlone, G. Essick, C. Dancer, and J. V. Pardo. 2002. Tactile motion activates the human MT/V5 complex. European Journal of Neuroscience 16: 957–964. Hettinger, L. J. 2002. Illusory self-motion in virtual environments. In Handbook of virtual environments, ed. K. M. Stanney, 471–492. Hillsdale, NJ: Lawrence Erlbaum. Hoisko, J. 2003. Early experiences of visual memory prosthesis for supporting episodic memory. International Journal of Human–Computer Interaction 15: 209–320. Hommel, B. 2000. The prepared reflex: Automaticity and control in stimulus–response translation. In Control of cognitive processes: Attention and performance XVIII, ed. S. Monsell and J. Driver, 247–273. Cambridge, MA: MIT Press.
Howard, I. P., and W. B. Templeton. 1966. Human spatial orientation. New York: Wiley. Hulin, W. S. 1927. An experimental study of apparent tactual movement. Journal of Experimental Psychology 10: 293–320. Isono H., S. Komiyama, and H. Tamegaya. 1996. An autostereoscopic 3-D HDTV display system with reality and presence. SID Digest 135–138. Kamitani, Y., and S. Shimojo. 2001. Sound-induced visual “rabbit.” Journal of Vision 1: 478a. Kirman, J. H. 1974. Tactile apparent movement: The effects of interstimulus onset interval and stimulus duration. Perception and Psychophysics 15: 1–6. Kitagawa, N., and S. Ichihara. 2002. Hearing visual motion in depth. Nature 416: 172–174. Kohlrausch, A., R. Fassel, and T. Dau. 2000. The influence of carrier level and frequency on modulation and beat-detection thresholds for sinusoidal carriers. Journal of the Acoustical Society of America 108: 723–734. Korte, A. 1915. Kinematoscopische Untersuchungen. Zeitschrift für Psychologie 72: 193–296. Lakatos, S., and R. N. Shepard. 1997. Constraints common to apparent motion in visual, tactile and auditory space. Journal of Experimental Psychology: Human Perception and Performance 23: 1050–1060. Landis, C. 1954. Determinants of the critical flicker-fusion threshold. Physiological Reviews 34: 259–286. Len Lye filmography. Len Lye Foundation site, http://www.govettbrewster.com/LenLye/Foundation/ LenLyeFoundation.aspx (accessed 28 March 2011). Lewkowicz, D. J. 1999. The development of temporal and spatial intermodal perception. In Cognitive contributions to the perception of spatial and temporal events, ed. G. Aschersleben, 395–420. Amsterdam: Elsevier. Lopez-Moliner, J., and S. Soto-Faraco. 2007. Vision affects how fast we hear sounds move. Journal of Vision 7:6.1–6.7. Luppino, G., A. Murata, P. Govoni, and M. Matelli. 1999. Largely segregated parietofrontal connections linking rostral intraparietal cortex (areas AIP and VIP) and the ventral premotor cortex (areas F5 and F4). Experimental Brain Research 128: 181–187. Macmillan, N. A., and C. D. Creelman. 1991. Detection theory: A user’s guide. Cambridge, UK: Cambridge Univ. Press. Manabe, K., and H. Riquimaroux. 2000. Sound controls velocity perception of visual apparent motion. Journal of the Acoustical Society of Japan 21: 171–174. Marker, C. 1962. La Jetée [Motion picture]. France: Argos Film. Mastoropoulou, G., K. Debattista, A. Chalmers, and T. Troscianko. 2005. The influence of sound effects on the perceived smoothness of rendered animations. Paper presented at APGV’05: Second Symposium on Applied Perception in Graphics and Visualization, La Coruña, Spain. Mateeff, S., J. Hohnsbein, and T. Noack. 1985. Dynamic visual capture: Apparent auditory motion induced by a moving visual target. Perception 14: 721–727. Maunsell, J. H. R., and D. C. Van Essen. 1983. The connections of the middle temporal visual area (MT) and their relationship to a cortical hierarchy in the macaque monkey. Journal of Neuroscience 3: 2563–2580. McCormick, D., and P. Mamassian. 2008. What does the illusory flash look like? Vision Research 48: 63–69. McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264: 746–748. Meyer, G. F., and S. M. Wuerger. 2001. Cross-modal integration of auditory and visual motion signals. Neuroreport 12: 2557–2560. Meyer, G. F., S. M. Wuerger, F. Röhrbein, and C. Zetzsche. 2005. Low-level integration of auditory and visual motion signals requires spatial co-localisation. Experimental Brain Research 166: 538–547. Morein-Zamir, S., S. 
Soto-Faraco, and A. Kingstone. 2003. Auditory capture of vision: Examining temporal ventriloquism. Cognitive Brain Research 17: 154–163. Ohmura, H. 1987. Intersensory influences on the perception of apparent movement. Japanese Psychological Research 29: 1–19. Oruç, I., S. Sinnett, W. F. Bischof, S. Soto-Faraco, K. Lock, and A. Kingstone. 2008. The effect of attention on the illusory capture of motion in bimodal stimuli, Brain Research 1242: 200–208. Pavani, F., E. Macaluso, J. D. Warren, J. Driver, and T. D. Griffiths. 2002. A common cortical substrate activated by horizontal and vertical sound movement in the human brain. Current Biology 12: 1584–1590. Perrone, J. A., and A. Thiele. 2001. Speed skills: Measuring the visual speed analyzing properties of primate MT neurons. Nature Neuroscience 4: 526–532. Pick, H. L., D. H. Warren, and J. C. Hay. 1969. Sensory conflict in judgments of spatial direction. Perception and Psychophysics 6: 203–205. Priebe, N. J., S. G. Lisberger, and J. A. Movshon. 2006. Tuning for spatiotemporal frequency and speed in directionally selective neurons of macaque striate cortex. Journal of Neuroscience 26: 2941–2950.
Probst, T., S. Krafczyk, T. Brandt, and E. Wist. 1984. Interaction between perceived self-motion and object motion impairs vehicle guidance. Science 225: 536–538. Radeau, M., and P. Bertelson. 1976. The effect of a textured visual field on modality dominance in a ventriloquism situation. Perception and Psychophysics 20: 227–235. Reisbeck, T. E., and K. R. Gegenfurtner. 1999. Velocity tuned mechanisms in human motion processing. Vision Research 39: 3267–3285. Rock, I., and C. S. Harris. 1967. Vision and touch. Scientific American 216: 96–104. Saenz, M., L. B. Lewis, A. G. Huth, I. Fine, and C. Koch. 2008. Visual motion area MT+/V5 responds to auditory motion in human sight-recovery subjects. Journal of Neuroscience 28: 5141–5148. Sanabria, D., S. Soto-Faraco, and C. Spence. 2004a. Exploring the role of visual perceptual grouping on the audiovisual integration of motion. Neuroreport 18: 2745–2749. Sanabria, D., S. Soto-Faraco, and C. Spence. 2005a. Spatiotemporal interactions between audition and touch depend on hand posture. Experimental Brain Research 165: 505–514. Sanabria, D., S. Soto-Faraco, and C. Spence. 2005b. Assessing the influence of visual and tactile distractors on the perception of auditory apparent motion. Experimental Brain Research 166: 548–558. Sanabria, D., S. Soto-Faraco, J. S. Chan, and C. Spence. 2004b. When does visual perceptual grouping affect multisensory integration? Cognitive, Affective, and Behavioral Neuroscience 4: 218–229. Sanabria, D., C. Spence, and S. Soto-Faraco. 2007. Perceptual and decisional contributions to audiovisual interactions in the perception of apparent motion: A signal detection study. Cognition 102: 299–310. Sanchez-Vives, M. V., and M. Slater. 2005. From presence to consciousness through virtual reality. Nature Reviews Neuroscience 4: 332–339. Schlack, A., S. Sterbing, K. Hartung, K. P. Hoffmann, and F. Bremmer. 2005. Multisensory space representations in the macaque ventral intraparietal area. Journal of Neuroscience 25: 4616–4625. Sekuler, R., A. B. Sekuler, and R. Lau. 1997. Sound alters visual motion perception. Nature 385: 308. Shams, L., Y. Kamitani, and S. Shimojo. 2000. What you see is what you hear. Nature 408: 788. Shams, L., Y. Kamitani, and S. Shimojo. 2002. Visual illusion induced by sound. Cognitive Brain Research 14: 147–152. Shams, L., Y. Kamitani, S. Thompson, and S. Shimojo. 2001. Sound alters visual evoked potential in humans. NeuroReport 12: 3849–3852. Shimojo, S., C. Scheier, R. Nijhawan, L. Shams, Y. Kamitani, and K. Watanabe. 2001. Beyond perceptual modality: Auditory effects on visual perception. Acoustical Science and Technology 22: 61–67. Simon, J. R. 1969. Reactions towards the source of stimulation. Journal of Experimental Psychology 81: 174–176. Snoek, C., and M. Worring. 2002. Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications 25: 5–35. Soto-Faraco, S., and A. Kingstone. 2004. Multisensory integration of dynamic information. In The handbook of multisensory processes, ed. G. Calvert, C. Spence, and B. E. Stein, 49–68. Cambridge, MA: MIT Press. Soto-Faraco, S., A. Kingstone, and C. Spence. 2000. The role of movement and attention in modulating audiovisual and audiotactile ‘ventriloquism’ effects. Abstracts of the Psychonomic Society 5: 40. Soto-Faraco, S., A. Kingstone, and C. Spence. 2003. Multisensory contributions to the perception of motion. Neuropsychologia 41: 1847–1862. Soto-Faraco, S., A.
Kingstone, and C. Spence. 2006. Integrating motion information across sensory modalities: The role of top-down factors. In Progress in Brain Research: Visual Perception Series, vol. 155, ed. S. Martínez-Conde et al., 273–286. Amsterdam: Elsevier. Soto-Faraco, S., J. Lyons, M. S. Gazzaniga, C. Spence, and A. Kingstone. 2002. The ventriloquist in motion: Illusory capture of dynamic information across sensory modalities. Cognitive Brain Research 14: 139–146. Soto-Faraco, S., C. Spence, and A. Kingstone. 2004a. Crossmodal dynamic capture: Congruency effects in the perception of motion across sensory modalities. Journal of Experimental Psychology: Human Perception and Performance 30: 330–345. Soto-Faraco, S., C. Spence, and A. Kingstone. 2005. Assessing automaticity in the audio-visual integration of motion. Acta Psychologica 118: 71–92. Soto-Faraco, S., C. Spence, and A. Kingstone. 2004b. Congruency effects between auditory and tactile motion: Extending the phenomenon of cross-modal dynamic capture. Cognitive Affective and Behavioral Neuroscience 4: 208–217. Staal, H. E., and D. C. Donderi. 1983. The effect of sound on visual apparent movement. American Journal of Psychology 96: 95–105.
Stein, B. E., and M. A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press. Thomas, F., and O. Johnston. 1981. Disney animation: The illusion of life. New York: Abbeyville Press. Ungerleider, L. G., and R. Desimone. 1986. Cortical connections of visual area MT in the macaque. Journal of Computational Neurology 248: 190–222. Väljamäe, A., and S. Soto-Faraco. 2008. Filling-in visual motion with sounds. Acta Psychologica 129: 249–254. Väljamäe, A., and A. Tajadura-Jiménez. 2007. Perceptual optimization of audio-visual media: Moved by sound. In Narration and spectatorship in moving images, ed. B. Anderson and J. Anderson. Cambridge Scholars Press. Väljamäe, A., A. Tajadura-Jiménez, P. Larsson, D. Västfjäll, and M. Kleiner. 2008. Handheld experiences: Using audio to enhance the illusion of self-motion. IEEE MultiMedia 15: 68–75. van der Zee, E., and A. W. van der Meulen. 1982. The influence of field repetition frequency on the visibility of flicker on displays. IPO Annual Progress Report 17: 76–83. Vroomen, J., and B. de Gelder. 2000. Sound enhances visual perception: Cross-modal effects of auditory organization on vision. Journal of Experimental Psychology: Human Perception and Performance 26: 1583–1590. Vroomen, J., and B. de Gelder. 2003. Visual motion influences the contingent auditory motion aftereffect. Psychological Science 14: 357–361. Watson, A. B., and A. J. Ahumada. 1983. A look at motion in the frequency domain. In Motion: Perception and representation, ed. J. K. Tsotsos, 1–10. New York: Association for Computing Machinery. Watson, J. D., R. Myers, R. S. Frackowiak et al. 1993. Area V5 of the human brain: Evidence from a combined study using positron emission tomography and magnetic resonance imaging. Cerebral Cortex 3: 79–94. Welch, R. B. 1999. Meaning, attention, and the “unity assumption” in the intersensory bias of spatial and temporal perceptions. In Cognitive contributions to the perception of spatial and temporal events, ed. G. Ascherlseben, T. Bachmann, and J. Musseler, 371–387. Amsterdam: Elsevier Science. Welch, R. B., and D. H. Warren. 1986. Intersensory interactions. In Handbook of perception and human performance. Vol. 1, Sensory processes and perception, ed. K. R. Boff, L. Kaufman, and J. P. Thomas, 25–36. New York: Wiley. Welch, R. B., L. D. Duttenhurt, and D. H. Warren. 1986. Contributions of audition involved in the multimodal integration of perceptual and vision to temporal rate perception. Perception and Psychophysics 39: 294–300. Wertheimer, M. 1912. Experimentelle Studien über das Sehen von Bewegung. [Experimental studies on the visual perception of movement]. Zeitschrift für Psychologie 61: 161–265. Wertheimer, M. 1932. Principles of perceptual organization. Psychologische Forschung 41: 301–350. Abridged translation by M. Wertheimer, in Readings in perception, ed. D. S. Beardslee and M. Wertheimer, 115– 137. Princeton, NJ: Van Nostrand-Reinhold. Wuerger, S. M., M. Hofbauer, and G. F. Meyer. 2003. The integration of auditory and visual motion signals at threshold. Perception and Psychophysics 65: 1188–1196. Zapparoli, G. C., and L. L. Reatto. 1969. The apparent movement between visual and acoustic stimulus and the problem of intermodal relations. Acta Psychologia 29: 256–267. Zihl, J., D. von Cramon, and N. Mai. 1983. Selective disturbance of movement vision after bilateral brain damage. Brain 106: 313–40. Zihl, J., D. von Cramon, N. Mai, and C. Schmid. 1991. Disturbance of movement vision after bilateral posterior brain damage. 
Further evidence and follow up observations. Brain 114: 2235–2252.
30
Multimodal Integration during Self-Motion in Virtual Reality Jennifer L. Campos and Heinrich H. Bülthoff
CONTENTS
30.1 Introduction ... 603
30.2 Simulation Tools and Techniques ... 604
  30.2.1 Visual Displays ... 604
  30.2.2 Treadmills and Self-Motion Simulators ... 606
30.3 Influence of Visual, Proprioceptive, and Vestibular Information on Self-Motion Perception ... 611
  30.3.1 Unisensory Self-Motion Perception ... 611
  30.3.2 Multisensory Self-Motion Perception ... 613
    30.3.2.1 Effects of Cue Combination ... 613
    30.3.2.2 Cue Weighting under Conflict Conditions ... 616
  30.3.3 Unique Challenges in Studying Multisensory Self-Motion Perception ... 618
30.4 Advantages and Disadvantages of Using Simulation Technology to Study Multisensory Self-Motion Perception ... 619
30.5 Multisensory Self-Motion Perception: An Applied Perspective ... 620
30.6 Summary ... 622
Acknowledgments ... 622
References ... 622
30.1 INTRODUCTION

Our most common, everyday activities, and those that are most essential to our survival, typically involve moving within and throughout our environment. Whether navigating to acquire resources, avoiding dangerous situations, or tracking one's position in space relative to important landmarks, accurate self-motion perception is critically important. Self-motion perception is typically experienced when an observer is physically moving through space, including self-propelled movements such as walking, running, or swimming, and also when being passively moved on a train, or when actively driving a car or flying a plane. Self-motion perception is important for estimating movement parameters such as speed, distance, and heading direction. It is also important for the control of posture, the modulation of gait, and for predicting time to contact when approaching or avoiding obstacles. It is an essential component of path integration, which involves the accumulation of self-motion information when tracking one's position in space relative to other locations or objects. It is also important for the formation of spatial memories when learning complex routes and environmental layouts. During almost all natural forms of self-motion, there are several sensory systems that provide redundant information about the extent, speed, and direction of egocentric movement, the most important of which include dynamic visual information (i.e., optic flow), vestibular information (i.e., provided through the inner ear organs, including the otoliths and semicircular canals), proprioceptive
information provided by the muscles and joints, and the efference copy signals representing the commands of these movements. Also important, although less well studied, are auditory signals related to self-motion and somatosensory cues provided through wind, vibrations, and changes in pressure. Currently, much work has been done to understand how several of these individual modalities can be used to perceive different aspects of self-motion independently. However, researchers have only recently begun to evaluate how they are combined to form a coherent percept of self-motion and the relative influences of each cue when more than one is available. Not only is it important to take a multisensory approach to self-motion perception in order to understand the basic science underlying cue combination, but it is also important to strive toward evaluating human behaviors as they occur under natural, cue-rich, ecologically valid conditions. The inherent difficulty in achieving this is that the level of control that is necessary to conduct careful scientific evaluations is often very difficult to achieve under natural, realistic conditions. Consequently, in order to maintain strict control over experimental conditions, much of the past work has been conducted within impoverished, laboratory environments using unnatural tasks. More recently, however, Virtual Reality (VR) technology and sophisticated self-motion interfaces have been providing researchers with the opportunity to provide natural, yet tightly controlled, stimulus conditions, while also maintaining the capacity to create unique experimental scenarios that could not occur in the real world (Bülthoff and van Veen 2001; Loomis et al. 1999; Tarr and Warren 2002). VR also does this in a way that maintains an important perception–action loop that is inherent to nearly all aspects of human–environment interactions. Visually simulated Virtual Environments (VEs) have been the most commonly used form of VR, because, until very recently it has been difficult to simulate full-body motion through these environments without having to resort to unnatural control devices such as joysticks and keyboards. More recently, the development of high-precision motion tracking systems and sophisticated self-motion simulators (e.g., treadmills and motion platforms) are allowing far more control and flexibility in the presentation of body-based self-motion cues (i.e., proprioceptive and vestibular information). Consequently, researchers are now able to study multisensory self-motion perception in novel and exciting ways. The significant technological advancements and increased accessibility of many VR systems have stimulated a renewed excitement in recognizing its significant potential now and in the future. Much of the multisensory research up until this point has focused on tasks involving discrete stimulus presentations in near body space, including visual–auditory, visual–proprioceptive, and visual–haptic interactions. Far less is understood about how different sources of sensory information are combined during large-scale self-motion through action space. Unlike other approaches used to examine the integration of two specific cues at a particular, discrete instance in time, navigating through the environment requires the dynamic integration of several cues across space and over time. Understanding the principles underlying multimodal integration in this context of unfolding cue dynamics provides insight into an important category of multisensory processing. 
This chapter begins with a brief description of some of the different types of simulation tools and techniques that are being used to study self-motion perception, along with some of the advantages and disadvantages of the different interfaces. Subsequently, some of the current empirical work investigating multisensory self-motion perception using these technologies will be summarized, focusing mainly on visual, proprioceptive, and vestibular influences during full-body self-motion through space. Finally, the implications of this research for several applied areas will be briefly described.
30.2 SIMULATION TOOLS AND TECHNIQUES

30.2.1 Visual Displays

The exciting potential of VR comes from the fact that you can create worlds with particular characteristics that can be systematically manipulated and customized. This includes elaborate worlds
unlike anything that can or does exist within the known real world. Rich, realistic visual details can be included, or the visual scene can be intentionally limited to particular visual cues of interest, such as the optic flow provided through a cloud of dots or the relative positioning of selected landmarks. Instant teleportation from one position in space to another (Meilinger et al. 2007), the inclusion of wormholes to create non-Euclidean spaces (Schnapp and Warren 2007), and navigation throughout four-dimensional (4-D) environments (D’Zmura et al. 2000) are all possible. This type of control and flexibility is not something that can be achieved in a real-world testing environment. Whereas in the past the process of using computer graphics to create more complex VEs, such as realistic buildings or cities, was time-consuming and arduous, new software advancements now allow entire virtual cities of varying levels of detail to be built in just a few days (e.g., Müller et al. 2006). In order to allow an observer to visualize these VEs, different types of displays have been used (for a more thorough review, see Campos et al. 2007a). Traditionally, desktop displays have been the most commonly used visualization tool for presenting VEs. These displays typically consist of a stationary computer monitor paired with an external control device that is used to interact with the VE (i.e., a joystick or a mouse). Even though the quality and resolution of desktop displays have been steadily increasing in recent years (e.g., high dynamic range displays; see Akyüz et al. 2007), they are nonimmersive, have a limited field of view (FOV), and can accommodate very little natural movement. Other displays such as the Cave Automatic Virtual Environment (CAVE™; Cruz-Neira et al. 1993) and other large curved projection screen systems (e.g., Meilinger et al. 2008; http://www.cyberneum.com/PanoLab_en.html; see Figure 30.1) provide observers with a much wider FOV by
FIGURE 30.1 MPI Panoramic projection screen. This large, spherical panoramic projection screen is driven by four projectors that project images of Virtual Environments (VEs) onto the surrounding curved walls and also the floor. This provides a field of view of more than 220° horizontal and 125° vertical, thereby taking up almost the entire human visual field. Participants can move through the VE via various different input devices such as bicycles, driving interfaces, or joysticks (as shown here). The VE displayed in the photo is a highly realistic virtual model of the city center of Tübingen. (Photo courtesy of Axel Griesch.)
projecting images on the walls surrounding the observer, and in some cases, the floor. Such displays often present two slightly different images, offset to account for the interpupillary distance, which, when paired with stereo glasses (anaglyph or polarized stereo), can provide a 3-D display of the environment (a minimal sketch of this offset is given at the end of this section). Despite the full FOV and high level of immersion provided by these displays, they again only allow for a limited range of active movements. Apart from desktop displays, head-mounted displays (HMDs) are perhaps the most widely used visualization system for navigational tasks. HMDs range in size, resolution, and FOV. Their typically small FOV is one of the main restrictions. This restriction can be partially ameliorated by pairing the HMD with a motion tracking system that can be used to update the visual image directly as a function of the observer’s own head movements. This allows for a greater visual sampling of the environmental space and a more natural method of visually exploring one’s environment. HMDs also provide a highly immersive experience because they block out all surrounding visual input, restricting visual information to what is experienced through the display. The greatest advantage of HMDs is the extent of mobility that is possible, allowing for natural, large-scale movements through space such as walking. In terms of understanding the role of particular sources of sensory information in self-motion perception, there is often a trade-off between having high-resolution, wide FOV displays, which provide the most compelling visual information, and the flexibility of having a visualization system that can move with the observer (i.e., an HMD), thus providing natural body-based cues. Therefore, using a combination of approaches is often advisable.
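To make the stereo idea above concrete, the sketch below shows one common way of deriving left- and right-eye viewpoints from a single tracked head position by offsetting each eye by half the interpupillary distance (IPD) along the head's lateral axis; each viewpoint is then used to render one of the two slightly different images. This is a minimal illustration only; the coordinate convention, variable names, and the 6.3 cm default IPD are assumptions made for the example, not details of the projection systems described in this chapter.

```python
import math

def stereo_eye_positions(head_x, head_y, head_z, head_yaw_rad, ipd_m=0.063):
    """Return ((lx, ly, lz), (rx, ry, rz)) viewpoints for stereo rendering.

    Assumed convention: the head faces along +z at yaw = 0 and yaw rotates
    about the vertical (y) axis, so the head's right-pointing axis is
    (cos(yaw), 0, -sin(yaw)). Each eye is offset by half the interpupillary
    distance along that axis.
    """
    right_x, right_z = math.cos(head_yaw_rad), -math.sin(head_yaw_rad)
    half = ipd_m / 2.0
    left_eye = (head_x - half * right_x, head_y, head_z - half * right_z)
    right_eye = (head_x + half * right_x, head_y, head_z + half * right_z)
    return left_eye, right_eye


# Example: head 1.7 m above the floor, facing straight ahead.
left, right = stereo_eye_positions(0.0, 1.7, 0.0, head_yaw_rad=0.0)
```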
30.2.2 Treadmills and Self-Motion Simulators

The most natural way in which humans interact with and navigate within their environment is by actually moving. Therefore, understanding self-motion perception can only truly be accomplished by studying an active observer as they physically move through space, something for which a simple visualization device alone will not suffice. From the perspective of multisensory approaches to studying self-motion, it is also important that particular body-based cues can be isolated from each other, for instance, by independently manipulating proprioceptive and vestibular inputs. Several sophisticated self-motion interfaces and motion capture systems are now providing such opportunities. Of course, the most natural form of movement through a VE is, in fact, not simulated movement at all, but actual walking. Several laboratories have now developed large, fully tracked, free walking spaces (e.g., the MPI Tracking Lab, Campos et al. 2009, http://www.cyberneum.com/TrackingLab_en.html, see Figure 30.2; the VENlab, Tarr and Warren 2002; and the HIVE, Waller et al. 2007). Using motion capture information, an observer can walk, rotate, and orient in any direction while his/her movements are used directly to update the information in the visual display (i.e., HMD). This provides a highly natural locomotor experience and retains proprioceptive and vestibular inputs in their purest form. The main limitation of these setups is that the size of the VE is constrained by the size of the actual environment. Although this is sufficient for studying behaviors that take place in smaller-scale spaces, it would not suffice for understanding the role of self-motion perception during the exploration of larger outdoor spaces or complex buildings, for instance. Some strategies have been used to maximize movement capacities, such as placing a gain on the visuals during rotations. This redirects the walker by causing them to physically turn through a greater or lesser angle than the one displayed visually, thereby containing their movements within the confines of the tracked space (Engel et al. 2008; Peck et al. 2008; Razzaque et al. 2001, 2002); a minimal sketch of this kind of rotation gain is given below. However, the perceptual consequences of such redirected walking manipulations are currently not well understood. The advantage of tracking an observer’s position in space as a way of updating their position in the VE is that this also provides a moment-by-moment recording of the behaviors performed during any given task. This is particularly informative when studying self-motion perception because it provides a measure of different movement parameters such as walking speed and the walked trajectory.
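As a concrete illustration of the rotation gains used in redirected walking, the following sketch scales the change in the user's tracked physical heading before it is applied to the virtual viewpoint, so that a given physical turn corresponds to a larger or smaller virtual turn. It is a minimal sketch under assumed names and conventions (a single yaw angle in radians and a fixed gain), not the algorithm used in any of the redirected-walking studies cited above.

```python
import math

def redirected_yaw(virtual_yaw, physical_yaw_delta, rotation_gain=1.3):
    """Apply a tracked change in physical heading to the virtual viewpoint.

    With rotation_gain > 1 the virtual scene turns faster than the head, so
    the walker physically turns through a smaller angle than the one they
    see; gain < 1 has the opposite effect. Both can be used to keep the
    walker's physical path inside the boundaries of the tracked space.
    """
    return virtual_yaw + rotation_gain * physical_yaw_delta


# Example: a 90 degree physical turn is rendered as a ~117 degree virtual turn.
new_virtual_yaw = redirected_yaw(0.0, math.radians(90.0), rotation_gain=1.3)
```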
FIGURE 30.2 MPI Tracking Laboratory. This fully tracked, free-walking space is 12 × 12 m in size. In this space, participants’ position and orientation are tracked using an optical tracking system (16 Vicon MX13 cameras) through monitoring of reflective markers. Information about a participant’s position and orientation is sent from optical trackers, via a wireless connection, to a backpack-mounted laptop worn by participant. This system can therefore be used to both update the visual environment as a function of participants’ own movements (i.e., in HMD as shown here) and to capture different movement parameters. With this setup, it is also possible to track two or more observers, thus allowing for multiuser interactions within a VE. (Photo courtesy of Manfred Zentsch.)
With full or partial body tracking, additional movement characteristics such as step length, facing direction, pointing direction, and body posture can also be recorded. This provides a rich source of information, as it effectively captures even subtle movement characteristics at every instant in time (e.g., Campos et al. 2009; Siegle et al. 2009). Other devices that are used to allow physical walking through VEs are treadmill setups. Unlike free walking spaces, treadmills permit unconstrained walking over infinite distances. Standard treadmills typically provide a capacity for straight, forward walking while limiting the walker to one position in space. Essentially, this limits the body-based cues to proprioceptive information. Most often these setups also use a handrail for stability and support, which provides additional haptic information informing the observer of their lack of movement through space. When walking in place under such conditions, not only are the kinematics of walking different from walking over ground (e.g., propulsive forces), but the vestibular information that is typically generated during the acceleration phase of walking is missing. In order to account for this, other, much larger treadmills (ranging from 1.5 to 2.5 m wide and 3 to 6 m long) have been developed, which allow for forward, accelerated walking across the treadmill belt until a constant walking velocity is reached (Hollerbach et al. 2000; Souman et al. 2010; Thompson et al. 2005). A harness can be used for safety to ensure that the walker does not leave the surface of the treadmill, while still allowing the flexibility of relatively unconstrained movements. Furthermore, systems such as the Sarcos Treadport developed by Hollerbach and colleagues are equipped with a tether that can be used to push and pull the walker in a way that simulates the accelerating or decelerating forces that accompany walking through space (Christensen et al. 2000). This tether can also be used to simulate uphill or downhill locomotion (Tristano et al. 2000).
By pairing these types of setups with a motion tracking system, the treadmill speed can be adjusted online in response to the observer’s own movements. Specifically, control algorithms have been developed as a way of allowing an observer to walk naturally (including stopping and changing walking speeds), while at the same time the treadmill speed is adjusted in a way that causes the walker to remain as centrally on the treadmill as possible (e.g., Souman et al. 2010; a simplified sketch of this kind of recentering scheme appears below). These algorithms are also optimized so that the recentering movements produce accelerations that are not strong enough to create large perturbations during walking or a loss of balance. In general, as a method of naturally moving through VEs, large linear treadmills can effectively provide proprioceptive information during walking, as well as some important vestibular cues. However, they do not allow for turning or rotational movement trajectories and can create some “noisy” vestibular stimulation during recentering when using a control algorithm. Circular treadmills constitute another type of movement device that allows for limitless curvilinear walking through space without reaching any end limits. During curvilinear walking, the vestibular system is always stimulated, thus providing a rich sensory experience through both proprioceptive and inertial senses. Most circular treadmills are quite small in diameter and thus mainly permit walking or rotating in place (e.g., Jürgens et al. 1999). Larger circular treadmills allow for natural, full-stride walking in circles (see Figure 30.3 for an image of the MPI circular treadmill, which is 3.6 m in diameter). The MPI circular treadmill is a modified version of that originally developed by Mittelstaedt and Mittelstaedt (1996), which includes new control and safety features and a motorized handlebar that can move independently of the treadmill belt/disk. Consequently, this provides a unique opportunity to decouple vestibular and proprioceptive information by having participants walk in place at one rate as they are moved through space at a different rate. This is achieved by having the participants’ rate of movement through space (i.e., inertial input) dictated by the speed at which the handlebar is moved, while the rate at which they walk in place (i.e., proprioceptive input) is dictated by the rate of the disk relative to the walking/handlebar speed. Using this setup, the relation between the handlebar speed and the disk speed can be systematically manipulated to provide different information to the two sensory systems.
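The following sketch illustrates the general logic of the recentering control schemes described earlier in this subsection for linear treadmills: the belt speed tracks the walker's own speed, with an additional correction term that gently drives the walker back toward the center of the belt, and the resulting acceleration is capped so that the correction does not perturb balance. The gains, variable names, and update structure are illustrative assumptions rather than the actual controllers used on the systems cited above.

```python
def update_belt_speed(belt_speed, walker_speed, offset_from_center, dt,
                      position_gain=0.3, max_accel=0.5):
    """One control step of a treadmill recentering scheme (SI units).

    offset_from_center > 0 means the walker has drifted toward the front of
    the belt, so the belt is commanded to run slightly faster than the
    walker in order to carry them back toward the center. The commanded
    acceleration is clipped so that the correction stays gentle.
    """
    target_speed = walker_speed + position_gain * offset_from_center
    accel = (target_speed - belt_speed) / dt
    accel = max(-max_accel, min(max_accel, accel))  # limit perturbations
    return belt_speed + accel * dt


# Example: walker at 1.2 m/s who has drifted 0.4 m forward of center.
new_belt_speed = update_belt_speed(belt_speed=1.2, walker_speed=1.2,
                                   offset_from_center=0.4, dt=0.01)
```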
FIGURE 30.3 MPI Circular Treadmill. This circular treadmill (3.6 m in diameter) allows for natural, fullstride walking in circles. It is equipped with a motorized handlebar that can move independently from treadmill belt/disk. Using this setup, the relation between handlebar speed and disk speed can be systematically manipulated to provide different information to two sensory systems. A computer monitor mounted on handlebar can also be used to present visual information during movement. (Photo courtesy of Axel Griesch.)
The main drawback of most of these types of treadmill systems is that they do not allow for combinations of purely linear and rotational movements, nor can they accommodate changes in walking direction. To address this problem, there have been a handful of attempts to develop omnidirectional treadmills that allow limitless walking in every direction (Darken et al. 1997; Iwata 1999, Torus treadmill). The newest omnidirectional treadmill, built by the Cyberwalk project (http://www.cyberwalk-project.org), is the largest at 6.5 m (21 ft) × 6.5 m, with a 4 m (13 ft) × 4 m walking area, and weighs 11 tons (see Figure 30.4). It is made up of a series of individual treadmill belts running in one direction (x), all mounted on two chains that move the belts in the orthogonal direction (y). Consequently, the combined motion of the belts and chains can create motion in any direction (see the sketch below). Again, this system is used in combination with a customized control algorithm to ensure that the walker remains centered on the platform while allowing them to change speed and direction (Souman et al. 2010).
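To make the belt-and-chain arrangement concrete, the sketch below decomposes a desired surface velocity into a belt (x) component and a chain (y) component; commanding the two speeds together moves the walking surface in any planar direction. It is a schematic illustration under assumed names and units, not the Cyberwalk control software.

```python
import math

def belt_and_chain_speeds(desired_speed, desired_direction_rad):
    """Split a desired surface velocity into belt (x) and chain (y) speeds.

    The individual belts move the surface along x, and the chains carrying
    them move it along y, so any direction in the plane is produced by
    combining the two orthogonal components.
    """
    belt_speed_x = desired_speed * math.cos(desired_direction_rad)
    chain_speed_y = desired_speed * math.sin(desired_direction_rad)
    return belt_speed_x, chain_speed_y


# Example: move the surface at 1.0 m/s toward 30 degrees from the x axis.
belt_x, chain_y = belt_and_chain_speeds(1.0, math.radians(30.0))
```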
FIGURE 30.4 Cyberwalk Omni-directional Treadmill. This large omnidirectional treadmill was built by the Cyberwalk project (http://www.cyberwalk-project.org) and is housed at the MPI for Biological Cybernetics. It is 6.5 × 6.5 m (4 × 4 m walking area) and weighs 11 tons. It is made up of a series of individual treadmill belts running in one direction (x) all mounted on two chains that can move the belts in the orthogonal direction (y). Consequently, combined motion of belts and chains can create motion in any direction. (Photo courtesy of Tina Weidgans.)
Another form of self-motion perception is that which occurs when one is passively moved through space. In this case, proprioceptive information about lower limb movements is not available and thus, in the absence of vision, self-motion is mainly detected through vestibular cues and other sources of nonvisual information (e.g., wind, changes in skin pressure, vibrations). In order to understand how inertial information can be used for self-motion perception, researchers have used devices that are able to move an observer within 2-D space, including manual wheelchairs (Allen et al. 2004; Waller and Greenauer 2007), programmable robotic wheelchairs (Berthoz et al. 1995; Israël et al. 1997; Siegle et al. 2009), frictionless sleds (Seidman 2008), rotating platforms (Jürgens et al. 1999), and circular treadmills (Mittelstaedt and Mittelstaedt 1996; MPI circular treadmill, see Figure 30.3). Other devices allow for 3-D movements, such as standard 6 degree-of-freedom motion platforms (e.g., the Stewart motion platform; Berger et al. 2010; Butler et al. 2010; Lehmann et al. 2008; Riecke et al. 2006; http://www.cyberneum.com/MotionLab_en.html; see Figure 30.5). The MPI has recently developed a completely new type of motion simulator based on an anthropomorphic robot arm design (Teufel et al. 2007, http://www.cyberneum.com/RoboLab_en.html; see Figure 30.6). The MPI Motion Simulator can move participants linearly over a range of several meters and can rotate them around any axis, thus offering a high degree of freedom of motion. Observers can be passively moved along predefined trajectories (i.e., open loop; Siegle et al. 2009) or they can be given complete interactive control of their own movements (i.e., closed loop) via a variety of input devices, including a helicopter cyclic stick (Beykirch et al. 2007) and a steering wheel. As a consequence of its structure, certain degrees of freedom, such as roll and lateral arcs, do not interact with other degrees of freedom. Furthermore, this serial design provides a larger workspace and allows for upside-down movements, infinite roll capabilities, and continuous centrifugal forces, all of which are not possible with traditional simulator designs.
FIGURE 30.5 MPI Stewart motion platform. The Motion Lab at the MPI for Biological Cybernetics consists of a Maxcue 600, 6 degree-of-freedom Stewart platform coupled with an 86 × 65 degree field-of-view projection screen mounted on the platform. Subwoofers are installed underneath the seat to produce somatosensory vibrations as a way of masking the platform motors. Movements can be presented passively, or participants can control the platform via several different input devices, including a helicopter cyclic stick and a 4 degree-of-freedom haptics manipulator. (Photo courtesy of Manfred Zentsch.)
FIGURE 30.6 MPI Motion Simulator. The MPI Motion Simulator is based on an anthropomorphic robot arm design and can move participants linearly over a range of several meters and rotate them around any axis. Observers can be passively moved along predefined trajectories or they can be given complete interactive control of their own movements via a variety of input devices, such as a helicopter cyclic stick or a steering wheel. A curved projection screen can also be mounted on the end of the robot arm in front of the seated observer, or alternatively an HMD can be used to present immersive visuals. Optical tracking systems have also been mounted on the robot arm to measure the position and orientation of an observer’s head or arm during pointing-based tasks. (Photo courtesy of Anne Faden.)
In summary, as evidenced by the range of interfaces now available and customizable for addressing particular research questions, technology is now providing a means by which to carefully evaluate multimodal self-motion perception. Visualization devices can be used to assess how visual information alone can be used to perceive self-motion and can help to determine the importance of particular visual cues. Self-motion devices allow for the systematic isolation of vestibular or proprioceptive cues during both active, self-propelled movements and passive transport. When these different interfaces are combined, this provides the opportunity to devise very specific multisensory scenarios. Much of this was not possible until very recently, and as such, multisensory self-motion perception is an exciting and newly emerging field.
30.3 INFLUENCE OF VISUAL, PROPRIOCEPTIVE, AND VESTIBULAR INFORMATION ON SELF-MOTION PERCEPTION

30.3.1 Unisensory Self-Motion Perception

The classic approach to understanding how particular cues contribute to different aspects of self-motion perception has been to systematically eliminate particular cues and evaluate behaviors under reduced cue conditions. This, of course, is an important first step in understanding which cues are necessary and/or sufficient to accurately perceive self-motion.
Performance has been measured for observers who only receive computer-simulated visual information in the absence of body-based cues, and also when evaluating behaviors during movements in the complete absence of vision (e.g., when walking or being passively moved). Much of the work on visual self-motion perception has looked specifically at the capacity of an observer to use optic flow alone to perceive self-motion, using either sparse visual input (i.e., a textured ground plane or a cloud of dots) or a rich visual scene (i.e., a realistic visual environment). For example, it has been shown that individuals are relatively accurate at using dynamic visual information to discriminate and reproduce visually simulated traveled distances (Bremmer and Lappe 1999; Frenz et al. 2003; Frenz and Lappe 2005; Redlick et al. 2001; Sun et al. 2004a) and to update their landmark-relative position in space (Riecke et al. 2002). Other studies have shown that optic flow alone can be used to estimate various other characteristics of self-motion, including the direction (Warren and Hannon 1988; Warren et al. 2001) and speed (Larish and Flach 1990; Sun et al. 2003) of self-motion through space. Optic flow can also induce postural sway in the absence of physical movement perturbations (Lee and Aronson 1974; Lestienne et al. 1977) and can be used to predict the time to contact with an environmental object (Lee 1976). Characteristics of visually induced illusory self-motion, referred to as “vection,” have also received considerable interest, particularly from individuals using VR (Dichgans and Brandt 1978; Hettinger 2002; Howard 1986). Most readers have likely experienced vection while sitting in a stationary train when a neighboring train begins to move. In this case, the global movement of the outside visual scene induces a compelling sense of self-motion when really it is the environment (i.e., the neighboring train) that is moving relative to you. This phenomenon highlights the extent to which vision alone can create a compelling illusion of self-motion.

Others have studied conditions in which access to visual information is removed and only body-based cues (e.g., inertial and proprioceptive cues) remain available during movement. It has been clearly established that humans are able to view a static target up to 20 m away and accurately reproduce this distance by walking an equal extent without vision (Elliott 1986; Fukusima et al. 1997; Loomis et al. 1992; Mittelstaedt and Mittelstaedt 2001; Rieser et al. 1990; Sun et al. 2004b; Thomson 1983). Participants can also continuously point to a previously viewed target when walking past it blindfolded on a straight, forward trajectory (Campos et al. 2009; Loomis et al. 1992). Others have demonstrated that individuals are able to estimate distance information when learning and responding through blindfolded walking (Ellard and Shaughnessy 2003; Klatzky et al. 1998; Mittelstaedt and Mittelstaedt 2001; Sun et al. 2004b). A recent article by Durgin et al. (2009) looked specifically at the mechanisms through which proprioceptive information can be used to estimate an extent of self-motion and suggests that step integration might be a form of odometry used by humans (even when explicit step counting is not permitted). Such mechanisms are similar to those previously shown to be used by terrestrial insects such as desert ants (Wittlinger et al. 2006).
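As a toy illustration of step integration as a form of human odometry, the simulation below sums a sequence of noisy per-step length estimates to obtain a traveled distance; because the per-step errors are assumed to be independent, the variability of the summed estimate grows with the number of steps integrated. The mean step length, noise level, and independence assumption are illustrative choices, not parameters taken from the studies cited above.

```python
import random
import statistics

def integrated_distance(n_steps, mean_step_m=0.7, step_noise_sd=0.05):
    """Estimate distance walked by summing noisy per-step length estimates."""
    return sum(random.gauss(mean_step_m, step_noise_sd) for _ in range(n_steps))

# Repeat simulated walks to see how the spread of the estimate grows with distance.
short_walks = [integrated_distance(10) for _ in range(1000)]   # ~7 m walks
long_walks = [integrated_distance(100) for _ in range(1000)]   # ~70 m walks
print(statistics.stdev(short_walks), statistics.stdev(long_walks))
```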
There is some evidence, however, that step integration could be susceptible to accumulating noise and might therefore only be reliable for short traveled distances (Cheung et al. 2007). A substantial body of research has focused specifically on investigating the role of inertial information, mainly provided through the vestibular organs, during simple linear and rotational movements (Berthoz et al. 1995; Bertin and Berthoz 2004; Butler et al. 2010; Harris et al. 2000; Israël and Berthoz 1989; Ivanenko et al. 1997; Mittelstaedt and Glasauer 1991; Mittelstaedt and Mittelstaedt 2001; Seidman 2008; Siegle et al. 2009; Yong et al. 2007) and when traveling along more complex routes involving several different segments (Allen et al. 2004; Sholl et al. 1989; Waller and Greenauer 2007). Some findings have been interpreted to indicate that head velocity and displacement can be accurately perceived by temporally integrating the linear acceleration information detected by the otolith system. Others indicate that the influence and/or effectiveness of vestibular information is somewhat limited, particularly when other nonvisual information such as vibrations is no longer available (Seidman 2008), when moving along trajectories with more complex velocity profiles (Siegle et al. 2009), or during larger-scale navigation (Waller and Greenauer 2007).
30.3.2 Multisensory Self-Motion Perception

As important as it is to understand how humans are able to perceive self-motion under reduced sensory conditions, under most natural conditions it is almost always the case that information from several modalities is concurrently available. Unlike other types of cue combinations that maintain a correlational relationship, in the case of self-motion perception, visual–proprioceptive and proprioceptive–vestibular interactions are often causally related. For instance, when observers propel themselves forward during walking, they immediately experience changes in optic flow information as a direct consequence of their movement. Rarely does the entire visual field move when the body signals that it is stationary, or vice versa. In fact, motion sickness often arises when the brain attempts to reconcile the fact that the visual environment (e.g., the interior cabin of a ship) does not appear to move relative to the head and yet the vestibular system is clearly detecting physical movements. Traditionally, several different approaches have been used to evaluate the contributions of particular sensory systems to self-motion perception and spatial updating during egocentric movements. These have included: (1) directly comparing the effects of multisensory versus unisensory conditions (the most common approach); (2) creating subtle and transient cue conflicts between the information provided by different sensory systems; and (3) introducing a prolonged conflict as a way of evaluating the effects of sensory recalibration. Empirical evidence obtained using each of these strategies will be discussed in turn, with a focus on studies that have exploited simulation tools and techniques.

30.3.2.1 Effects of Cue Combination

Tasks that have been used to investigate the role of different sensory systems in self-motion perception have ranged from estimating the speed, distance, and direction of a simple linear or rotational movement, to returning to the origin after traveling a two-segment path separated by a turn, to navigating longer, more complex routes. In order to evaluate the contributions of particular sensory systems to each of these tasks, research has directly compared reduced cue conditions to conditions in which all or most sensory information is available. What is clear is that no single sensory system appears to be globally critical for all aspects of self-motion perception; rather, the relative importance of particular modalities is somewhat task-dependent. Although an exhaustive review is not provided here, this summary is intended to emphasize the necessity of taking a comprehensive approach to evaluating multisensory self-motion perception as it applies to different levels and types of behaviors.

When looking at multisensory self-motion for purely rotational movements, several studies have indicated that proprioceptive information appears to be quite important. For instance, Bakker et al. (1999) asked subjects to turn various angles while in a virtual forest. They compared conditions in which only visual or vestibular (passive rotations) information was available to conditions in which participants actively rotated themselves by moving their legs. It was reported that having only visual information led to the poorest performance in this task, followed by pure vestibular stimulation. When participants actually moved themselves, they were the most consistent and accurate.
Visual information, however, was not completely ignored, because the estimates in the combined cue condition (i.e., when participants saw the forest while physically moving) fell between the two unisensory conditions, indicating a combined cue effect. Lathrop and Kaiser (2002) also evaluated perceived self-orientation by measuring pointing accuracy to unseen virtual landmarks. Performances in which participants learned the location of landmarks (via an HMD) during full-body rotations were better than when the same movements were simulated visually on a desktop monitor. Consistent with the idea that pure vestibular inputs are not sufficient for self-orientation in space during rotational movements, Wilkie and Wann (2005) reported that, when completing steering maneuvers by rotating on a motorized chair, inertial information did not contribute significantly more than that already provided through various visual inputs. Using both a realistic, visually rich scene and a pure optic flow stimulus, Riecke et al. (2006) evaluated the effects of rotational inertial
cues on obligatory egocentric spatial updating. It was found that neither in rich nor in impoverished visual conditions did the added inertial information improve performance. Unlike other studies, however, Riecke et al. (2006) demonstrated that, with a realistic, familiar visual scene, dynamic visual information alone could be used for updating egocentric positions. Lehmann et al. (2008) evaluated the benefits of having inertially based self-motion information during the mental rotation of an array of objects. In this case, participants were seated on a motion simulator while viewing a large projection screen and a virtual array of objects displayed on a tabletop directly in front of them. When participants had to identify which object in the array had shifted after a viewpoint change (introduced either physically or visually), a detection advantage was observed after the physical rotation. This indicates that the inertial information provided during the rotation facilitated mental rotation, thus also supporting previous real-world studies (Simons and Wang 1998).

Others have also investigated individual cue contributions during purely linear movements. For instance, Harris et al. (2000) evaluated the ability of participants to estimate linear trajectories using visual information provided through an HMD and/or vestibular information provided when passively moved on a cart. Here they found that when visual and vestibular inputs were concurrently available, estimates more closely approximated the purely vestibular estimates than the purely visual estimates. The importance of body-based cues for traveled distance estimation has also been revealed through a series of studies by Campos et al. (2007b). In these experiments, body-based cues were provided either by: (1) natural walking in a fully tracked free walking space (proprioceptive and vestibular), (2) being passively moved by a robotic wheelchair (vestibular), or (3) walking in place on a treadmill (proprioceptive). Distances were presented through optic flow alone, body-based cues alone, or both visual and body-based cues combined. In this case, combined cue effects were again always observed, indicating that no modality was ever completely disregarded. When visual and body-based cues were combined during walking, estimates more closely approximated the unisensory body-based estimates. When visual and inertial cues were combined during passive movements, the estimates fell in between the two unisensory estimates. Sun et al. (2004a) investigated the relative contributions of visual and proprioceptive information by having participants compare two traveled distances experienced by riding a stationary bicycle down a virtual hallway viewed in an HMD. It was concluded in this case that visual information was predominantly used. It is important to note that when riding a bike, there is no absolute one-to-one relationship between the metrics of visual space and those of the proprioceptive movements because of the unknown scale of one pedal rotation (i.e., this would depend on the gear, for instance). Even under such conditions, combined cue effects were observed, such that, when visual and proprioceptive cues were both available, estimates differed from those in either of the unimodal conditions. Cue combination effects have also been evaluated for speed perception during linear self-motion (Durgin et al. 2005; Sun et al. 2003).
For instance, Durgin et al. (2005) have reported that physically moving (i.e., walking or being passively moved) during visually simulated self-motion causes a reduction in perceived visual speed compared to situations in which visually simulated self-motion is experienced while standing stationary. The authors attribute this to the brain’s attempt at optimizing its efficiency when presented with two typically correlated cues that have a predictable relationship.

Slightly more complex paths, consisting of two linear segments separated by a rotation of varying angle, have also been used to understand how self-motion is integrated across different types of movements. Typically, such tasks are used to answer questions about how accurately observers are able to continuously update their position in space without using landmarks (i.e., perform path integration). For instance, triangle completion tasks typically require participants to travel a linear path, rotate through a particular angle, travel a second linear path, and then return home (or face the point of origin). In a classic triangle completion study, Klatzky et al. (1998) demonstrated that during purely visual simulations of the rotational component of the movement (i.e., the turn), participants were highly disoriented when attempting to face back toward the start, compared to conditions under which full-body information was present during the rotation.
In fact, the physical turn condition resulted in errors that were almost as low as the errors in the full, real walking condition in which body information was available during the entire route (walk, turn, walk, face start). Unlike in several of the rotational self-motion experiments described above, vestibular inputs during the rotational component of this triangle completion task appeared to be very important for perceived self-orientation. Again, however, this emphasizes the importance of physical movement cues over visual cues presented in isolation. Using a similar task, Kearns et al. (2002) demonstrated that pure optic flow information was sufficient to complete a return-to-origin task, although the introduction of body-based cues (proprioceptive and vestibular) when walking through the virtual environment led to decreased variability in responding. This was true regardless of the amount of optic flow that was available from the surrounding environment, thus suggesting a stronger reliance on body-based cues. When using a two-segment path reproduction task to compare moving via a joystick versus walking on the Torus treadmill, Iwata and Yoshida (1999) reported higher accuracy during actual walking on the treadmill than when active control of self-motion was provided through the use of an input device. Chance et al. (1998) used a more demanding task in which participants were asked to travel through a virtual maze and learn the locations of several objects as they moved. At the end of the route, when prompted, participants turned and faced the direction of a particular target. Here, the authors compared conditions in which participants actually walked through the maze (proprioceptive and vestibular inputs during translation and rotation), to one in which a joystick was used to navigate the whole maze (vision alone), to one in which a joystick was used to translate and physical rotations were provided (proprioceptive and vestibular inputs during rotation only). When physically walking, errors were the lowest; when only visual information was available, errors were the highest; and when only physical rotations were possible, responses fell in between (although they were not significantly different from the vision-only condition). Using similar conditions, Ruddle and Lessels (2006, 2009) observed a comparable pattern of results when evaluating performance on a search task in a room-sized virtual environment. Specifically, conditions in which participants freely walked during their search resulted in highly accurate and efficient search performance; observers who were only allowed to physically rotate were less efficient, and those with only visual information were less efficient still. Waller and colleagues have evaluated questions related to multisensory navigation as they relate to larger-scale self-motion perception and the acquisition and encoding of spatial representations. For instance, they have considered whether the inertial information provided during passive movements in a car contributes to the development of an accurate representation of a route beyond the information already provided through dynamic visual inputs (Waller et al. 2003).
They found that inertial inputs did not significantly improve performance; moreover, even when the inertial cues were not consistent with the visuals, rather than disorienting or distracting observers, they had no impact on spatial memory. Similarly, Waller and Greenauer (2007) asked participants to travel along a long indoor route (about 480 ft) and then evaluated their ability to perform a variety of spatial tasks. Although participants learned the route under different sensory conditions (by walking with updated vision, by being passively moved with updated vision, or by viewing a visual simulation of the same movement), there appeared to be no significant effects of cue availability (however, see Waller et al. 2004). Overall, the less obvious role of body-based cues in these larger-scale, more cognitively demanding tasks stands in contrast to the importance of body-based cues evidenced in simpler self-motion updating tasks. As such, future work must help to reconcile these findings and to form a more comprehensive model of multisensory self-motion in order to understand how the scale of a space, the accumulation of self-motion information, and the demands of the task relate to relative cue weighting.
Not only do the effects of cue combination exhibit themselves through consciously produced behaviors or responses in spatial tasks, but they can also be seen in other aspects of self-motion, including the characteristics of gait. For instance, Mohler et al. (2007a) investigated differences in gait parameters such as walking speed, step length, and head-to-trunk angle when walking with eyes open versus closed, and also when walking in a VE (wearing an HMD) versus walking in the real world. It was found that participants walked slower and exhibited a shorter stride length when walking with their eyes closed. During sighted walking while viewing the VE through the HMD, participants walked slower and took smaller steps than when walking in the real world. Their head-to-trunk angle was also smaller when walking in the VE, most likely due to the reduced vertical FOV. Similarly, Sheik-Nainar and Kaber (2007) evaluated different aspects of gait, such as speed, cadence, and joint angles, when walking on a treadmill. They evaluated the effects of presenting participants with congruent and updated visuals (via an HMD projecting a simulated version of the laboratory space) compared to stationary visuals (the real-world laboratory space with a reduced FOV to approximate the HMD). These two conditions were compared to natural, overground walking. Results indicated that although both treadmill conditions caused participants to walk slower and take smaller steps, when optic flow was consistent with the walking speed, gait characteristics more closely approximated those of overground walking. Finally, although most of the work on multisensory self-motion perception has dealt specifically with visual interactions with body-based cues, it is important to note that researchers have begun to evaluate the impact of auditory cues on self-motion perception. For instance, Väljamäe et al. (2008) have shown that sounds associated with self-motion through space, such as footsteps, can enhance the perception of linear vection. Furthermore, Riecke et al. (2009) have shown that sounds produced by a particular spatial location (i.e., by water flowing in a fountain) can enhance circular vection when appropriately updated with the moving visuals.

30.3.2.2 Cue Weighting under Conflict Conditions

Although understanding the perceptual and behavioral consequences of adding or subtracting cues remains an informative approach to understanding self-motion perception, it is limited when attempting to precisely quantify the contributions made by individual cues or when defining the exact principles underlying this integration. Considering that individual modalities are sufficient in isolation for many of the different self-motion-based tasks, it is difficult to assess how the different modalities combine when several are simultaneously present. In most cases, the information provided by two different sensory modalities regarding the same external stimulus is redundant, and thus it is difficult to dissociate the individual contributions of each. A popular and effective strategy for dissociating naturally congruent cues has been the cue conflict approach. This approach involves providing individual modalities with different and incongruent information about a single perceptual event or environmental stimulus. Much of the classic research using experimentally derived cue conflicts in the real world comes from work using displacement prisms (Pick et al. 1969; Welch and Warren 1980), and other recent examples have used magnification/minimization lenses (Campos et al. 2010).
In the case of self-motion perception, prism goggles have been used, for instance, to shift the entire optic array horizontally, thus causing a conflict between what is perceived visually and what is perceived via other modalities such as proprioception (Rushton et al. 1998). Although prism approaches have, in the past, provided great insight into sensory–motor interactions in the real world, distortions can occur, and the types of conflict manipulations that can be introduced are limited (e.g., restricted to changing heading direction or vertical eye height). VR, however, provides a much more flexible system that can change many different characteristics of the visual environment, as well as present visual speeds, traveled distances, heading directions, orientations in 3-D space, etc., that differ from what is simultaneously being presented to proprioceptive and vestibular sources. In the context of understanding multisensory integration during self-motion, cue conflicts have been used to understand (1) the immediate consequences of transient sensory conflict (momentary incongruencies) and (2) the recalibration of optic flow and body-based cues over time (enduring conflict). In this section, each will be considered.
In the case of transient cue conflicts, the conflicts typically occur on a trial-by-trial basis in an effort to avoid adaptation effects. In this case, the idea is ultimately to understand the relative weighting of visual and body-based cues when combined under normal circumstances. For instance, Sun et al. (2003, 2004a) used this strategy in the aforementioned simulated bike riding experiment as a way of dissociating the proprioceptive information provided by pedaling from the yoked optic flow information provided via an HMD. In a traveled distance comparison task, they reported an overall higher weighting of visual information when the relation between the two cues was constantly varied. However, the presence of proprioceptive information continued to improve visually specified distance estimates, even when it was not congruent with the visuals (Sun et al. 2004a). On the other hand, Harris et al. (2000) used a similar technique to examine the relative contributions of visual and vestibular information to linear self-motion estimation over several meters and found that observers’ estimates more closely approximated the distances specified by vestibular cues than those specified by optic flow. Sun et al. (2003) also evaluated the relative contributions of visual and proprioceptive information using a speed discrimination task while participants rode a bicycle down a virtual hallway. Here, they found that although both cues contributed to speed estimates, proprioceptive information was in fact weighted higher. For smaller-scale, simulated full-body movements, others have investigated visual–vestibular integration by presenting optic flow stimuli via a projection screen and vestibular information via a 6 degree-of-freedom motion platform (Butler et al. 2010; Fetsch et al. 2009; Gu et al. 2008). In this case, it has consistently been shown that the variances observed for the estimates in the combined cue conditions are lower than those in either of the unisensory conditions. In the series of traveled distance experiments by Campos et al. (2007b) described above, subtle cue conflicts were also created between visual and body-based cues (see also Kearns 2003). Here, incongruencies were created by either changing the visual gain during physical movements or changing the proprioceptive gain during walking (i.e., by changing the treadmill speed). Overall, the results demonstrated a higher weighting of body-based cues during natural overground walking, a higher weighting of proprioceptive information during treadmill walking, and a relatively equal weighting of visual and vestibular cues during passive movement. These results were further strengthened by the fact that the higher weighting of body-based cues during walking was unaffected by whether the visual or the proprioceptive gain was manipulated. The vast majority of the work evaluating relative cue weighting during self-motion perception using cue conflict paradigms has considered how vision combines with different body-based cues. Others have recently conducted some of the first experiments to use this technique for studying proprioceptive–vestibular integration. In order to achieve this, they used the MPI circular treadmill setup described above (see Figure 30.3). Because this treadmill setup consists of a handlebar that can move independently of the treadmill disk, the relation between the handlebar speed and the disk speed can be changed to provide different information to the two sensory systems (see the sketch below).
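The decoupling described above can be summarized with a little arithmetic: the handlebar carries the participant through space, so its speed sets the inertial (vestibular) rotation rate, while the stepping (proprioceptive) rate is set by how fast the disk surface moves under the feet relative to the handlebar. The sketch below expresses this relation; the sign convention and variable names are assumptions made for illustration rather than details of the MPI setup.

```python
def circular_treadmill_rates(handlebar_deg_per_s, disk_deg_per_s):
    """Return (vestibular_rate, proprioceptive_rate) in deg/s.

    The participant moves through space with the handlebar, so the handlebar
    speed determines the inertial (vestibular) rotation rate. The stepping
    (proprioceptive) rate is given by the handlebar speed relative to the
    disk surface under the feet.
    """
    vestibular_rate = handlebar_deg_per_s
    proprioceptive_rate = handlebar_deg_per_s - disk_deg_per_s
    return vestibular_rate, proprioceptive_rate


# Congruent walking: disk stationary, handlebar (and walker) at 20 deg/s.
congruent = circular_treadmill_rates(20.0, 0.0)    # (20.0, 20.0)
# Conflict: the disk counter-rotates, so stepping is faster than actual motion.
conflict = circular_treadmill_rates(20.0, -10.0)   # (20.0, 30.0)
```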
Cue conflict techniques have also been used to evaluate the effect of changing cue relations on various gait parameters. For instance, Prokop et al. (1997) asked participants to walk at a comfortable, yet constant, speed on a self-driven treadmill. When optic flow was accelerated or decelerated relative to the actual walking speed, unintentional modulations in walking speed were observed. Specifically, when the visual speed increased, walking speeds decreased, whereas the opposite was true for decreased visual speeds. Similarly, it has been shown that walk-to-run and run-to-walk transitions can be unintentionally modified by providing a walking observer with different rates of optic flow (Guerin and Bardy 2008; Mohler et al. 2007b). Again, as the rate of optic flow is increased, the speed at which an observer will transition from running to walking will be lower, whereas the opposite is true for decreased optic flow rates.

Another group of studies has used prolonged cue conflicts as a way of investigating sensory–motor recalibration effects during self-motion. A classic, real-world multisensory recalibration experiment was conducted by Rieser and colleagues (1995), in which an extended mismatch was created between visual flow and body-based cues.
Using a cleverly developed setup, participants walked on a treadmill at one speed while it was pulled behind a tractor moving at either a faster or a slower speed. Consequently, the speed of the movement experienced motorically was either greater or less than the speed of the visually experienced movement. After adaptation, participants walked blindfolded to previewed visual targets. Results indicated that when the visual flow was slower than the locomotor information, participants overshot the target (relative to pretest), whereas when the visual flow was faster than the locomotor information, they undershot the target distance. Although the approach used by Rieser et al. (1995) was ingenious, one can imagine that this strategy can be accomplished much more easily, safely, and under more highly controlled circumstances by using simulation devices. Indeed, the results of Rieser et al.’s (1995) original study have since been replicated and expanded upon using VR. This has been achieved by having participants walk on a treadmill or within a tracked walking space while they experience relatively faster or slower visually perceived flow via an HMD or a large FOV projection display (Durgin et al. 2005; Mohler et al. 2007c; Proffitt et al. 2003; Thompson et al. 2005). For instance, it has been shown that adaptations that occur when subjects are walking through a VE on a treadmill transfer to a real-world blind walking task (Mohler et al. 2007c). There is also some indication that the aftereffects observed during walking on solid ground (in a tracked walking space) are larger than those observed during treadmill walking (Durgin et al. 2005). Pick et al. (1999) have also shown similar recalibration effects for rotational self-motion.
30.3.3 Unique Challenges in Studying Multisensory Self-Motion Perception

In recent years, much of the multisensory research community has used psychophysical methods as a way of evaluating whether two cues are integrated in a statistically optimal fashion [i.e., maximum likelihood estimation (MLE) or Bayesian approaches to cue integration; Alais and Burr 2004; Blake and Bülthoff 1993; Bülthoff and Mallot 1988; Bülthoff and Yuille 1991, 1996; Butler et al. 2010; Cheng et al. 2007; Ernst and Banks 2002; Ernst and Bülthoff 2004; Fetsch et al. 2009; Knill and Saunders 2003; Kording and Wolpert 2004; MacNeilage et al. 2007; Welchman et al. 2008]. A traditional design used to evaluate such predictions involves a comparison of the characteristics of the psychometric functions (i.e., just noticeable difference or variance scores) obtained during unisensory conditions to those obtained during multisensory conditions. Based on the assumptions of an MLE account, at least two general predictions can be made (see the formulation at the end of this section). First, the variance observed in the combined sensory condition should be lower than that observed in either of the unimodal conditions. Second, the cue with the highest unimodal variance should be given less weight when the two cues are combined. A cue conflict is often used to provide slightly different information to each of the two cues, thereby allowing the identification of which cue was weighted higher in the combined estimate. However, because of the tight relationship between visual, vestibular, and proprioceptive information during self-motion, there is a unique challenge in obtaining unbiased unisensory estimates on which to base predictive models. This is because, even in the unisensory conditions, there remains an inherent conflict. For instance, when visual self-motion is simulated in the absence of proprioceptive and vestibular inputs, this could be a challenge for the brain to reconcile. Because the proprioceptive and vestibular systems cannot be “turned off,” they constantly send the brain information about self-motion, regardless of whether that information indicates self-motion through space or a stationary egocentric position. Therefore, when the visual system is provided with a compelling sense of self-motion, both the muscles and joints and the inner ear organs clearly do not support this assessment. Despite these constraints, effective models of self-motion perception have recently been developed as a way of assessing some of the abovementioned predictions (e.g., Jürgens and Becker 2006; Laurens and Droulez 2007). For instance, Jürgens and Becker (2006) evaluated the weighting of vestibular, proprioceptive, and cognitive inputs on displacement perception. They report that the more sensory information that is available, the less participants appeared to rely on cognitive strategies.
In addition, with increasing sources of combined information, lower variance scores were observed. Cheng et al. (2007) have also summarized some of the multisensory work in locomotion and spatial navigation and evaluated how these findings fit within the context of Bayesian theoretical predictions. Overall, there remains much important work to be done concerning the development of quantitative models describing the principles underlying multisensory self-motion perception.
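For reference, the two predictions sketched above correspond to the standard two-cue maximum likelihood formulation (e.g., Ernst and Banks 2002), written here for a visual estimate and a body-based estimate; the notation is generic rather than specific to any of the self-motion studies discussed in this section.

```latex
\hat{S}_{VB} = w_V \hat{S}_V + w_B \hat{S}_B,
\qquad
w_V = \frac{1/\sigma_V^2}{1/\sigma_V^2 + 1/\sigma_B^2},
\qquad
w_B = 1 - w_V,
\qquad
\sigma_{VB}^2 = \frac{\sigma_V^2\,\sigma_B^2}{\sigma_V^2 + \sigma_B^2} \leq \min\left(\sigma_V^2, \sigma_B^2\right)
```

The combined variance is never larger than the smaller of the two unimodal variances, and the noisier cue receives the smaller weight, which is exactly the pair of predictions tested in the studies described above.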
30.4 ADVANTAGES AND DISADVANTAGES OF USING SIMULATION TECHNOLOGY TO STUDY MULTISENSORY SELF-MOTION PERCEPTION

Throughout this chapter, numerous unique benefits of using visually simulated environments and various self-motion simulators to study multisensory self-motion perception have been described. However, because VR technology is not yet capable of achieving the extraordinary task of capturing every aspect of reality in veridical spatial and temporal terms, there are several limitations that must also be acknowledged. Below, we will briefly consider some of the additional advantages and disadvantages of using VR in studying multisensory self-motion perception not already discussed earlier in this chapter (see also Bülthoff and van Veen 2001; Loomis et al. 1999; Tarr and Warren 2002). Considering that the natural world contains an infinite amount of contextual and behaviorally relevant sensory information, it is often difficult to predict how these sources of information will interact. As mentioned above, perhaps the most significant advantage of VR is that it can provide a highly controlled, multisensory experience. It is also able to overcome some of the difficulties inherent in experimentally manipulating a unimodal component of a multisensory experience and in dissociating individual cues within one modality. Moreover, each of these manipulations is achieved under safe, low-risk, highly replicable circumstances, and often (although not always) at a much lower cost than is possible in the real world. For instance, Souman et al. (2009) were interested in empirically testing the much-speculated question of whether humans indeed walk in circles when lost in the desert. To do this, Souman and colleagues traveled to the Sahara desert. Without going through this level of effort and expense, such questions would otherwise be extremely difficult to test in the real world because of the need for a completely sparse environment through which an individual can walk for hours. However, following the original, real-world experiment, Souman and colleagues have since been able to evaluate similar questions under more precise conditions by using the newly developed MPI omnidirectional treadmill. Here they can manipulate particular characteristics of the VE as a way of evaluating the exact causes of any observed veering behaviors, while still allowing for limitless walking in any direction. Although many of the tasks described thus far have dealt mainly with consciously reported or reproduced behaviors in VEs, it is also important to note that even unconscious, physiological reactions (e.g., heart rate and galvanic skin response) often occur in ways similar to those observed for real-world events. For instance, when observers walk to the very edge of a cliff in a VE, not only do many participants report a compelling sense of fear, but their heart rate also increases considerably (Meehan et al. 2005). This effect is further amplified when additional sensory cues, such as the haptic sensation of feeling the edge of the drop-off with one’s feet, are also provided. The disadvantages of VR must also be accounted for when considering whether particular technologies are appropriate for addressing specific research questions. For instance, as mentioned above, there is often a trade-off between high-quality, wide FOV visualization systems and mobility. However, the impact that a reduced FOV has on self-motion perception is still relatively unclear.
The results of Warren and Kurtz (1992) indicate that, contrary to what was previously believed, peripheral optic flow information is not necessarily the dominant source of visual input when performing a visual heading task; rather, central visual input tends to provide more accurate estimates. Banton et al. (2005), on the other hand, indicate that peripheral information seems to be important for accurately perceiving visual speed when walking. Therefore, the impact of a restricted FOV on the perception of various aspects of self-motion requires further investigation.
There are also several clear and consistent perceptual errors that occur in VEs that do not occur in the real world. For instance, although much research has now demonstrated that humans are very good at estimating the distance between themselves and a stationary target in the real world (see Loomis and Philbeck 2008 for a review), the same distance magnitudes are consistently underestimated in immersive VEs by as much as 50% (Knapp and Loomis 2004; Loomis and Knapp 2003; Thompson et al. 2004; Witmer and Kline 1998). This effect is not entirely attributable to poor visual graphics (Thompson et al. 2004), and although some groups have reported a distance compression effect when the FOV is reduced and the viewer is stationary (Witmer and Kline 1998), others have shown that when head movements are allowed under restricted FOV conditions, these effects are not observed (Creem-Regehr et al. 2005; Knapp and Loomis 2004). Strategies have been used to reduce this distance compression effect, for instance, by providing various forms of feedback when interacting in the VE (Mohler et al. 2007c; Richardson and Waller 2005; Waller and Richardson 2008), yet the exact cause of this distance compression remains unknown. Another, less-studied perceptual difference between virtual and real environments is the misperception of visual speed when walking in VEs (Banton et al. 2005; Durgin et al. 2005). For instance, Banton et al. (2005) required participants to match their visual speed (presented via an HMD) to their walking speed as they walked on a treadmill. When participants faced forward during walking, visual speeds had to be increased to about 1.6 times the walking speed in order to appear equal. When motion tracking is used to visually update an observer's position in the VE, there is also the concern that temporal lag has the potential to create unintentional sensory conflict, disrupt the feeling of presence, and cause cybersickness. There is also some indication that characteristics of gait change when walking overground in a VE compared to the real world (Mohler et al. 2007a), and that walking on a treadmill in a VE is associated with increased stride frequency (Sheik-Nainar and Kaber 2007). It is as yet unknown how such changes in physical movement characteristics might affect particular aspects of self-motion perception. In addition to lower-level perceptual limitations of VEs, there are also higher-level cognitive factors that can affect behavior. For instance, there is often a general awareness when interacting within a VE that one is in fact engaging with artificially derived stimuli. Observers might react differently to simulated scenarios, for instance, by placing a lower weighting on sensory information that they know to be simulated. Furthermore, when visually or passively presented movements defy what is physically possible in the real world, this information might also be treated differently. In cue conflict situations, it has also been shown that relative cue weighting during self-motion can change as a function of whether an observer is consciously aware of the cue conflicts that are introduced (Berger and Bülthoff 2009). There is also a mismatch between the perceptual attributes of the virtual world in which an observer is immersed and the knowledge of the real world within which they are physically located.
Evidence that this awareness might impact behavior comes from findings indicating that, during a homing task in a VE, knowledge of the size of the real-world room affects navigational behaviors in the VE (Nico et al. 2002). Specifically, when participants knowingly moved within a smaller real-world room, they undershot the origin in the VE compared to when they were moving within a larger real-world space (even though the VEs were of identical size). Overall, researchers should ideally strive to exploit the advantages offered by the various available interfaces while controlling for the specific limitations of each through the use of others. Furthermore, whenever possible, it is best to take the reciprocally informative approach of comparing and coordinating research conducted in VR with that taking place in real-world testing scenarios.
30.5 MULTISENSORY SELF-MOTION PERCEPTION: AN APPLIED PERSPECTIVE

Being able to effectively and accurately represent multiple sources of sensory information within a simulated scenario is essential for a broad variety of applied areas. VR technologies are now being
widely adopted for use in areas as diverse as surgical, aviation, and rescue training, architectural design, driving and flight simulation, athletic training and evaluation, psychotherapy, gaming, and entertainment. Therefore, not only is it important to understand cue integration during relatively simple tasks, but it is also imperative to understand these perception–action loops during more complex, realistic, multifaceted behaviors. Although most multisensory research has focused on the interaction of only two sensory cues, most behaviors occur in the context of a variety of sensory inputs, and therefore understanding the interaction of three or more cues (e.g., Bresciani et al. 2008) under ecologically valid stimulus conditions is also important. These issues are particularly critical considering the possibly grave consequences of misperceiving spatial properties or incorrectly adapting to particular stimulus conditions. Here, we briefly consider two applied fields that we feel are of particular interest as they relate to multisensory self-motion perception: helicopter flight behavior and locomotor rehabilitation. Helicopter flight represents one of the most challenging multisensory control tasks accomplished by humans. The basic science of helicopter flight behavior is extremely complex, and the effects of specific flight simulation training on real-world performance (i.e., transfer of training) remain poorly understood. Because several misperceptions are known to occur during helicopter flight, it is important to first understand the possible causes of such misperceptions in a way that will allow for more effective training procedures. One example of such a misperception, which can occur when reliable visual information is not available during flight, is the somatogravic illusion. In this case, the inertial forces produced by accelerations of the aircraft may be confused with gravitational forces, causing an illusion of tilt during purely linear accelerations and often resulting in devastating outcomes. Several studies have been conducted using the MPI Motion Simulator by outfitting it with a helicopter cyclic stick and various visualization devices in order to create a unique and customizable flight simulator. For instance, nonexpert participants were trained on the simulator to acquire the skills required to stabilize a helicopter during a hovering task (Nusseck et al. 2008). In this case, the robot was programmed to move in a way that mimicked particular helicopter dynamics, and the participants' task was to hover in front of real-world targets. Two helicopter sizes were simulated: one that was light and agile and another that was heavy and inert. Participants were initially trained on one of the two helicopters, and their performance was subsequently tested when flying the second helicopter. This method was used to reveal the novice pilots' ability to transfer the general flight skills they learned on one system to another system with different dynamics. The results indicated that participants were able to effectively transfer the skills obtained when training in the light helicopter to the heavy helicopter, whereas the opposite was not true. Understanding these transfer-of-training effects is important for assessing the effectiveness of both training in simulators and flying actual aircraft, and also for understanding the subtle differences between flying familiar and unfamiliar aircraft, something almost all pilots face at one time or another.
Another applied area that would benefit greatly from understanding multisensory self-motion perception is the diagnosis and rehabilitative treatment of those with disabling injury or illness. A significant percentage of the population suffers from the locomotor consequences of Parkinson's disease, stroke, acquired brain injuries, and other age-related conditions. Oftentimes rehabilitation therapies consist of passive range-of-motion tasks (through therapist manipulation or via robotic-assisted walking) or self-initiated repetitive action tasks. In the case of lower-limb sensory–motor disabilities, one rehabilitative technique is to have patients walk on a treadmill as a way of actively facilitating and promoting the movements required for locomotion. The focus of such techniques, however, is exclusively on the motor system, with very little consideration given to the multimodal nature of locomotion. In fact, treadmill walking actually causes a conflict between proprioceptive information, which specifies that the person is moving, and visual information, which indicates a complete lack of self-motion. Considering that one of the key factors in the successful learning or relearning of motor behaviors is feedback, a natural source of feedback can be provided by the visual flow information obtained
during walking. As such, incorporating visual feedback into rehabilitative treadmill walking therapies could prove to be of great importance. Actively moving within a VE is also likely to be highly rewarding for individuals lacking stable mobility and thus may increase levels of motivation in addition to recalibrating the perceptual–motor information. Although some work has been done to evaluate multimodal effects in upper-limb movement recovery, this has not been investigated as thoroughly for full-body locomotor behavior such as walking. One group that has evaluated such questions is Fung et al. (2006), who used a self-paced treadmill mounted on a small motion platform, coupled with a projection display, as a way of adapting gait behavior in stroke patients. They found that, by training with this multimodal system, patients showed clear locomotor improvements, such as increases in gait speed and the ability to more flexibly adapt their gait when faced with changes in ground terrain. Rehabilitation research and treatment programs can benefit greatly from the flexibility, safety, and high level of control offered by VR and simulator systems. As such, technologies that offer multimodal stimulation and control are expected to have a major impact in the future [e.g., see the Toronto Rehabilitation Institute's Challenging Environment Assessment Laboratory (CEAL); http://www.cealidapt.com].
30.6 SUMMARY

This chapter has emphasized the closed-loop nature of human locomotor behavior by evaluating studies that preserve the coupling between perception and action during self-motion perception. This combined-cue approach to understanding full-body movements through space offers unique insights into multisensory processes as they occur over space and time. Future work in this area should aim to define the principles underlying human perceptual and cognitive processes in the context of realistic sensory information. Simulation techniques also allow for a reciprocally informative approach in which VR serves as a tool for understanding basic science questions related to the human observer in action, while the results of this research in turn provide informed methods of improving VR technologies. As such, the crosstalk between applied fields and basic science research approaches should be strongly encouraged and facilitated.
ACKNOWLEDGMENTS

We would like to thank members of the MPI Cyberneum group, past and present (http://www.cyberneum.com/People.html), Jan Souman, John Butler, and Ilja Frissen for fruitful discussions. We also thank Simon Musall for invaluable assistance and two anonymous reviewers for helpful comments.
REFERENCES

Akyüz, A. O., R. W. Fleming, B. E. Riecke, E. Reinhard, and H. H. Bülthoff. 2007. Do HDR displays support LDR content: A psychophysical evaluation. ACM Trans Graphics 26(3:38): 1–7. Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Curr Biol 14: 257–262. Allen, G. L., K. C. Kirasic, M. A. Rashotte, and D. B. M. Haun. 2004. Aging and path integration skill: Kinesthetic and vestibular contributions to wayfinding. Percept Psychophys 66(1): 170–179. Bakker, N. H., P. J. Werkhoven, and P. O. Passenier. 1999. The effects of proprioceptive and visual feedback on geographical orientation in virtual environments. Pres Teleop Virtual Environ 8: 36–53. Banton, T., J. Stefanucci, F. Durgin, A. Fass, and D. Proffitt. 2005. The perception of walking speed in a virtual environment. Pres Teleop Virtual Environ 14(4): 394–406. Berger, D. R., J. Schulte-Pelkum, and H. H. Bülthoff. 2010. Simulating believable forward accelerations on a Stewart motion platform. ACM Trans Appl Percept 7(1:5): 1–27.
Berger, D. R., and H. H. Bülthoff. 2009. The role of attention on the integration of visual and inertial cues. Exp Brain Res 198(2–3): 287–300. Berthoz, A., I. Israël, P. Georges-François, R. Grasso, and T. Tsuzuku. 1995. Spatial memory of body linear displacement: What is being stored? Science 269: 95–98. Bertin, R. J. V., and A. Berthoz. 2004. Visuo-Vestibular interaction in the reconstruction of travelled trajectories. Exp Brain Res 154: 11–21. Beykirch, K., F. M. Nieuwenhuizen, H. J. Teufel, H.-G. Nusseck, J. S. Butler, and H. H. Bülthoff. 2007. Control of a lateral helicopter sidestep maneuver on an anthropomorphic robot. Proceedings of the AIAA Modeling and Simulation Technologies Conference and Exhibit, 1–8. American Institute of Aeronautics and Astronautics, Reston, VA, USA. Blake, A., H. H. Bülthoff, and D. Sheinberg. 1993. Shape from texture: Ideal observers and human psychophysics. Vis Res 33: 1723–1737. Bremmer, F., and M. Lappe. 1999. The use of optical velocities for distance discrimination and reproduction during visually simulated self motion. Exp Brain Res 127: 33–42. Bresciani, J.-P., F. Dammeier, and M. O. Ernst, 2008. Trimodal integration of visual, tactile and auditory signals for the perception of sequences of events. Brain Res Bull 75(6): 753–760. Bülthoff, H. H., and H. A. Mallot. 1988. Integration of depth modules: Stereo and shading. J Opt Soc Am 5: 1749–1758. Bülthoff, H. H., and H.-J. van Veen. 2001. Vision and action in virtual environments: Modern psychophysics. In Spatial cognition research. Vision and attention, ed. M. L. Jenkin and L. Harris, 233–252. New York: Springer Verlag. Bülthoff, H. H., and A. Yuille. 1991. Bayesian models for seeing shapes and depth. Comments Theor Biol 2(4): 283–314. Bülthoff, H. H., and A. L. Yuille. 1996. A Bayesian framework for the integration of visual modules. In Attention and Performance XVI: Information Integration in Perception and Communication, ed. J. McClelland and T. Inui, 49–70. Cambridge, MA: MIT Press. Butler, J. S., S. T. Smith, J. L. Campos, and H. H. Bülthoff. 2010. Bayesian integration of visual and vestibular signals for heading. J Vis 10(11): 23, 1–13. Campos, J. L., J. S. Butler, B. Mohler, and H. H. Bülthoff. 2007b. The contributions of visual flow and locomotor cues to walked distance estimation in a virtual environment. Appl Percept Graphics Vis 4: 146. Campos, J. L., P. Byrne, and H.-J. Sun. 2010. Body-based cues trump vision when estimating walked distance. Eur J Neurosci 31: 1889–1898. Campos, J. L., H.-G. Nusseck, C. Wallraven, B. J. Mohler, and H. H. Bülthoff. 2007a. Visualization and (mis) perceptions in virtual reality. Tagungsband 10. Proceedings of Workshop Sichtsysteme, ed. R. Möller and R. Shaker, 10–14. Aachen, Germany. Campos, J. L., J. Siegle, B. J. Mohler, H. H. Bülthoff and J. M. Loomis. 2009. Imagined self-motion differs from perceived self-motion: Evidence from a novel continuous pointing method. PLoS ONE 4(11): e7793. doi:10.1371/journal.pone.0007793. Chance, S. S., F. Gaunet, A. C. Beall, and J. M. Loomis. 1998. Locomotion mode affects the updating of objects encountered during travel: The contribution of vestibular and proprioceptive inputs to path integration. Pres Teleop Virtual Environ 7(2): 168–178. Cheng, K., S. Shettleworth, J. Huttenlocher, and J. J. Rieser. 2007. Bayesian integrating of spatial information. Psychol Bull 133(4): 625–637. Cheung, A., S. Zhang, C. Stricker, and M. V. Srinivasan. 2007. Animal navigation: The difficulty of moving in a straight line. 
Biol Cybern 97: 47–61. Christensen, R., J. M. Hollerbach, Y. Xu, and S. Meek. 2000. Inertial force feedback for the Treadport locomotion interface. Pres Teleop Virtual Environ 9: 1–14. Creem-Regehr, S. H., P. Willemsen, A. A. Gooch, and W. B. Thompson. 2005. The influence of restricted viewing conditions on egocentric distance perception: Implications for real and virtual environments. Perception 34(2): 191–204. Cruz-Neira, C., D. J. Sandin, and T. A. DeFanti. 1993. Surround screen projection-based virtual reality: The design and implementation of the CAVE. Proc SIGGRAPH, 135–142. Darken, R. P., W. R. Cockayne, and D. Carmein. 1997. The omni-directional treadmill: A locomotion device for virtual worlds. Proceedings of the ACM User Interface Software and Technology, Banff, Canada, October 14–17, 213–221. Dichgans, J., and T. Brandt. 1978. Visual–vestibular interaction: Effects on self-motion perception and postural control. In Perception, Vol. VIII, Handbook of sensory physiology, ed. R. Held, H. W. Leibowitz, and H. L. Teuber, 756–804. Berlin: Springer.
Durgin, F. H., A. Pelah, L. F. Fox et al. 2005. Self-motion perception during locomotor recalibration: More than meets the eye. J Exp Psychol Hum Percept Perform 31: 398–419. Durgin, F. H., M. Akagi, C. R. Gallistel, and W. Haiken. 2009. The precision of locomotor odometry in humans. Exp Brain Res 193(3): 429–436. D’Zmura, M., P. Colantoni, and G. Seyranian. 2000. Virtual environments with four or more spatial dimensions. Pres Teleop Virtual Environ 9(6): 616–631. Ellard, C. G., and S. C. Shaughnessy. 2003. A comparison of visual and non-visual sensory inputs to walked distance in a blind-walking task. Perception 32(5): 567–578. Elliott, D. 1986. Continuous visual information may be important after all: A failure to replicate Thomson. J Exp Psychol Hum Percept Perform 12: 388–391. Engel, D., C. Curio, L. Tcheang, B. J. Mohler, and H. H. Bülthoff. 2008. A psychophysically calibrated controller for navigating through large environments in a limited free-walking space. In Proceedings of the 2008 ACM Symposium on Virtual Reality Software and Technology, ed. S. Feiner, D. Thalmann, P. Guitton, B. Fröhlich, E. Kruijff, M. Hachet, 157–164. New York: ACM Press. Ernst, M. O., and M. S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415: 429–433. Ernst, M. O., and H. H. Bülthoff. 2004. Merging the senses into a robust percept. Trends Cogn Sci 8: 162–169. Fetsch, C. R., A. H. Turner, G. C. DeAngelis, and D. E. Angelaki. 2009. Dynamic reweighting of visual and vestibular cues during self-motion perception. J Neurosci 29(49): 15601–15612. Frenz, H., F. Bremmer, and M. Lappe. 2003. Discrimination of travel distances from “situated” optic flow. Vis Res 43(20): 2173–2183. Frenz, H., and M. Lappe. 2005. Absolute travel distance from optic flow. Vis Res 45(13): 1679–1692. Fukusima, S. S., J. M. Loomis, and J. A. DaSilva. 1997. Visual perception of egocentric distance as assessed by triangulation. J Exp Psychol Hum Percept Perform 23: 86–100. Fung, J., C. L. Richards, F. Malouin, B. J. McFadyen, and A. Lamontagne. 2006. A treadmill and motion coupled virtual reality system for gait training post-stroke. Cyberpsych Behav 9(2): 157–162. Gu, Y., D. E. Angelaki, and G. C. DeAngelis. 2008. Neural correlates of multisensory cue integration in macaque MSTd. Nat Neurosci 11(10): 1201–1210. Guerin, P., and B. G. Bardy. 2008. Optical modulation of locomotion and energy expenditure at preferred transition speed. Exp Brain Res 189: 393–402. Harris, L. R., M. Jenkin, and D. C. Zikovitz. 2000. Visual and non-visual cues in the perception of linear self-motion. Exp Brain Res 135: 12–21. Hettinger, L. J. 2002. Illusory self-motion in virtual environments. In Handbook of virtual environments, ed. K. M. Stanney, 471–492. Hillsdale, NJ: Lawrence Erlbaum. Hollerbach, J. M., Y. Xu, R. Christensen, and S. C. Jacobsen. 2000. Design specifications for the second generation Sarcos Treadport locomotion interface. Haptics Symposium, Proc. ASME Dynamic Systems and Control Division, DSC-Vol. 69-2, 1293–1298, Orlando, November. Howard, I. P. 1986. The perception of posture, self-motion, and the visual vertical. In Sensory processes and perception, Vol. I, Handbook of human perception and performance, ed. K. R. Boff, L. Kaufman, and J. P. Thomas, 18.1–18.62, New York: Wiley. Israël, I., and A. Berthoz. 1989.
Contributions of the otoliths to the calculation of linear displacement. J Neurophysiol 62(1): 247–263. Israël, I., R. Grasso, P. Georges-Francois, T. Tsuzuku, and A. Berthoz. 1997. Spatial memory and path integration studied by self-driven passive linear displacement: I. Basic properties. J Neurophysiol 77: 3180–3192. Ivanenko, Y. P., R. Grasso, I. Israël, and A. Berthoz. 1997. The contributions of otoliths and semicircular canals to the perception of two-dimensional passive whole-body motion in humans. J Physiol 502(1): 223–233. Iwata, H. 1999. Walking about virtual environments on an infinite floor. IEEE Virtual Real 13–17, March. Iwata, H., and Y. Yoshida. 1999. Path reproduction tests using a torus treadmill. Pres Teleop Virtual Environ 8(6): 587–597. Jürgens, R., and W. Becker. 2006. Perception of angular displacement without landmarks: Evidence for Bayesian fusion of vestibular, optokinetic, podokinesthetic, and cognitive information. Exp Brain Res 174(3): 528–43. Jürgens, R., T. Boß, and W. Becker. 1999. Estimation of self-turning in the dark: Comparison between active and passive rotation. Exp Brain Res 128: 491–504. Kearns, M. J., W. H. Warren, A. P. Duchon, and M. J. Tarr. 2002. Path integration from optic flow and body senses in a homing task. Perception 31: 349–374.
Kearns, M. J. 2003. The roles of vision and body senses in a homing task: The visual environment matters. Unpublished doctoral thesis, Brown University. Klatzky, R. L., J. M. Loomis, A. C. Beall, S. S. Chance, and R. G. Golledge. 1998. Spatial updating of self-position and orientation during real, imagined, and virtual locomotion. Psychol Sci 9(4): 293–298. Knapp, J. M., and J. M. Loomis. 2004. Limited field of view of head-mounted displays is not the cause of distance underestimation in virtual environments. Pres Teleop Virtual Environ 13(5): 572–577. Knill, D. C., and J. A. Saunders. 2003. Do humans optimally integrate stereo and texture information for judgments of surface slant? Vis Res 43: 2539–2558. Kording, K. P., and D. M. Wolpert. 2004. Bayesian integration in sensorimotor learning. Nature 427(15): 244–247. Larish, J. F., and J. M. Flach. 1990. Sources of optical information useful for perception of speed of rectilinear self-motion. J Exp Psychol Hum Percept Perform 16: 295–302. Lathrop, W. B., and M. K. Kaiser. 2002. Perceived orientation in physical and virtual environments: Changes in perceived orientation as a function of idiothetic information available. Pres Teleop Virtual Environ 11(1): 19–32. Laurens, J., and J. Droulez. 2007. Bayesian processing of vestibular information. Biol Cybern 96: 389–404. Lee, D. N. 1976. Theory of visual control of braking based on information about time-to-collision. Perception 5(4): 437–459. Lee, D. N., and E. Aronson. 1974. Visual proprioceptive control of standing in human infants. Percept Psychophys 15(3): 529–532. Lehmann, A., M. Vidal, and H. H. Bülthoff. 2008. A high-end virtual reality setup for the study of mental rotations. Pres Teleop Virtual Environ 17(4): 365–375. Lestienne, F., J. Soechting, and A. Berthoz. 1977. Postural readjustments induced by linear vection of visual scenes. Exp Brain Res 28(3–4): 363–384. Loomis, J. M., J. A. Da Silva, N. Fujita, and S. S. Fukusima. 1992. Visual space perception and visually directed action. J Exp Psychol Hum Percept Perform 18: 906–921. Loomis, J. M., J. J. Blascovich, and A. C. Beall. 1999. Immersive virtual environment technology as a basic research tool in psychology. Behav Res Methods Instrum Comp 31(4): 557–564. Loomis, J. M., and J. M. Knapp. 2003. Visual perception of egocentric distance in real and virtual environments. In Virtual and adaptive environments, ed. L. J. Hettinger and M. W. Haas, 21–46. Mahwah, NJ: Erlbaum. Loomis, J. M., and J. W. Philbeck. 2008. Measuring perception with spatial updating and action. In Embodiment, ego-space and action, ed. R. L. Klatzky, M. Behrmann, and B. MacWhinney, 1–42. Mahwah, NJ: Erlbaum. MacNeilage, P. R., M. S. Banks, D. R. Berger, and H. H. Bülthoff. 2007. A Bayesian model of the disambiguation of gravitoinertial force by visual cues. Exp Brain Res 179: 263–290. Meehan, M., S. Razzaque, B. Insko, M. Whitton, and F. P. Brooks. 2005. Review of four studies on the use of physiological reaction as a measure of presence in stressful Virtual Environments. Appl Psychophysiol Biofeedback 30(3): 239–258. Meilinger, T., B. E. Riecke, and H. H. Bülthoff. 2007. Orientation specificity in long-term memory for environmental spaces. Proceedings of the Cognitive Sciences Society, Nashville, Tennessee, USA, August 1–4, 479–484. Meilinger, T., M. Knauff, and H. H. Bülthoff. 2008.
Working memory in wayfinding: A dual task experiment in a virtual city. Proc Cog Sci 32(4): 755–770. Mittelstaedt, M. L., and S. Glasauer. 1991. Idiothetic navigation in gerbils and humans. Zool J Physiol 95: 427–435. Mittelstaedt, M. L., and H. Mittelstaedt. 1996. The influence of otoliths and somatic graviceptors on angular velocity estimation. J Vestib Res 6(5): 355–366. Mittelstaedt, M. L., and H. Mittelstaedt. 2001. Idiothetic navigation in humans: Estimation of path length. Exp Brain Res 13: 318–332. Mohler, B. J., J. L. Campos, M. Weyel, and H. H. Bülthoff. 2007a. Gait parameters while walking in a headmounted display virtual environment and the real world. Proc Eurographics, 85–88. Mohler, B. J., W. B. Thompson, S. H. Creem-Regehr, H. L. Pick, and W. H. Warren. 2007b. Visual flow influences gait transition speed and preferred walking speed. Exp Brain Res 181(2): 221–228. Mohler, B. J., W. B. Thompson, S. H. Creem-Regehr, P. Willemsen, H. L. Pick, and J. J. Rieser. 2007c. Calibration of locomotion due to visual motion in a treadmill-based virtual environment. ACM Trans Appl Percept 4(1): 20–32.
Müller, P., P. Wonka, S. Haegler, A. Ulmer, and L. Van Gool. 2006. Procedural modeling of buildings. Proc ACM SIGGRAPH 2006/ACM Transactions on Graphics (TOG), 25(3): 614–623. New York: ACM Press. Nico, D., I. Israël, and A. Berthoz. 2002. Interaction of visual and idiothetic information in a path completion task. Exp Brain Res 146: 379–382. Nusseck, H.-G., H. J. Teufel, F. M. Nieuwenhuizen, and H. H. Bülthoff. 2008. Learning system dynamics: Transfer of training in a helicopter hover simulator. Proc AIAA Modeling and Simulation Technologies Conference and Exhibit, 1–11, AIAA, Reston, VA, USA. Peck, T. C., M. C. Whitton, and H. Fuchs. 2008. Evaluation of reorientation techniques for walking in large virtual environments. Proceedings of IEEE Virtual Reality, Reno, NV, 121–128. IEEE Computer Society. Pick, H. L., D. H. Warren, and J. C. Hay. 1969. Conflict in judgments of spatial direction. Percept Psychophys 6(4): 203, 1969. Pick, H. L., D. Wagner, J. J. Rieser, and A. E. Garing. 1999. The recalibration of rotational locomotion. J Exp Psychol Hum Percept Perform 25(5): 1179–1188. Proffitt, D. R., J. Stefanucci, T. Banton, and W. Epstein. 2003. The role of effort in perceiving distance. Psychol Sci 14(2): 106–112. Prokop, T., M. Schubert, and W. Berger. 1997. Visual influence on human locomotion. Exp Brain Res 114: 63–70. Razzaque, S., Z. Kohn, and M. Whitton. 2001. Redirected walking. Proceedings of Eurographics, 289–294. Manchester, UK. Razzaque, S., D. Swapp, M. Slater, M. C. Whitton, and A. Steed. 2002. Redirected walking in place. Proceedings of Eurographics, 123–130. Redlick, F. P., M. Jenkin, and L. R. Harris. 2001. Humans can use optic flow to estimate distance of travel. Vis Res 41: 213–219. Richardson, A. R., and D. Waller. 2005. The effect of feedback training on distance estimation in Virtual Environments. Appl Cogn Psychol 19: 1089–1108. Riecke, B. E., D. W. Cunningham, and H. H. Bülthoff. 2006. Spatial updating in virtual reality: The sufficiency of visual information. Psychol Res 71(3): 298–313. Riecke, B. E., H. A. H. C. van Veen, and H. H. Bülthoff, 2002. Visual homing is possible without landmarks— A path integration study in virtual reality. Pres Teleop Virtual Environ 11(5): 443–473. Riecke, B. E., A. Väljamäe, and J. Schulte-Pelkum. 2009. Moving sounds enhance the visually-induced selfmotion illusion (circular vection) in Virtual Reality. ACM Trans Appl Percept 6(2): 1–27. Rieser, J. J., D. H. Ashmead, C. R. Talor, and G. A. Youngquist. 1990. Visual perception and the guidance of locomotion without vision to previously seen targets. Perception 19(5): 675–689. Rieser, J. J., H. L. Pick, D. H. Ashmead, and A. E. Garing. 1995. Calibration of human locomotion and models of perceptual motor organization. J Exp Psychol Hum Percept Perform 21(3): 480–497. Ruddle, R. A., and S. Lessels. 2006. For efficient navigational search humans require full physical movement but not a rich visual scene. Psychol Sci 17: 460–465. Ruddle, R. A., and S. Lessels. 2009. The benefits of using a walking interface to navigate virtual environments. ACM Trans Comput-Hum Interact 16(1): 1–18. Rushton, S. K., J. M. Harris, and M. R. Lloyd. 1998. Guidance of locomotion on foot uses perceived target location rather than optic flow. Curr Biol 8(21): 1191–1194. Schnapp, B., and W. Warren. 2007. Wormholes in Virtual Reality: What spatial knowledge is learned for navigation? J Vis 7(9): 758, 758a. Seidman, S. H. 
2008. Translational motion perception and vestibulo-ocular responses in the absence of non-inertial cues. Exp Brain Res 184: 13–29. Sheik-Nainar, M. A., and D. B. Kaber. 2007. The utility of a Virtual Reality locomotion interface for studying gait behavior. Hum Factors 49(4): 696–709. Sholl, M. J. 1989. The relation between horizontality and rod-and-frame and vestibular navigational performance. J Exp Psychol Learn Mem Cogn 15: 110–125. Siegle, J., J. L. Campos, B. J. Mohler, J. M. Loomis, and H. H. Bülthoff. 2009. Measurement of instantaneous perceived self-motion using continuous pointing. Exp Brain Res 195(3): 429–444. Simons, D. J., and R. F. Wang. 1998. Perceiving real-world viewpoint changes. Psychol Sci 9: 315–320. Souman, J. L., P. Robuffo Giordano, I. Frissen, A. De Luca, and M. O. Ernst. 2010. Making virtual walking real: Perceptual evaluation of a new treadmill control algorithm. ACM Trans Appl Percept 7(2:11): 1–14. Souman, J. L., I. Frissen, M. Sreenivasa, and M. O. Ernst. 2009. Walking straight into circles. Curr Biol 19(18): 1538–1542.
Sun, H.-J., A. J. Lee, J. L. Campos, G. S. W. Chan, and D. H. Zhang. 2003. Multisensory integration in speed estimation during self-motion. Cyberpsychol Behav 6(5): 509–518. Sun, H.-J., J. L. Campos, and G. S. W. Chan. 2004a. Multisensory integration in the estimation of relative path length. Exp Brain Res 154(2): 246–254. Sun, H.-J., J. L. Campos, G. S. W. Chan, M. Young, and C. Ellard. 2004b. The contributions of static visual cues, nonvisual cues, and optic flow in distance estimation. Perception 33: 49–65. Tarr, M. J., and W. H. Warren. 2002. Virtual reality in behavioral neuroscience and beyond. Nat Neurosci 5: 1089–1092. Teufel, H. J., H.-G. Nusseck, K. A. Beykirch, J. S. Butler, M. Kerger, and H. H. Bülthoff. 2007. MPI Motion Simulator: Development and analysis of a novel motion simulator. Proc AIAA Modeling and Simulation Technologies Conference and Exhibit, 1–11, American Institute of Aeronautics and Astronautics, Reston, VA, USA. Thompson, W. B., P. Willemsen, A. A. Gooch, S. H. Creem-Regehr, J. M. Loomis, and A. C. Beall. 2004. Does the quality of the computer graphics matter when judging distances in visually immersive environments? Pres Teleop Virtual Environ 13(5): 560–571. Thompson, W. B., S. H. Creem-Regehr, B. J. Mohler, and P. Willemsen. 2005. Investigations on the interactions between vision and locomotion using a treadmill Virtual Environment. Proc. SPIE/IS&T Human Vision & Electronic Imaging Conference, January. Thomson, J. A. 1983. Is continuous visual monitoring necessary in visually guided locomotion? J Exp Psychol Hum Percept Perform 9: 427–443. Tristano, D., J. M. Hollerbach, and R. Christensen. 2000. Slope display on a locomotion interface. In Experimental Robotics VI, ed. P. Corke and J. Trevelyan, 193–201. London: Springer-Verlag. Väljamäe, A., P. Larsson, D. Västfjäll, and M. Kleiner. 2008. Sound representing self-motion in Virtual Environments enhances linear vection. Pres Teleop Virtual Environ 17(1): 43–56. Waller, D., J. M. Loomis, and D. B. M. Haun. 2004. Body-based senses enhance knowledge of directions in large-scale environments. Psychon Bull Rev 11(1): 157–163. Waller, D., J. M. Loomis, and S. D. Steck. 2003. Inertial cues do not enhance knowledge of environmental layout. Psychon Bull Rev 10: 987–993. Waller, D., E. Bachmann, E. Hodgson, and A. C. Beall. 2007. The HIVE: A Huge Immersive Virtual Environment for research in spatial cognition. Behav Res Methods 39: 835–843. Waller, D., and N. Greenauer. 2007. The role of body-based sensory information in the acquisition of enduring spatial representations. Psychol Res 71(3): 322–332. Waller, D., and A. R. Richardson. 2008. Correcting distance estimates by interacting with immersive virtual environments: Effects of task and available sensory information. J Exp Psychol Appl 14(1): 61–72. Warren, W. H., and D. J. Hannon. 1998. Direction of self-motion is perceived from optical-flow. Nature 336: 162–163. Warren, W. H., B. A. Kay, W. D. Zosh, A. P. Duchon, and S. Sahuc. 2001. Optic flow is used to control human walking. Nat Neurosci 4: 213–216. Warren, W. H., and K. J. Kurtz. 1992. The role of central and peripheral vision in perceiving the direction of self-motion. Percept Psycho 51(5): 443–454. Welch, R. B., and D. H. Warren. 1980. Immediate perceptual response to intersensory discrepancy. Psychol Bull 88(3): 638–667. Welchman, A. E., J. M. Lam, and H. H. Bülthoff. 2008. Bayesian motion estimation accounts for a surprising bias in 3D vision. Proc Natl Acad Sci U S A 105(33): 12087–12092. Wilkie, R. 
M., and J. P. Wann. 2005. The role of visual and nonvisual information in the control of locomotion. J Exp Psychol Hum Percept Perform 31(5): 901–911. Witmer, B. G., and P. B. Kline. 1998. Judging perceived and traversed distance in virtual environments. Pres Teleop Virtual Environ 7: 144–167. Wittlinger, M., R. Wehner, and H. Wolf. 2006. The ant odometer: Stepping on stilts and stumps. Science 312: 1965–1967. Yong, N. A., G. D. Paige, and S. H. Seidman. 2007. Multiple sensory cues underlying the perception of translation and path. J Neurophysiol 97: 1100–1113.
31 Visual–Vestibular Integration for Self-Motion Perception
Gregory C. DeAngelis and Dora E. Angelaki
CONTENTS
31.1 The Problem of Self-Motion Perception and the Utility of Visual–Vestibular Integration
  31.1.1 Optic Flow
  31.1.2 Vestibular Signals
31.2 Potential Neural Substrates for Visual–Vestibular Integration
31.3 Heading Tuning and Spatial Reference Frames in Area MSTd
  31.3.1 Heading Tuning
  31.3.2 Reference Frames
31.4 The Neuronal Combination Rule and Its Dependence on Cue Reliability
31.5 Linking Neuronal and Perceptual Correlates of Multisensory Integration
  31.5.1 Behavioral Results
  31.5.2 Neurophysiological Results
  31.5.3 Correlations with Behavioral Choice
31.6 Conclusion
Acknowledgments
References
31.1 THE PROBLEM OF SELF-MOTION PERCEPTION AND THE UTILITY OF VISUAL–VESTIBULAR INTEGRATION How do we perceive our direction of self-motion through space? To navigate effectively through a complex three-dimensional (3-D) environment, we must accurately estimate our own motion relative to objects around us. Self-motion perception is a demanding problem in sensory integration, requiring the neural combination of visual signals (e.g., optic flow), vestibular signals regarding head motion, and perhaps also somatosensory and proprioceptive cues (Hlavacka et al. 1992, 1996; Dichgans and Brandt 1974). Consider a soccer player running downfield to intercept a pass and head the ball toward the goal. This athlete must be able to accurately judge the trajectory of the ball relative to the trajectory of his/her self-motion, in order to precisely time his/her head thrust to meet the ball. Optic flow and vestibular signals are likely the two most sensitive cues for judging self-motion (Gu et al. 2007, 2008; Fetsch et al. 2009). To understand the need for multisensory integration of these cues, it is useful to consider the strengths and weaknesses of each cue. Although self-motion generally involves both translations and rotations of the observer, we shall limit the scope of this review to translational movements, such that we focus on visual and vestibular cues that determine our perceived direction of heading.
31.1.1 Optic Flow It has long been recognized that visual cues provide a rich source of information about self-motion (Gibson 1950). As we move through the environment, the resulting pattern of full-field retinal motion (optic flow) can be used to estimate heading. In the simplest case, involving an observer with stationary eyes and head moving through a stationary scene, the location of the focus of radial expansion in the optic flow field provides a direct indicator of heading. Many visual psychophysical and theoretical studies have examined how heading can be computed from optic flow (see Warren 2003 for review). The notion that optic flow contributes to self-motion perception is further supported by the fact that optic flow, by itself, can elicit powerful illusions of self-motion. As early as 1875, Ernst Mach described self-motion sensations (i.e., circular and linear vection) induced by optic flow. Numerous studies have subsequently characterized the behavioral observation that largefield optic flow stimulation induces self-motion perception (e.g., Berthoz et al. 1975; Brandt et al. 1973; Dichgans and Brandt 1978). Interpretation of optic flow, however, becomes considerably complicated under more natural conditions. Specifically, optic flow is substantially altered by movements of the eyes and head (Banks et al. 1996; Crowell et al. 1998; Royden et al. 1992, 1994), and by motion of objects in the visual field (Royden and Hildreth 1996; Gibson 1954; Warren and Saunders 1995). An extensive literature, including studies cited above, has been devoted to perceptual mechanisms that compensate for eye and/or head rotation during translational self-motion, making use of both retinal and extraretinal signals (reviewed by Warren 2003). Perceptual compensation for eye and head movements is largely successful, and is likely aided by the fact that the brain contains internal signals related to eye and head movements (e.g., efference copy) that can be used to transform visual signals. The neural basis of this compensation for eye and head movements has been explored considerably (Bradley et al. 1996; Page and Duffy 1999; Shenoy et al. 1999), although our understanding of these compensatory mechanisms is far from complete. Motion of objects in the world presents an even greater challenge to interpretation of optic flow because the brain contains no internal signals related to object motion. In general, the brain needs to solve a source separation problem because optic flow on the retina at any moment in time includes two major components: flow resulting from self-motion along with the static 3-D structure of the environment, and flow resulting from the movement of objects relative to the observer. Some psychophysical studies have suggested that this source separation problem can be solved through purely visual analysis of optic flow (Rushton and Warren 2005; Warren and Rushton 2007, 2008; Matsumiya and Ando 2009), whereas other studies indicate that nonvisual signals may be essential for interpretation of optic flow in the presence of object motion (Wexler 2003; Wexler et al. 2001; Wexler and van Boxtel 2005). Although interactions between object and background motion have been studied physiologically (Logan and Duffy 2006), the neural mechanisms that solve this problem remain unclear. 
Vestibular signals may be of particular importance in dealing with object motion because the vestibular system provides an independent source of information about head movements that may help to identify optic flow that is inconsistent with self-motion (induced by moving objects).
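As a concrete illustration of why the focus of radial expansion specifies heading in the simplest case described above, consider the standard pinhole-camera geometry (a textbook sketch, not a derivation specific to the studies reviewed here). For pure observer translation T = (T_x, T_y, T_z) through a rigid scene, a point at depth Z that projects to image location (x, y) under focal length f generates the image velocity

\[
\dot{x} = \frac{x\,T_z - f\,T_x}{Z}, \qquad
\dot{y} = \frac{y\,T_z - f\,T_y}{Z},
\]

which vanishes at (x, y) = (f T_x/T_z, f T_y/T_z). This focus of expansion depends only on the direction of translation and not on scene depth, which is why it can serve as a direct heading cue; eye or head rotations add depth-independent rotational flow components that displace this singularity, and object motion adds flow that obeys neither pattern, which is precisely the complication discussed above.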
31.1.2 Vestibular Signals The vestibular system provides a powerful independent source of information about head motion in space. Specifically, vestibular sensors provide information about the angular rotation and linear acceleration of the head in space (Angelaki 2004; Angelaki and Cullen 2008), and thus provide important inputs to self-motion estimation. A role of the vestibular system in the perception of self-motion has long been acknowledged (Guedry 1974, 1978; Benson et al. 1986; Telford et al. 1995). With regard to heading perception, the limitations of optic flow processing might be overcome by making use of inertial motion signals from the vestibular otolith organs (Benson et al. 1986;
Fernandez and Goldberg 1976a, 1976b; Guedry 1974). The otoliths behave much like linear accelerometers, and otolith afferents provide the basis for directional selectivity that could in principle be used to guide heading judgments. Indeed, with a sensory organ that signals real inertial motion of the head, one might ask why the nervous system should rely on visual information at all. Part of the answer is that even a reliable linear accelerometer has shortcomings, such as the inability to encode constant-velocity motion and the inability to distinguish between translation and tilt relative to gravity (due to Einstein’s equivalence principle). The latter problem may be resolved using angular velocity signals from the semicircular canals (Angelaki et al. 1999, 2004; Merfeld et al. 1999), but the properties of the canals render this strategy ineffective during low-frequency motion or static tilts. In fact, in the absence of visual cues, linear acceleration is often misperceived as tilt (the somatogravic illusion; Previc et al. 1992; Wolfe and Cramer 1970). This illusion can be quite dangerous for aviators, who feel compelled to pitch the nose of their aircraft downward to compensate for a nonexistent upward tilt, when in fact what they experienced was linear inertial acceleration. In summary, both the visual and vestibular systems are limited in their ability to unambiguously signal self-motion. A sensible approach for heading estimation would thus be to combine visual and vestibular information to overcome the limitations of each modality on its own. As discussed further below, this cross-modal integration can also improve perceptual discrimination of heading over what is possible for each modality alone. Thus, we suggest that multisensory integration of visual and vestibular inputs provides dual benefits: it overcomes important limitations of each sensory system alone and it provides increased sensitivity when both systems are active.
31.2 POTENTIAL NEURAL SUBSTRATES FOR VISUAL–VESTIBULAR INTEGRATION Where should one look in the brain to find neurons that integrate visual and vestibular signals for self-motion perception? One possibility is to look in portions of “visual” cortex that are known to carry selective responses to optic flow stimuli. Another possibility is to look in regions of “vestibular” cortex that may integrate otolith inputs with visual signals. Here, we briefly consider what is known about each of these possibilities. Optic flow–sensitive neurons have been found in the dorsal portion of the medial superior temporal area (MSTd; Tanaka et al. 1986; Duffy and Wurtz 1991, 1995), ventral intraparietal area (VIP; Bremmer et al. 2002a, 2002b; Schaafsma and Duysens 1996), posterior parietal cortex (7a; Siegel and Read 1997), and the superior temporal polysensory area (STP; Anderson and Siegel 1999). Among these areas, MSTd and VIP (Figure 31.1) currently stand out as good candidates for
FIGURE 31.1 (See color insert.) Illustration of some of the areas thought to be involved in processing of visual and/or vestibular signals for self-motion perception (see text for details). A partially inflated surface of cerebral cortex of a macaque monkey is shown. Colored regions indicate different functionally and anatomically defined areas. MST, medial superior temporal; VIP, ventral intra-parietal; PIVC, parieto-insular vestibular cortex.
integrating visual and vestibular signals to subserve heading perception because (1) they have large receptive fields and selectivity for complex optic flow patterns that simulate self-motion (Duffy and Wurtz 1991, 1995; Tanaka et al. 1986; Tanaka and Saito 1989; Schaafsma and Duysens 1996; Bremmer et al. 2002a), (2) they show some compensation for shifts in the focus of expansion due to pursuit eye movements (Bradley et al. 1996; Zhang et al. 2004; Page and Duffy 1999), and (3) they have been causally linked to heading judgments based on optic flow in microstimulation studies (Britten and van Wezel 1998, 2002; Zhang and Britten 2003). Perhaps most importantly, MSTd and VIP also contain neurons sensitive to physical translation in darkness (Bremmer et al. 1999, 2002b; Duffy 1998; Gu et al. 2006; Chen et al. 2007; Schlack et al. 2002; Takahashi et al. 2007; Chowdhury et al. 2009). This suggests the presence of vestibular signals that may be useful for heading perception, and thus the potential for integration with optic flow signals. In addition to regions conventionally considered to be largely visual in nature, there are several potential loci within the vestibular system where otolith-driven signals regarding translation could be combined with optic flow signals. Putative visual–vestibular convergence has been reported as early as one or two synapses from the vestibular periphery, in the brainstem vestibular nuclei (Daunton and Thomsen 1979; Henn et al. 1974; Robinson 1977; Waespe and Henn 1977) and vestibulo-cerebellum (Markert et al. 1988; Waespe et al. 1981; Waespe and Henn 1981). However, responses to visual (optokinetic) stimuli within these subcortical circuits are more likely related to gaze stabilization and eye movements [optokinetic nystagmus (OKN), vestibulo-ocular reflex (VOR), and/or smooth pursuit] rather than self-motion perception per se. This conclusion is supported by recent experiments (Bryan and Angelaki 2008) showing a lack of optic-flow responsiveness in the vestibular and deep cerebellar nuclei when animals were required to fixate a head-fixed target (suppressing OKN). At higher stages of vestibular processing, several interconnected cortical areas have traditionally been recognized as “vestibular cortex” (Fukushima 1997; Guldin and Grusser 1998), and are believed to receive multiple sensory inputs, including visual, vestibular, and somatosensory/pro prioceptive signals. Specifically, three main cortical areas (Figure 31.1) have been characterized as either exhibiting responses to vestibular stimulation and/or receiving short-latency vestibular signals (trisynaptic through the vestibular nuclei and the thalamus). These include: (1) area 2v, located in the transition zone of areas 2, 5, and 7 near the lateral tip of the intraparietal sulcus (Schwarz and Fredrickson 1971a, 1971b; Fredrickson et al. 1966; Buttner and Buettner 1978); (2) the parietoinsular vestibular cortex (PIVC), located between the auditory and secondary somatosensory cortices (Grusser et al. 1990a, 1990b); and (3) area 3a, located within the central sulcus extending into the anterior bank of the precentral gyrus (Odkvist et al. 1974; Guldin et al. 1992). In addition to showing vestibular responsiveness, neurons in PIVC (Grusser et al. 1990b) and 2v (Buttner and Buettner 1978) were reported to show an influence of visual/optokinetic stimulation, similar to subcortical structures. 
However, these studies did not conclusively demonstrate that neurons in any of these areas provide robust information about self-motion from optic flow. Indeed, we have recently shown that PIVC neurons generally do not respond to brief (2-second) optic flow stimuli with a Gaussian velocity profile (Chen et al. 2010), whereas these same visual stimuli elicit very robust directional responses in areas MSTd and VIP (Gu et al. 2006; Chen et al. 2007). Thus far, we also have not encountered robust optic flow selectivity in area 2v (unpublished observations). In summary, the full repertoire of brain regions that carry robust signals related to both optic flow and inertial motion remains to be further elaborated, and other areas that serve as important players in multisensory integration for self-motion perception may yet emerge. However, two aspects of the available data are fairly clear. First, extrastriate areas MSTd and VIP contain robust representations of self-motion direction based on both visual and vestibular cues. Second, traditional vestibular cortical areas (PIVC, 2v) do not appear to have sufficiently robust responses to optic flow to be serious candidates for the neural basis of multimodal heading perception. In the remainder of this review, we shall therefore focus on what is known about visual–vestibular integration in area MSTd, as this area has been best studied so far.
31.3 HEADING TUNING AND SPATIAL REFERENCE FRAMES IN AREA MSTD 31.3.1 Heading Tuning The discovery of vestibular translation responses in MSTd, first reported by Duffy (1998), was surprising because this area is traditionally considered part of the extrastriate visual cortex. The results of Duffy’s groundbreaking study revealed a wide variety of visual–vestibular interactions in MSTd, including enhancement and suppression of responses relative to single-cue conditions, as well as changes in cells’ preferred direction with anticongruent stimulation. Building upon Duffy’s findings, we used a custom-built virtual reality system (Figure 31.2a) to examine the spatial tuning of MSTd neurons in three dimensions (Figure 31.2b), making use of
FIGURE 31.2 (a–c) Apparatus and stimuli used to examine visual–vestibular interactions in rhesus monkeys. (a) 3-D virtual reality system, (b) heading trajectories, and (c) velocity and acceleration profiles used by Gu et al. (2006). 3-D heading tuning functions of two example MSTd neurons: (d) a “congruent cell” and (e) an “opposite” cell. Firing rate (grayscale) is plotted as a function of azimuth (abscissa) and elevation (ordinate) of heading trajectory. For each cell, tuning was measured in three stimulus conditions: vestibular (inertial motion only), visual (optic flow only), and combined visual–vestibular stimulation. (Adapted from Gu, Y. et al., J. Neurosci., 26, 73–85, 2006.)
stimuli with a Gaussian stimulus velocity profile (Figure 31.2c) that is well suited to activating the otolith organs (Gu et al. 2006; Takahashi et al. 2007). Heading tuning was measured under three stimulus conditions: visual only, vestibular only, and a combined condition in which the stimulus contained precisely synchronized optic flow and inertial motion. We found that about 60% of MSTd neurons show significant directional tuning for both visual and vestibular heading cues. MSTd neurons showed a wide variety of heading preferences, with individual neurons being tuned to virtually all possible directions of translation in 3-D space. Notably, however, there was a strong bias for MSTd neurons to respond best to lateral motions within the frontoparallel plane (i.e., left/right and up/down), with relatively few neurons preferring fore–aft directions of motion. This was true for both visual and vestibular tuning separately (Gu et al. 2006, 2010). Interestingly, MSTd neurons seemed to fall into one of two categories based on their relative preferences for heading defined by visual and vestibular cues. For congruent cells, the visual and vestibular heading preferences are closely matched, as illustrated by the example neuron shown in Figure 31.2d. This neuron preferred rightward motion of the head in both the visual and vestibular conditions. In contrast, opposite cells have visual and vestibular heading preferences that are roughly 180° apart (Gu et al. 2006). For example, the opposite cell in Figure 31.2e prefers rightward and slightly upward motion in the vestibular condition, but prefers leftward and slightly downward translation in the visual condition. For this neuron, responses in the combined stimulus condition (Figure 31.2e, right panel) were very similar to those elicited by optic flow in the visual condition. This pattern of results was common in the study of Gu et al. (2006). However, as discussed further below, this apparent visual dominance was because high-coherence visual stimuli were used. We shall consider this issue in considerably more detail in the next section. The responses of MSTd neurons to translation in the vestibular condition were found to be very similar when responses were recorded during translation in complete darkness (as opposed to during viewing of a fixation target on a dim background), suggesting that spatial tuning seen in the vestibular condition (e.g., Figure 31.2d, e) was indeed of labyrinthine origin (Gu et al. 2006; Chowdhury et al. 2009). To verify this, we examined the responses of MSTd neurons after a bilateral labyrinthectomy. After the lesion, MSTd neurons did not give significant responses in the vestibular condition, and spatial tuning was completely abolished (Gu et al. 2007; Takahashi et al. 2007). Thus, responses observed in MSTd during the vestibular condition arise from otolith-driven input.
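To make the congruent/opposite distinction concrete, the angular difference between a cell's visual and vestibular heading preferences can be computed directly from the two (azimuth, elevation) preferences. The following Python sketch is purely illustrative: the function names and the 60°/120° cutoffs are expository choices, not the statistical classification criteria used in the published studies.

import numpy as np

def heading_vector(azimuth_deg, elevation_deg):
    # Convert an (azimuth, elevation) heading preference into a 3-D unit vector
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

def preference_difference_deg(visual_pref, vestibular_pref):
    # Angle (degrees) between visual and vestibular preferred headings
    v = heading_vector(*visual_pref)
    w = heading_vector(*vestibular_pref)
    return np.degrees(np.arccos(np.clip(np.dot(v, w), -1.0, 1.0)))

def classify_cell(delta_deg, congruent_max=60.0, opposite_min=120.0):
    # Illustrative cutoffs only; the published work used statistical criteria
    if delta_deg < congruent_max:
        return "congruent"
    if delta_deg > opposite_min:
        return "opposite"
    return "intermediate"

# Example: nearly matched preferences vs. preferences roughly 180 degrees apart
print(classify_cell(preference_difference_deg((10, 0), (20, 5))))    # congruent
print(classify_cell(preference_difference_deg((10, 0), (190, -5))))  # opposite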
31.3.2 Reference Frames

Given that neurons in MSTd show spatial tuning for both visual and vestibular inputs, a natural question arises regarding the spatial reference frames of these signals. Vestibular signals regarding translation must initially be coded by the otolith afferents in head-centered coordinates, because the vestibular organs are fixed in the head. In contrast, visual motion signals must initially be coded in retinal (eye-centered) coordinates. Since these two signals arise in different spatial frames of reference, how are they coded when they are integrated by MSTd neurons? Some researchers have suggested that signals from different sensory systems should be expressed in a common reference frame when they are integrated (Groh 2001). On the other hand, computational models show that neurons can have mixed and intermediate reference frames while still allowing signals to be decoded accurately (Deneve et al. 2001; Avillac et al. 2005).

To investigate this issue, we tested whether visual and vestibular heading signals in MSTd share a common reference frame (Fetsch et al. 2007). To decouple head-centered and eye-centered coordinates, we measured visual and vestibular heading tuning while monkeys fixated on one of three target locations: straight ahead, 20–25° to the right, and 20–25° to the left. If heading is coded in eye-centered coordinates, the heading preference of the neuron should shift horizontally (in azimuth) by the same amount as the gaze is deviated from straight ahead. If heading is coded in head-centered coordinates, then the heading preference should remain constant as a function of eye position.
Figure 31.3a shows the effect of eye position on the vestibular heading preference of an MSTd neuron. In this case, heading preference (small white circles connected by dashed line) remains quite constant as eye position varies, indicating head-centered tuning. Figure 31.3b shows the effect of eye position on the visual heading tuning of another MSTd neuron. Here, the heading preference clearly shifts with eye position, such that the cell signals heading in an eye-centered frame of reference. A cross-correlation technique was used to measure the amount of shift of the heading preference relative to the change in eye position. This yields a metric, the displacement index, which
FIGURE 31.3 Reference frames of visual and vestibular heading signals in MSTd. Tuning functions are plotted for two example cells in (a) vestibular and (b) visual conditions, measured separately at three static eye positions along horizontal meridian: −20° (top), 0° (middle), and +20° (bottom). Dashed white line connects preferred heading in each case, to illustrate horizontal shift (or lack thereof) of tuning function across eye positions. (c) Histogram of displacement index (DI) values for MSTd neurons tested in vestibular (black bars) and visual (gray bars) conditions. DI is defined as angular shift of the tuning function normalized by change in eye position; thus a value of 0 indicates a head- (or body-) centered reference frame and 1 indicates an eye-centered frame. (d) Binned average DI values for three stimulus conditions (vestibular, visual, combined) as a function of relative strength of visual and vestibular single-cue tuning (visual/vestibular ratio). (Adapted from Fetsch, C.R. et al., J. Neurosci., 27, 700–712, 2007.)
will be 0.0 for head-centered tuning and 1.0 for eye-centered tuning. As shown in Figure 31.3c, we found that visual heading tuning was close to eye-centered, with a median displacement index of 0.89. In contrast, vestibular heading tuning was found to be close to head-centered, with a median displacement index of 0.24. This value for the vestibular condition was significantly larger than 0.0, indicating that vestibular heading tuning was slightly shifted toward eye-centered coordinates. These data show that visual and vestibular signals in MSTd are not expressed in a common reference frame. By conventional thinking, this might cast doubt on the ability of this area to perform sensory integration for heading perception. However, computational modeling suggests that sensory signals need not explicitly occupy a common reference frame for integration to occur (Avillac et al. 2005; Fetsch et al. 2007; Deneve et al. 2001). Moreover, as we will see in a later section, MSTd neurons can account for improved behavioral sensitivity under cue combination. Thus, the conventional and intuitive notion that sensory signals need to be expressed in a common reference frame for multisensory integration to occur may need to be discarded. The results of the study by Fetsch et al. (2007) also provide another challenge to conventional ideas regarding multisensory integration and reference frames. To our knowledge, all previous studies on reference frames of sensory signals have only examined responses during unisensory stimulation. Also relevant is the reference frame exhibited by neurons during combined, multimodal stimulation, and how this reference frame depends on the relative strengths of responses to the two sensory modalities. To examine this issue, Fetsch et al. (2007) measured the reference frame of activity during the combined (visual–vestibular) condition, as well as the unimodal conditions. Average displacement index values were computed as a function of the relative strength of unimodal visual and vestibular responses [visual/vestibular ratio (VVR)]. For the visual (circles) and vestibular (squares) conditions, the average displacement index did not systematically depend on VVR (Figure 31.3d), indicating that the reference frame in the unimodal conditions was largely independent of the relative strengths of visual and vestibular inputs to the neuron under study. In contrast, for the combined condition (diamonds), the average displacement index changed considerably as a function of VVR, such that the reference frame of combined responses was more head-centered for neurons with low VVR and more eye-centered for neurons with high VVR (Figure 31.3d). Thus, the reference frame of responses to multimodal stimuli can vary as a function of the relative strengths of the visual and vestibular inputs. This has potentially important implications for understanding how multisensory responses are decoded, and deserves further study.
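As a concrete illustration of the displacement index described above, the sketch below estimates the azimuth shift of a tuning curve across two eye positions with a circular cross-correlation and normalizes it by the change in eye position. This is a simplified stand-in for the procedure of Fetsch et al. (2007); the 5° sampling grid, the 40° gaze change, and the von Mises tuning curves in the demonstration are assumptions made purely for illustration.

```python
import numpy as np

def preferred_shift_deg(tuning_ref, tuning_test, azimuths_deg):
    """Azimuth shift (deg) of tuning_test relative to tuning_ref, via circular cross-correlation."""
    a = tuning_ref - tuning_ref.mean()
    b = tuning_test - tuning_test.mean()
    corr = [np.dot(b, np.roll(a, k)) for k in range(len(a))]   # correlate b with a shifted by k bins
    step = azimuths_deg[1] - azimuths_deg[0]
    shift = np.argmax(corr) * step
    return shift - 360.0 if shift > 180.0 else shift           # wrap to (-180, 180]

def displacement_index(tuning_eye1, tuning_eye2, azimuths_deg, delta_eye_deg):
    """DI near 0 indicates head-centered tuning; DI near 1 indicates eye-centered tuning."""
    return preferred_shift_deg(tuning_eye1, tuning_eye2, azimuths_deg) / delta_eye_deg

# Demonstration with synthetic (von Mises) tuning: a 40 deg gaze change shifts the peak by 40 deg.
az = np.arange(0.0, 360.0, 5.0)
tuning_left  = np.exp(2.0 * np.cos(np.deg2rad(az - 90.0)))     # peak at 90 deg, gaze at -20 deg
tuning_right = np.exp(2.0 * np.cos(np.deg2rad(az - 130.0)))    # peak at 130 deg, gaze at +20 deg
print(displacement_index(tuning_left, tuning_right, az, delta_eye_deg=40.0))   # -> 1.0 (eye-centered)
```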
31.4 THE NEURONAL COMBINATION RULE AND ITS DEPENDENCE ON CUE RELIABILITY

An issue of great interest in multisensory integration has been the manner in which neurons combine their unimodal sensory inputs. Specifically, how is the response to a bimodal stimulus related to the responses to the unimodal components presented separately? Traditionally, this issue has been examined by computing one of two metrics: (1) a multisensory enhancement index, which compares the bimodal response to the largest unimodal response, and (2) an additivity index, which compares the bimodal response to the sum of the unimodal responses (Stein and Stanford 2008). In classic studies of visual–auditory integration in the superior colliculus (Stein and Meredith 1993), bimodal responses were often found to be superadditive (larger than the sum of the unimodal responses) and this was taken as evidence for a nonlinear cue combination rule such as multiplication (Meredith and Stein 1983, 1986). In contrast, a variety of studies of multisensory integration in cortical areas have reported subadditive interactions (Avillac et al. 2007; Morgan et al. 2008; Sugihara et al. 2006). Some of this variation is likely accounted for by variations in the efficacy of unimodal stimuli, as recent studies in the superior colliculus have demonstrated that superadditive interactions become additive or even subadditive as the strength of unimodal stimuli increases (Perrault et al. 2003, 2005; Stanford et al. 2005).
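The two metrics mentioned above are straightforward to compute from trial-averaged responses. The sketch below uses one common formulation of each (percentage enhancement relative to the best unimodal response, and the ratio of the bimodal response to the unimodal sum); exact definitions vary somewhat across studies, so this should be read as illustrative rather than as the formula used in any particular paper.

```python
def multisensory_enhancement(bimodal, uni_a, uni_b):
    """Percent change of the bimodal response relative to the largest unimodal response."""
    best_unimodal = max(uni_a, uni_b)
    return 100.0 * (bimodal - best_unimodal) / best_unimodal

def additivity_index(bimodal, uni_a, uni_b):
    """> 1 superadditive, ~ 1 additive, < 1 subadditive, relative to the sum of unimodal responses."""
    return bimodal / (uni_a + uni_b)

# Example: a bimodal response of 30 spikes/s with unimodal responses of 18 and 12 spikes/s.
print(multisensory_enhancement(30.0, 18.0, 12.0))   # ~66.7% enhancement over the best unimodal response
print(additivity_index(30.0, 18.0, 12.0))           # 1.0, i.e., exactly additive
```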
Although many studies have measured additivity and/or enhancement of multisensory responses, there has been a surprising lack of studies that have directly attempted to measure the mathematical rule by which multisensory neurons combine their unimodal inputs (hereafter the "combination rule"). Measuring additivity (or enhancement) for a limited set of stimuli is not sufficient to characterize the combination rule. To illustrate this point, consider a hypothetical neuron whose bimodal response is the product (multiplication) of its unimodal inputs. The response of this neuron could appear to be subadditive (e.g., 2 × 1 = 2), additive (2 × 2 = 4), or superadditive (2 × 3 = 6) depending on the magnitudes of the two inputs to the neuron. Thus, to estimate the combination rule, it is essential to examine responses to a wide range of stimulus variations in both unimodal domains.

Recently, we have performed an experiment to measure the combination rule by which neurons in area MSTd integrate their visual and vestibular inputs related to heading (Morgan et al. 2008). We asked whether bimodal responses in MSTd are well fit by a weighted linear summation of unimodal responses, or whether a nonlinear (i.e., multiplicative) combination rule is required. We also asked whether the combination rule changes with the relative reliability of the visual and vestibular cues. To address these questions, we presented eight evenly spaced directions of motion (45° apart) in the horizontal plane (Figure 31.4, inset). Unimodal tuning curves (Figure 31.4a–c, margins) were measured by presenting these eight headings in both the vestibular and visual stimulus conditions. In addition, we measured a full bimodal interaction profile by presenting all 64 possible combinations of these 8 vestibular and 8 visual headings, including 8 congruent and 56 incongruent (cue-conflict) conditions.

Figure 31.4a–c shows data from an exemplar "congruent" cell in area MSTd. The unimodal tuning curves (margins) show that this neuron responded best to approximately rightward motion (0°) in both the visual and vestibular conditions. When optic flow at 100% coherence was combined with vestibular stimulation, the bimodal response profile of this neuron (grayscale map in Figure 31.4a) was dominated by the visual input, as indicated by the horizontal band of high firing rates. When the optic flow stimulus was weakened by reducing the motion coherence to 50% (Figure 31.4b), the bimodal response profile showed a more balanced, symmetric peak, indicating that the bimodal response now reflects roughly equal contributions of visual and vestibular inputs. When the motion coherence was further reduced to 25% (Figure 31.4c), the unimodal visual tuning curve showed considerably reduced amplitude and the bimodal response profile became dominated by the vestibular input, as evidenced by the vertical band of high firing rates. Thus, as the relative strengths of visual and vestibular cues to heading vary, bimodal responses of MSTd neurons range from visually dominant to vestibularly dominant.

To characterize the combination rule used by MSTd neurons in these experiments, we attempted to predict the bimodal response profile as a function of the unimodal tuning curves. We found that bimodal responses were well fit by a weighted linear summation of unimodal responses (Morgan et al. 2008).
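A minimal version of this type of model fit is sketched below: the 8 × 8 grid of bimodal responses is regressed onto the two unimodal tuning curves by ordinary least squares, yielding a vestibular weight, a visual weight, and an offset. This is a simplification of the fitting procedure of Morgan et al. (2008), which among other things also evaluated nonlinear terms; the array shapes and variable names are assumptions for illustration.

```python
import numpy as np

def fit_weighted_linear_sum(bimodal, uni_ves, uni_vis):
    """
    bimodal          : (8, 8) array of mean firing rates; rows index vestibular heading, columns visual heading
    uni_ves, uni_vis : length-8 unimodal tuning curves measured at the same headings
    Fits R_bimodal(i, j) ~ w_ves * R_ves(i) + w_vis * R_vis(j) + offset by least squares and
    returns (w_ves, w_vis, offset, r_squared).
    """
    n_ves, n_vis = bimodal.shape
    ves_term = np.repeat(uni_ves, n_vis)     # vestibular unimodal response for each grid cell (row-major)
    vis_term = np.tile(uni_vis, n_ves)       # visual unimodal response for each grid cell
    X = np.column_stack([ves_term, vis_term, np.ones(n_ves * n_vis)])
    y = bimodal.ravel()
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef[0], coef[1], coef[2], 1.0 - ss_res / ss_tot
```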
On average, this linear model accounted for ~90% of the variance in bimodal responses, and adding various nonlinear components to the model (such as a product term) accounted for only 1–2% additional variance. Thus, weighted linear summation provides a good model for the combination rule used in MSTd, and the weights are typically less than 1 (Figure 31.4d, e), indicating that subadditive interactions are commonplace. How does the weighted linear summation model of MSTd integration depend on the reliability of the cues to heading? As the visual cue varies in reliability due to changes in motion coherence, the bimodal response profile clearly changes shape (Figure 31.4a–c). There are two basic possible explanations for this change in shape. One possibility is that the bimodal response profile changes simply from the fact that lower coherences elicit visual responses with weaker modulation as a function of heading. In this case, the weights with which each neuron combines its vestibular and visual inputs remain constant and the decreased visual influence in the bimodal response profile is simply due to weaker visual inputs at lower coherences. In this scenario, each neuron has a combination rule that is independent of cue reliability. A second possibility is that the weights given to the vestibular and visual inputs could change with the relative reliabilities of the two cues. This outcome
FIGURE 31.4 Effects of cue strength (motion coherence) on weighted summation of visual and vestibular inputs by MSTd neurons. (a–c) Comparison of unimodal and bimodal tuning for a congruent MSTd cell, tested at three motion coherences. Grayscale maps show mean firing rates as a function of vestibular and visual headings in bimodal condition (including all 64 possible combinations of 8 visual headings and 8 vestibular headings at 45° intervals). Tuning curves along left and bottom margins show mean (±SEM) firing rates versus heading for unimodal visual and vestibular conditions, respectively. (a) Bimodal responses at 100% coherence are visually dominated. (b) Bimodal responses at 50% coherence show a balanced contribution of visual and vestibular cues. (c) At 25% coherence, bimodal responses appear to be dominated by vestibular input. (d–g) Dependence of vestibular and visual weights on visual motion coherence. Vestibular and visual weights for each MSTd neuron were derived from linear fits to bimodal responses. (d, e) Histograms of vestibular and visual weights computed from data at 100% (black) and 50% (gray) coherence. Triangles are plotted at medians. (f, g) Vestibular and visual weights are plotted as a function of motion coherence for each neuron examined at multiple coherences. Data points are coded by significance of unimodal visual tuning (open vs. filled circles). (Adapted from Morgan, M.L. et al., Neuron, 59, 662–673, 2008.)
would indicate that the neuronal combination rule is not fixed, but changes with cue reliability. This is a fundamental issue of considerable importance in multisensory integration. To address this issue, we obtained the best fit of the weighted linear summation model separately for each motion coherence. At all coherences, the linear model provided a good fit to the bimodal responses. The key question then becomes whether the visual and vestibular weights attributed to each neuron remain constant as a function of coherence or whether they change systematically. Figure 31.4d, e shows the distributions of weights obtained at 100% (black bars) and 50% (gray
Visual–Vestibular Integration for Self-Motion Perception
639
bars) coherence. The average visual weight is significantly higher at 100% coherence than 50% coherence, whereas the average vestibular weight shows the opposite effect. For all neurons that were tested at multiple coherences, Figure 31.4f, g shows how the vestibular and visual weights, respectively, change with coherence for each neuron. There is a clear and significant trend for vestibular weights to decline with coherence, whereas visual weights increase (Morgan et al. 2008). A model in which the weights are fixed across coherences does not fit the data as well as a model in which the weights vary with coherence, for the majority of neurons (Morgan et al. 2008). The improvement in model fit with variable weights (although significant) is rather modest for most neurons, however, and it remains to be determined whether these weight changes have large or small effects on population codes for heading. The findings of Morgan et al. (2008) could have important implications for understanding the neural circuitry that underlies multisensory integration. Whereas the neuronal combination rule is well described as weighted linear summation for any particular values of stimulus strength/energy, the weights in this linear combination rule are not constant when stimulus strength varies. If MSTd neurons truly perform a simple linear summation of their visual and vestibular inputs, then this finding would suggest that the synaptic weights of these inputs change as a function of stimulus strength. Although this is possible, it is not clear how synaptic weights would be dynamically modified from moment to moment when the stimulus strength is not known in advance. Yet, it is well established that human cue integration behavior involves a dynamic, trial-by-trial reweighting of cues. A recent neural theory of cue integration shows that neurons that simply sum their multisensory inputs can account for dynamic cue reweighting at the perceptual level, if their spiking statistics fall into a Poisson-like family (Ma et al. 2006). In this theory, it was not necessary for neurons to change their combination rule with stimulus strength, but this is what the results of Morgan et al. (2008) demonstrate. One possible resolution to this conundrum is that multisensory neurons linearly sum their inputs with fixed weights, at the level of membrane potential, but that some network-level nonlinearity makes the weights appear to change with stimulus strength. A good candidate mechanism that may account for the findings of Morgan et al. (2008) is divisive normalization (Carandini et al. 1997; Heeger 1992). In a divisive normalization circuit, each cell performs a linear weighted summation of its inputs at the level of membrane potential, but the output of each neuron is divided by the summed activity of all neurons in the circuit (Heeger 1992). This model has been highly successful in accounting for how the responses of neurons in the primary visual cortex (V1) change with stimulus strength (i.e., contrast; Carandini et al. 1997) and how neurons in visual area MT combine multiple motion signals (Rust et al. 2006), and has also recently been proposed as an explanation for how selective attention modifies neural activity (Lee and Maunsell 2009; Reynolds and Heeger 2009). Recent modeling results (not shown) indicate that divisive normalization can account for the apparent changes in weights with coherence (Figure 31.4f, g), as well as a variety of other classic findings in multisensory integration (Ohshiro et al. 2011). 
Evaluating the normalization model of multisensory integration is a topic of current research in our laboratories.
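The intuition behind this proposal can be captured with a toy simulation: every model neuron sums its cue inputs with fixed weights, each neuron's output is divided by pooled population activity, and the "apparent" weights recovered by regressing bimodal responses onto unimodal responses nonetheless shift with coherence. This is only a schematic sketch of the divisive-normalization idea, not the model of Ohshiro et al. (2011); all parameter values are arbitrary.

```python
import numpy as np

def normalize(linear_drive, sigma=1.0):
    """Divisive normalization: each neuron's linear drive divided by the pooled population activity."""
    return linear_drive / (sigma + linear_drive.mean())

rng = np.random.default_rng(1)
n = 200
w_ves = rng.uniform(0.2, 1.0, n)     # fixed vestibular input weights (never changed below)
w_vis = rng.uniform(0.2, 1.0, n)     # fixed visual input weights
ves_drive = 1.0                      # vestibular cue strength, held constant

for coherence in (1.0, 0.5, 0.25):   # visual cue strength scales with motion coherence
    r_ves = normalize(w_ves * ves_drive)                        # unimodal vestibular responses
    r_vis = normalize(w_vis * coherence)                        # unimodal visual responses
    r_bi = normalize(w_ves * ves_drive + w_vis * coherence)     # bimodal responses (fixed-weight sum, then normalization)
    X = np.column_stack([r_ves, r_vis])
    (a_ves, a_vis), *_ = np.linalg.lstsq(X, r_bi, rcond=None)   # "apparent" weights seen by the experimenter
    print(f"coherence {coherence:4.2f}: apparent w_ves = {a_ves:.2f}, apparent w_vis = {a_vis:.2f}")
# The apparent vestibular weight falls and the visual weight rises as coherence increases,
# even though the synaptic weights w_ves and w_vis are fixed.
```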
31.5 LINKING NEURONAL AND PERCEPTUAL CORRELATES OF MULTISENSORY INTEGRATION

Most physiological studies of multisensory integration have been performed in animals that are anesthetized or passively experiencing sensory stimuli. Ultimately, to understand the neural basis of multisensory cue integration, we must relate neural activity to behavioral performance. Because cue integration may only occur when cues have roughly matched perceptual reliabilities (Alais and Burr 2004; Ernst and Banks 2002), it is critical to address the neural mechanisms of sensory integration under conditions in which cue combination is known to take place perceptually. As a first major step in this direction, we have developed a multisensory heading discrimination task for monkeys
(Gu et al. 2008; Fetsch et al. 2009). This task enabled us to ask two fundamental questions that had remained unaddressed: (1) Can monkeys integrate visual and vestibular cues near-optimally to improve heading discrimination performance? (2) Can the activity of MSTd neurons account for the behavioral improvement observed?
31.5.1 Behavioral Results

Monkeys were trained to report their perceived heading relative to straight ahead in a two-alternative forced choice task (Figure 31.5a). In each trial of this task, the monkey experienced a forward motion with a small leftward or rightward component, and the animal's task was to make a saccade to one of two choice targets to indicate its perceived heading. Again, three stimulus conditions (visual, vestibular, and combined) were examined, except that the heading angles during the task were limited to a small range around straight forward. Psychometric functions were plotted as the proportion of rightward choices as a function of heading angle (negative, leftward; positive, rightward) and fit with a cumulative Gaussian function (Wichmann and Hill 2001). The standard deviation (σ) of the fitted function was taken as the psychophysical threshold, corresponding to the heading at which the subject was approximately 84% correct.
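A bare-bones version of this psychometric fit is shown below, using a cumulative Gaussian whose standard deviation serves as the threshold. The choice data in the example are made up for illustration, and the fit omits the lapse-rate terms that a full treatment along the lines of Wichmann and Hill (2001) would include.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def cumulative_gaussian(heading_deg, mu, sigma):
    """Probability of a 'rightward' choice as a function of heading angle."""
    return norm.cdf(heading_deg, loc=mu, scale=sigma)

def fit_psychometric(headings_deg, prop_rightward):
    """Fit a cumulative Gaussian; the fitted sigma is taken as the psychophysical threshold (~84% correct)."""
    (mu, sigma), _ = curve_fit(cumulative_gaussian, headings_deg, prop_rightward,
                               p0=[0.0, 2.0], bounds=([-10.0, 0.1], [10.0, 50.0]))
    return mu, sigma

# Hypothetical example: headings around straight ahead and the proportion of rightward choices.
headings = np.array([-8, -4, -2, -1, 1, 2, 4, 8], dtype=float)
p_right = np.array([0.02, 0.10, 0.28, 0.45, 0.60, 0.75, 0.92, 0.99])
mu, sigma = fit_psychometric(headings, p_right)
print(f"bias = {mu:.2f} deg, threshold (sigma) = {sigma:.2f} deg")
```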
FIGURE 31.5 Heading discrimination task and behavioral performance. (a) After fixating a visual target, the monkey experienced forward motion (real and/or simulated with optic flow) with a small leftward or rightward component, and subsequently reported his perceived heading (“left” vs. “right”) by making a saccadic eye movement to one of two choice targets. (b) Psychometric functions for one animal under unimodal (vestibular: dashed curve, visual: solid curve) and bimodal (gray curve) conditions. Psychophysical threshold was defined as the standard deviation (σ) of fitted cumulative Gaussian. (c) Summary of measured and predicted psychophysical thresholds for monkey C. Bars show average threshold (±SE) for vestibular (white), visual (dark gray), and combined conditions (black), along with predicted threshold for combined condition assuming optimal cue integration (light gray). (d) Summary of psychophysical performance for monkey A. (Adapted from Gu, Y. et al., Neuron, 66, 596–609, 2008.)
Optimal cue-integration models (e.g., Alais and Burr 2004; Ernst and Banks 2002; Knill and Saunders 2003) predict that the threshold in the combined condition ($\sigma^2_{\mathrm{comb}}$) should be lower than the single-cue thresholds ($\sigma^2_{\mathrm{ves}}$, $\sigma^2_{\mathrm{vis}}$), as given by the following expression:

$$\sigma^2_{\mathrm{comb}} = \frac{\sigma^2_{\mathrm{vis}}\,\sigma^2_{\mathrm{ves}}}{\sigma^2_{\mathrm{vis}} + \sigma^2_{\mathrm{ves}}} \qquad (31.1)$$
To maximize the predicted improvement in performance, the reliability of the visual and vestibular cues (as measured by thresholds in the single-cue conditions) was matched by adjusting the motion coherence of optic flow in the visual display (for details, see Gu et al. 2008). Psychometric functions for one animal are plotted in Figure 31.5b. The vestibular (filled symbols, dashed curve) and visual (open symbols, solid curve) functions are nearly overlapping, with thresholds of 3.5° and 3.6°, respectively. In the combined condition (gray symbols and curve), the monkey’s heading threshold was substantially smaller (2.3°), as evidenced by the steeper slope of the psychometric function. Figure 31.5c, d summarizes the psychophysical data from two monkeys. For both animals, psychophysical thresholds in the combined condition were significantly lower than thresholds in the visual and vestibular conditions, and were quite similar to the optimal predictions generated from Equation 31.1 (Gu et al. 2008). Thus, monkeys integrate visual and vestibular cues near-optimally to improve their sensitivity in the heading discrimination task. Similar results were also found for human subjects (Fetsch et al. 2009).
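Equation 31.1 is simple to apply directly to measured single-cue thresholds; the short sketch below uses the example thresholds from Figure 31.5b.

```python
import numpy as np

def predicted_combined_threshold(sigma_ves, sigma_vis):
    """Optimal (Equation 31.1) prediction for the combined-condition threshold."""
    return np.sqrt((sigma_vis**2 * sigma_ves**2) / (sigma_vis**2 + sigma_ves**2))

# Single-cue thresholds for the animal in Figure 31.5b: 3.5 deg (vestibular) and 3.6 deg (visual).
print(predicted_combined_threshold(3.5, 3.6))   # ~2.5 deg; the measured combined threshold was 2.3 deg
```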
31.5.2 Neurophysiological Results

Having established robust cue integration behavior in macaques, we recorded from single neurons in area MSTd while monkeys performed the heading discrimination task (Gu et al. 2008). Figure 31.6a, b shows tuning curves from two example neurons tested with heading directions evenly spaced in the horizontal plane. The neuron in Figure 31.6a preferred leftward (negative) headings for both visual and vestibular stimuli, and was classified as a congruent cell. In contrast, the neuron in Figure 31.6b preferred leftward headings under the visual condition (solid line) and rightward headings under the vestibular condition (dashed line), and was classified as an opposite cell. Figure 31.6c and d shows the tuning of these example neurons over the much narrower range of headings sampled during the discrimination task. For the congruent cell (Figure 31.6c), heading tuning became steeper in the combined condition, whereas for the opposite cell (Figure 31.6d) it became flatter.

To allow a more direct comparison between neuronal and behavioral sensitivities, we used signal detection theory [receiver operating characteristic (ROC) analysis; Bradley et al. 1987; Green and Swets 1966; Britten et al. 1992] to quantify the ability of an ideal observer to discriminate heading based on the activity of a single neuron (Figure 31.6e and f, symbols). As with the psychometric data, we fitted these neurometric data with cumulative Gaussian functions (Figure 31.6e and f, smooth curves) and defined the neuronal threshold as the standard deviation of the Gaussian. For the congruent neuron in Figure 31.6e, the neuronal threshold was smallest in the combined condition (gray symbols and lines), indicating that the neuron could discriminate smaller variations in heading when both cues were provided. In contrast, for the opposite neuron in Figure 31.6f, the reverse was true: the neuron became less sensitive in the presence of both cues (gray symbols and lines).

The effect of visual–vestibular congruency on neuronal sensitivity in the combined condition was robust across the population of recorded MSTd neurons. To summarize this effect, we defined a congruency index (CI) that ranged from +1 (when visual and vestibular tuning functions have a consistent slope, e.g., Figure 31.6c) to −1 (when they have opposite slopes; Figure 31.6d) (for details, see Gu et al. 2008). We then computed, for each neuron, the ratio of the neuronal threshold in the combined condition to the expected threshold if neurons combine cues optimally according to
FIGURE 31.6 Heading tuning and heading sensitivity in area MSTd. (a–b) Heading tuning curves of two example neurons with (a) congruent and (b) opposite visual–vestibular heading preferences. (c–d) Responses of same neurons to a narrow range of heading stimuli presented while the monkey performed the discrimination task. (e–f) Neurometric functions computed by ROC analysis from firing rate data plotted in panels (c) and (d). Smooth curves show best-fitting cumulative Gaussian functions. (Adapted from Gu, Y. et al., Neuron, 66, 596–609, 2008.)
Equation 31.1. A significant correlation was seen between the combined threshold/predicted threshold ratio and CI (Figure 31.7a), such that neurons with large positive CIs (congruent cells, black circles) had thresholds close to the optimal prediction (ratios near unity). Thus, neuronal thresholds for congruent MSTd cells followed a pattern similar to the monkeys’ behavior. In contrast, combined thresholds for opposite cells were generally much higher than predicted from optimal cue integration (Figure 31.7a, open circles), indicating that these neurons became less sensitive during cue combination.
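The ideal-observer analysis described in Section 31.5.2 reduces, for each pair of headings, to the area under an ROC curve comparing two firing-rate distributions. A minimal sketch is given below; in practice, the resulting proportions of "rightward" decisions across headings are then fit with a cumulative Gaussian, exactly as for the behavioral data, and the fitted σ is taken as the neuronal threshold. The function name, inputs, and simulated spike counts are illustrative assumptions, not a reproduction of the published analysis.

```python
import numpy as np

def roc_area(rates_rightward, rates_leftward):
    """
    Area under the ROC curve for two sets of single-trial firing rates: the probability that an
    ideal observer, given one trial of each type, correctly labels the 'rightward' trial.
    Equivalent to the Mann-Whitney U statistic scaled to [0, 1].
    """
    r = np.asarray(rates_rightward, dtype=float)[:, None]
    left = np.asarray(rates_leftward, dtype=float)[None, :]
    return np.mean(r > left) + 0.5 * np.mean(r == left)

# Example with simulated Poisson spike counts for two symmetric headings (+h and -h):
rng = np.random.default_rng(2)
print(roc_area(rng.poisson(22, 100), rng.poisson(18, 100)))   # > 0.5 when +h drives the cell harder
```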
31.5.3 Correlations with Behavioral Choice

If monkeys rely on area MSTd for heading discrimination, the results of Figure 31.7a suggest that they selectively monitor the activity of congruent cells and not opposite cells. To test this hypothesis, we used the data from the recording experiments (Gu et al. 2007, 2008) to compute "choice
FIGURE 31.7 Neuronal thresholds and choice probabilities as a function of visual–vestibular congruency in combined condition. (a) Ordinate in this scatter plot represents ratio of threshold measured in combined condition to prediction from optimal cue integration. Abscissa represents CI of heading tuning for visual and vestibular responses. Asterisks denote neurons for which CI is not significantly different from zero. Dashed horizontal line denotes that threshold in combined condition is equal to the prediction. (b) Choice probability (CP) data are plotted as a function of congruency index for each MSTd neuron tested in combined condition. Note that congruent cells (black filled symbols), which have neuronal thresholds similar to optimal prediction in panel (a), also have CPs consistently and substantially larger than 0.5. (Adapted from Gu, Y. et al., Neuron, 66, 596–609, 2008.)
probabilities” (CPs) (Britten et al. 1996). CPs are computed by ROC analysis similar to neuronal thresholds, except that the ideal observer is asked to predict the monkey’s choice (rather than the stimulus) from the firing rate of the neuron. This analysis is performed after the effect of heading on response has been removed, such that it isolates the effect of choice on firing rates. Thus, CPs quantify the relationship between trial-to-trial fluctuations in neural firing rates and the monkeys’ perceptual decisions. A CP significantly greater than 0.5 indicates that the monkey tended to choose the neuron’s preferred sign of heading (leftward or rightward) when the neuron fires more strongly. Such a result is thought to reflect a functional link between the neuron and perception (Britten et al. 1996; Krug 2004; Parker and Newsome 1998). Notably, although MSTd is classically considered visual cortex, CPs significantly larger than 0.5 (mean = 0.55) were seen in the vestibular condition (Gu et al. 2007), indicating that MSTd activity is correlated with perceptual decisions about heading based on nonvisual information. It is of particular interest to examine the relationship between CP and CI in the combined condition, where the monkey makes use of both visual and vestibular cues. Given that opposite cells become insensitive during cue combination and congruent cells increase sensitivity, we might expect CP to depend on congruency in the combined condition. Indeed, Figure 31.7b shows that there is a
robust correlation between CP and CI (Gu et al. 2008). Congruent cells (black symbols) generally have CPs greater than 0.5, often much greater, indicating that they are robustly correlated with the animal’s perceptual decisions during cue integration. In contrast, opposite cells (unfilled symbols) tend to have CP values near 0.5, and the mean CP for opposite cells does not differ significantly from 0.5 (t-test, p = .08). This finding is consistent with the idea that the animals selectively monitor congruent cells to achieve near-optimal cue integration. These findings suggest that opposite cells are not useful for visual–vestibular cue integration during heading discrimination. What, then, is the functional role of opposite cells? We do not yet know the answer to this question, but we hypothesize that opposite cells, in combination with congruent cells, are important for dissociating object motion from self-motion. In general, the complex pattern of image motion on the retina has two sources: (1) self-motion combined with the 3-D layout of the scene and (2) objects moving in the environment. It is important for estimates of heading not to be biased by the presence of moving objects, and vice versa. Note that opposite cells will not be optimally stimulated when a subject moves through a static environment, but may fire more robustly when retinal image motion is inconsistent with self-motion. Thus, the relative activity of congruent and opposite cells may help identify (and perhaps discount) retinal image motion that is not produced by self-motion. Indeed, ongoing modeling work suggests that decoding a mixed population of congruent and opposite cells allows heading to be estimated with much less bias from moving objects. In summary, by simultaneously monitoring neural activity and behavior, it has been possible to study neural mechanisms of multisensory processing under conditions in which cue integration is known to take place perceptually. In addition to demonstrating near-optimal cue integration by monkeys, a population of neurons has been identified in area MSTd that could account for the improvement in psychophysical performance under cue combination. These findings implicate area MSTd in sensory integration for heading perception and establish a model system for studying the detailed mechanisms by which neurons combine different sensory signals.
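For completeness, the sketch below shows one simple way to compute a "grand" choice probability of the kind described above: firing rates are z-scored within each heading to remove the stimulus effect, pooled, and then submitted to the same ROC calculation, with trials grouped by the animal's choice rather than by the stimulus. The published analyses (Britten et al. 1996; Gu et al. 2007, 2008) include additional safeguards, such as minimum trial counts per condition, so this is illustrative only.

```python
import numpy as np

def choice_probability(rates_by_heading, choices_by_heading):
    """
    rates_by_heading   : list of 1-D arrays of single-trial firing rates, one array per heading
    choices_by_heading : matching list of arrays, 1 = choice of the neuron's preferred sign, 0 = other
    Rates are z-scored within each heading (removing the stimulus effect), pooled, and compared
    across the two choice groups with an ROC area. CP > 0.5 means the neuron fires more on trials
    when the animal chooses the neuron's preferred heading sign.
    """
    z_rates, choice_labels = [], []
    for rates, ch in zip(rates_by_heading, choices_by_heading):
        rates = np.asarray(rates, dtype=float)
        sd = rates.std() or 1.0                      # guard against zero variance within a heading
        z_rates.append((rates - rates.mean()) / sd)
        choice_labels.append(np.asarray(ch))
    z = np.concatenate(z_rates)
    c = np.concatenate(choice_labels)
    pref, other = z[c == 1][:, None], z[c == 0][None, :]
    return float(np.mean(pref > other) + 0.5 * np.mean(pref == other))
```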
31.6 CONCLUSION

These studies indicate that area MSTd is one important brain area where visual and vestibular signals might be integrated to achieve robust perception of self-motion. It is likely that other areas also integrate visual and vestibular signals in meaningful ways, and a substantial challenge for the future will be to understand the specific roles that various brain regions play in multisensory perception of self-motion and object motion.

In addition, these studies raise a number of important general questions that may guide future studies on multisensory integration in multiple systems and species. What are the respective functional roles of neurons that have congruent or incongruent tuning for two sensory inputs? Do the spatial reference frames in which multiple sensory signals are expressed constrain the contribution of multisensory neurons to perception? Do multisensory neurons generally perform weighted linear summation of their unimodal inputs, or do the mathematical combination rules used by neurons vary across brain regions and across stimuli/tasks within a brain region? How can we account for the change in the weights that neurons apply to their unimodal inputs as the strength of the sensory inputs varies? Does this require dynamic changes in synaptic weights or can this phenomenology be explained in terms of nonlinearities (such as divisive normalization) that operate at the level of the network? During behavioral discrimination tasks involving cue conflict, do single neurons show correlates of the dynamic cue reweighting effects that have been seen consistently in human perceptual studies of cue integration? How do populations of multimodal sensory neurons represent the reliabilities (i.e., variance) of the sensory cues as they change dynamically in the environment?

Most of these questions should be amenable to study within the experimental paradigm of visual–vestibular integration that we have presented thus far. Thus, we expect that this will serve as an important platform for tackling critical questions regarding multisensory integration in the future.
ACKNOWLEDGMENTS

We thank Amanda Turner and Erin White for excellent monkey care and training. This work was supported by NIH EY017866 and EY019087 (to DEA) and by NIH EY016178 and an EJLB Foundation grant (to GCD).
REFERENCES

Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Curr Biol 14: 257–262. Anderson, K. C., and R. M. Siegel. 1999. Optic flow selectivity in the anterior superior temporal polysensory area, STPa, of the behaving monkey. J Neurosci 19: 2681–2692. Angelaki, D. E. 2004. Eyes on target: What neurons must do for the vestibuloocular reflex during linear motion. J Neurophysiol 92: 20–35. Angelaki, D. E., and K. E. Cullen. 2008. Vestibular system: The many facets of a multimodal sense. Annu Rev Neurosci 31: 125–150. Angelaki, D. E., M. Q. Mchenry, J. D. Dickman, S. D. Newlands, and B. J. Hess. 1999. Computation of inertial motion: Neural strategies to resolve ambiguous otolith information. J Neurosci 19: 316–327. Angelaki, D. E., A. G. Shaikh, A. M. Green, and J. D. Dickman. 2004. Neurons compute internal models of the physical laws of motion. Nature 430: 560–564. Avillac, M., S. Ben Hamed, and J. R. Duhamel. 2007. Multisensory integration in the ventral intraparietal area of the macaque monkey. J Neurosci 27: 1922–1932. Avillac, M., S. Deneve, E. Olivier, A. Pouget, and J. R. Duhamel. 2005. Reference frames for representing visual and tactile locations in parietal cortex. Nat Neurosci 8: 941–949. Banks, M. S., S. M. Ehrlich, B. T. Backus, and J. A. Crowell. 1996. Estimating heading during real and simulated eye movements. Vision Res 36: 431–443. Benson, A. J., M. B. Spencer, and J. R. Stott. 1986. Thresholds for the detection of the direction of whole-body, linear movement in the horizontal plane. Aviat Space Environ Med 57: 1088–1096. Berthoz, A., B. Pavard, and L. R. Young. 1975. Perception of linear horizontal self-motion induced by peripheral vision (linearvection) basic characteristics and visual–vestibular interactions. Exp Brain Res 23: 471–489. Bradley, A., B. C. Skottun, I. Ohzawa, G. Sclar, and R. D. Freeman. 1987. Visual orientation and spatial frequency discrimination: A comparison of single neurons and behavior. J Neurophysiol 57: 755–772. Bradley, D. C., M. Maxwell, R. A. Andersen, M. S. Banks, and K. V. Shenoy. 1996. Mechanisms of heading perception in primate visual cortex. Science 273: 1544–1547. Brandt, T., J. Dichgans, and E. Koenig. 1973. Differential effects of central versus peripheral vision on egocentric and exocentric motion perception. Exp Brain Res 16: 476–491. Bremmer, F., J. R. Duhamel, S. Ben Hamed, and W. Graf. 2002a. Heading encoding in the macaque ventral intraparietal area (VIP). Eur J Neurosci 16: 1554–1568. Bremmer, F., F. Klam, J. R. Duhamel, S. Ben Hamed, and W. Graf. 2002b. Visual–vestibular interactive responses in the macaque ventral intraparietal area (VIP). Eur J Neurosci 16: 1569–1586. Bremmer, F., M. Kubischik, M. Pekel, M. Lappe, and K. P. Hoffmann. 1999. Linear vestibular self-motion signals in monkey medial superior temporal area. Ann N Y Acad Sci 871: 272–281. Britten, K. H., W. T. Newsome, M. N. Shadlen, S. Celebrini, and J. A. Movshon. 1996. A relationship between behavioral choice and the visual responses of neurons in macaque MT. Vis Neurosci 13: 87–100. Britten, K. H., M. N. Shadlen, W. T. Newsome, and J. A. Movshon. 1992. The analysis of visual motion: A comparison of neuronal and psychophysical performance. J Neurosci 12: 4745–4765. Britten, K. H., and R. J. Van Wezel. 1998. Electrical microstimulation of cortical area MST biases heading perception in monkeys. Nat Neurosci 1: 59–63. Britten, K. H., and R. J. Van Wezel. 2002. Area MST and heading perception in macaque monkeys.
Cereb Cortex 12: 692–701. Bryan, A. S., and D. E. Angelaki. 2008. Optokinetic and vestibular responsiveness in the macaque rostral vestibular and fastigial nuclei. J Neurophysiol 101: 714–720. Buttner, U., and U. W. Buettner. 1978. Parietal cortex (2v) neuronal activity in the alert monkey during natural vestibular and optokinetic stimulation. Brain Res 153: 392–397. Carandini, M., D. J. Heeger, and J. A. Movshon. 1997. Linearity and normalization in simple cells of the macaque primary visual cortex. J Neurosci 17: 8621–8644.
Chen, A., G. C. Deangelis, and D. E. Angelaki. 2010. Macaque parieto-insular vestibular cortex: Responses to self-motion and optic flow. J Neurosci 30: 3022–3042. Chen, A., E. Henry, G. C. Deangelis, and D. E. Angelaki. 2007. Comparison of responses to three-dimensional rotation and translation in the ventral intraparietal (VIP) and medial superior temporal (MST) areas of rhesus monkey. Program No. 715.19. 2007 Neuroscience Meeting Planner. San Diego, CA: Society for Neuroscience, 2007. Online Society for Neuroscience. Chowdhury, S. A., K. Takahashi, G. C. Deangelis, and D. E. Angelaki. 2009. Does the middle temporal area carry vestibular signals related to self-motion? J Neurosci 29: 12020–12030. Crowell, J. A., M. S. Banks, K. V. Shenoy, and R. A. Andersen. 1998. Visual self-motion perception during head turns. Nat Neurosci 1: 732–737. Daunton, N., and D. Thomsen. 1979. Visual modulation of otolith-dependent units in cat vestibular nuclei. Exp Brain Res 37: 173–176. Deneve, S., P. E. Latham, and A. Pouget. 2001. Efficient computation and cue integration with noisy population codes. Nat Neurosci 4: 826–831. Dichgans, J., and T. Brandt. 1974. The psychophysics of visually-induced perception of self motion and tilt. In The Neurosciences, 123–129. Cambridge, MA: MIT Press. Dichgans, J., and T. Brandt. 1978. Visual–vestibular interaction: Effects on self-motion perception and postural control. In Handbook of sensory physiology, ed. R. Held, H. W. Leibowitz, and H. L. Teuber. Berlin: Springer-Verlag. Duffy, C. J. 1998. MST neurons respond to optic flow and translational movement. J Neurophysiol 80: 1816–1827. Duffy, C. J., and R. H. Wurtz. 1991. Sensitivity of MST neurons to optic flow stimuli: I. A continuum of response selectivity to large-field stimuli. J Neurophysiol 65: 1329–1345. Duffy, C. J., and R. H. Wurtz. 1995. Response of monkey MST neurons to optic flow stimuli with shifted centers of motion. J Neurosci 15: 5192–5208. Ernst, M. O., and M. S. Banks. 2002. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415: 429–433. Fernandez, C., and J. M. Goldberg. 1976a. Physiology of peripheral neurons innervating otolith organs of the squirrel monkey: I. Response to static tilts and to long-duration centrifugal force. J Neurophysiol 39: 970–984. Fernandez, C., and J. M. Goldberg. 1976b. Physiology of peripheral neurons innervating otolith organs of the squirrel monkey: II. Directional selectivity and force–response relations. J Neurophysiol 39: 985–995. Fetsch, C. R., A. H. Turner, G. C. Deangelis, and D. E. Angelaki. 2009. Dynamic reweighting of visual and vestibular cues during self-motion perception. J Neurosci 29: 15601–15612. Fetsch, C. R., S. Wang, Y. Gu, G. C. Deangelis, and D. E. Angelaki. 2007. Spatial reference frames of visual, vestibular, and multimodal heading signals in the dorsal subdivision of the medial superior temporal area. J Neurosci 27: 700–712. Fredrickson, J. M., P. Scheid, U. Figge, and H. H. Kornhuber. 1966. Vestibular nerve projection to the cerebral cortex of the rhesus monkey. Exp Brain Res 2: 318–327. Fukushima, K. 1997. Corticovestibular interactions: Anatomy, electrophysiology, and functional considerations. Exp Brain Res 117: 1–16. Gibson, J. J. 1950. The perception of the visual world. Boston: Houghton-Mifflin. Gibson, J. J. 1954. The visual perception of objective motion and subjective movement. Psychol Rev 61: 304–314. Green, D. M., and J. A. Swets. 1966. Signal detection theory and psychophysics. New York: Wiley. 
Groh, J. M. 2001. Converting neural signals from place codes to rate codes. Biol Cybern 85: 159–165. Grusser, O. J., M. Pause, and U. Schreiter. 1990a. Localization and responses of neurones in the parieto-insular vestibular cortex of awake monkeys (Macaca fascicularis). J Physiol 430: 537–557. Grusser, O. J., M. Pause, and U. Schreiter. 1990b. Vestibular neurones in the parieto-insular cortex of monkeys (Macaca fascicularis): Visual and neck receptor responses. J Physiol 430: 559–583. Gu, Y., D. E. Angelaki, and G. C. Deangelis. 2008. Neural correlates of multisensory cue integration in macaque MSTd. Nat Neurosci 11: 1201–1210. Gu, Y., G. C. Deangelis, and D. E. Angelaki. 2007. A functional link between area MSTd and heading perception based on vestibular signals. Nat Neurosci 10: 1038–1047. Gu, Y., C. R. Fetsch, B. Adeyemo, G. C. Deangelis, and D. E. Angelaki. 2010. Decoding of MSTd population activity accounts for variations in the precision of heading perception. Neuron 66: 596–609. Gu, Y., P. V. Watkins, D. E. Angelaki, and G. C. Deangelis. 2006. Visual and nonvisual contributions to threedimensional heading selectivity in the medial superior temporal area. J Neurosci 26: 73–85.
Guedry, F. E. 1974. Psychophysics of vestibular sensation. In Handbook of sensory physiology. The vestibular system, ed. H. H. Kornhuber. New York: Springer-Verlag. Guedry Jr., F. E. 1978. Visual counteraction on nauseogenic and disorienting effects of some whole-body motions: A proposed mechanism. Aviat Space Environ Med 49: 36–41. Guldin, W. O., S. Akbarian, and O. J. Grusser. 1992. Cortico-cortical connections and cytoarchitectonics of the primate vestibular cortex: A study in squirrel monkeys (Saimiri sciureus). J Comp Neurol 326: 375–401. Guldin, W. O., and O. J. Grusser. 1998. Is there a vestibular cortex? Trends Neurosci 21: 254–259. Heeger, D. J. 1992. Normalization of cell responses in cat striate cortex. Vis Neurosci 9: 181–197. Henn, V., L. R. Young, and C. Finley. 1974. Vestibular nucleus units in alert monkeys are also influenced by moving visual fields. Brain Res 71: 144–149. Hlavacka, F., T. Mergner, and B. Bolha. 1996. Human self-motion perception during translatory vestibular and proprioceptive stimulation. Neurosci Lett 210: 83–86. Hlavacka, F., T. Mergner, and G. Schweigart. 1992. Interaction of vestibular and proprioceptive inputs for human self-motion perception. Neurosci Lett 138: 161–164. Knill, D. C., and J. A. Saunders. 2003. Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Res 43: 2539–2558. Krug, K. 2004. A common neuronal code for perceptual processes in visual cortex? Comparing choice and attentional correlates in V5/MT. Philos Trans R Soc Lond B Biol Sci 359: 929–941. Lee, J., and J. H. Maunsell. 2009. A normalization model of attentional modulation of single unit responses. PLoS ONE 4: e4651. Logan, D. J., and C. J. Duffy. 2006. Cortical area MSTd combines visual cues to represent 3-D self-movement. Cereb Cortex 16: 1494–1507. Ma, W. J., J. M. Beck, P. E. Latham, and A. Pouget. 2006. Bayesian inference with probabilistic population codes. Nat Neurosci 9: 1432–1438. Markert, G., U. Buttner, A. Straube, and R. Boyle. 1988. Neuronal activity in the flocculus of the alert monkey during sinusoidal optokinetic stimulation. Exp Brain Res 70: 134–144. Matsumiya, K., and H. Ando. 2009. World-centered perception of 3D object motion during visually guided self-motion. J Vis 9: 151–153. Meredith, M. A., and B. E. Stein. 1983. Interactions among converging sensory inputs in the superior colliculus. Science 221: 389–391. Meredith, M. A., and B. E. Stein. 1986. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. J Neurophysiol 56: 640–662. Merfeld, D. M., L. Zupan, and R. J. Peterka. 1999. Humans use internal models to estimate gravity and linear acceleration. Nature 398: 615–618. Morgan, M. L., G. C. Deangelis, and D. E. Angelaki. 2008. Multisensory integration in macaque visual cortex depends on cue reliability. Neuron 59: 662–673. Odkvist, L. M., D. W. Schwarz, J. M. Fredrickson, and R. Hassler. 1974. Projection of the vestibular nerve to the area 3a arm field in the squirrel monkey (Saimiri sciureus). Exp Brain Res 21: 97–105. Ohshiro, T., D. E. Angelaki, and G. C. DeAngelis. 2011. A normalization model of multisensory integration. Nat Neurosci In press. Page, W. K., and C. J. Duffy. 1999. MST neuronal responses to heading direction during pursuit eye movements. J Neurophysiol 81: 596–610. Parker, A. J., and W. T. Newsome. 1998. Sense and the single neuron: Probing the physiology of perception. Annu Rev Neurosci 21: 227–277. Perrault Jr., T. J., J. W. 
Vaughan, B. E. Stein, and M. T. Wallace. 2003. Neuron-specific response characteristics predict the magnitude of multisensory integration. J Neurophysiol 90: 4022–406. Perrault Jr., T. J., J. W. Vaughan, B. E. Stein, and M. T. Wallace. 2005. Superior colliculus neurons use distinct operational modes in the integration of multisensory stimuli. J Neurophysiol 93: 2575–2586. Previc, F. H., D. C. Varner, and K. K. Gillingham. 1992. Visual scene effects on the somatogravic illusion. Aviat Space Environ Med 63: 1060–1064. Reynolds, J. H., and D. J. Heeger. 2009. The normalization model of attention. Neuron 61: 168–185. Robinson, D. A. 1977. Linear addition of optokinetic and vestibular signals in the vestibular nucleus. Exp Brain Res 30: 447–450. Royden, C. S., M. S. Banks, and J. A. Crowell. 1992. The perception of heading during eye movements. Nature 360: 583–585. Royden, C. S., J. A. Crowell, and M. S. Banks. 1994. Estimating heading during eye movements. Vis Res 34: 3197–3214.
Royden, C. S., and E. C. Hildreth. 1996. Human heading judgments in the presence of moving objects. Percept Psychophys 58: 836–856. Rushton, S. K., and P. A. Warren. 2005. Moving observers, relative retinal motion and the detection of object movement. Curr Biol 15: R542–R543. Rust, N. C., V. Mante, E. P. Simoncelli, and J. A. Movshon. 2006. How MT cells analyze the motion of visual patterns. Nat Neurosci 9: 1421–1431. Schaafsma, S. J., and J. Duysens. 1996. Neurons in the ventral intraparietal area of awake macaque monkey closely resemble neurons in the dorsal part of the medial superior temporal area in their responses to optic flow patterns. J Neurophysiol 76: 4056–4068. Schlack, A., K. P. Hoffmann, and F. Bremmer. 2002. Interaction of linear vestibular and visual stimulation in the macaque ventral intraparietal area (VIP). Eur J Neurosci 16: 1877–1886. Schwarz, D. W., and J. M. Fredrickson. 1971a. Rhesus monkey vestibular cortex: A bimodal primary projection field. Science 172: 280–281. Schwarz, D. W., and J. M. Fredrickson. 1971b. Tactile direction sensitivity of area 2 oral neurons in the rhesus monkey cortex. Brain Res 27: 397–401. Shenoy, K. V., D. C. Bradley, and R. A. Andersen. 1999. Influence of gaze rotation on the visual response of primate MSTd neurons. J Neurophysiol 81: 2764–2786. Siegel, R. M., and H. L. Read. 1997. Analysis of optic flow in the monkey parietal area 7a. Cereb Cortex 7: 327–346. Stanford, T. R., S. Quessy, and B. E. Stein. 2005. Evaluating the operations underlying multisensory integration in the cat superior colliculus. J Neurosci 25: 6499–6508. Stein, B. E., and M. A. Meredith. 1993. The merging of the senses. Cambridge, MA: MIT Press. Stein, B. E., and T. R. Stanford. 2008. Multisensory integration: Current issues from the perspective of the single neuron. Nat Rev Neurosci 9: 255–266. Sugihara, T., M. D. Diltz, B. B. Averbeck, and L. M. Romanski. 2006. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. J Neurosci 26: 11138–11147. Takahashi, K., Y. Gu, P. J. May, S. D. Newlands, G. C. Deangelis, and D. E. Angelaki. 2007. Multimodal coding of three-dimensional rotation and translation in area MSTd: Comparison of visual and vestibular selectivity. J Neurosci 27: 9742–9756. Tanaka, K., K. Hikosaka, H. Saito, M. Yukie, Y. Fukada, and E. Iwai. 1986. Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey. J Neurosci 6: 134–144. Tanaka, K., and H. Saito. 1989. Analysis of motion of the visual field by direction, expansion/contraction, and rotation cells clustered in the dorsal part of the medial superior temporal area of the macaque monkey. J Neurophysiol 62: 626–641. Telford, L., I. P. Howard, and M. Ohmi. 1995. Heading judgments during active and passive self-motion. Exp Brain Res 104: 502–510. Waespe, W., U. Buttner, and V. Henn. 1981. Visual–vestibular interaction in the flocculus of the alert monkey: I. Input activity. Exp Brain Res 43: 337–348. Waespe, W., and V. Henn. 1977. Neuronal activity in the vestibular nuclei of the alert monkey during vestibular and optokinetic stimulation. Exp Brain Res 27: 523–538. Waespe, W., and V. Henn. 1981. Visual–vestibular interaction in the flocculus of the alert monkey: II. Purkinje cell activity. Exp Brain Res 43: 349–360. Warren, P. A., and S. K. Rushton. 2007. Perception of object trajectory: Parsing retinal motion into self and object movement components. J Vis 7: 21–11. Warren, P. A., and S. K. Rushton. 2008. 
Evidence for flow-parsing in radial flow displays. Vis Res 48: 655–663. Warren, W. H. 2003 Optic flow. In The visual neurosciences, ed. L. M. Chalupa and J. S. Werner. Cambridge, MA: MIT Press. Warren, W. H., and J. A. Saunders. 1995. Perceiving heading in the presence of moving objects. Perception 24: 315–331. Wexler, M. 2003. Voluntary head movement and allocentric perception of space. Psychol Sci 14: 340–346. Wexler, M., F. Panerai, I. Lamouret, and J. Droulez. 2001. Self-motion and the perception of stationary objects. Nature 409: 85–88. Wexler, M., and J. J. Van Boxtel. 2005. Depth perception by the active observer. Trends Cogn Sci 9: 431–438. Wichmann, F. A., and N. J. Hill. 2001. The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys 63: 1293–1313. Wolfe, J. W., and R. L. Cramer. 1970. Illusions of pitch induced by centripetal acceleration. Aerosp Med 41: 1136–1139.
Zhang, T., and K. H. Britten. 2003. Microstimulation of area VIP biases heading perception in monkeys. Program No. 339.9. 2003 Neuroscience Abstract Viewer/Itinerary Planner. New Orleans, LA: Society for Neuroscience. Zhang, T., H. W. Heuer, and K. H. Britten. 2004. Parietal area VIP neuronal responses to heading stimuli are encoded in head-centered coordinates. Neuron 42: 993–1001.
Section VIII Naturalistic Multisensory Processes: Communication Signals
32 Unity of the Senses for Primate Vocal Communication
Asif A. Ghazanfar
CONTENTS
32.1 Introduction .......... 653
32.2 Multisensory Communication Is the Default Mode of Communication .......... 654
32.3 Monkeys Link Facial Expressions to Vocal Expressions .......... 654
32.4 Dynamic Faces Modulate Voice Processing in Auditory Cortex .......... 655
32.5 Auditory Cortical Interactions with Superior Temporal Sulcus Mediates Face/Voice Integration .......... 656
32.6 Viewing Vocalizing Conspecifics .......... 658
32.7 Somatosensory Feedback during Vocal Communication .......... 659
32.8 Emergence of Multisensory Systems for Communication .......... 660
32.9 Conclusions .......... 661
Acknowledgments .......... 661
References .......... 662
32.1 INTRODUCTION

The basic tenet of neocortical organization is: different regions of the cortex have different functions. Some regions receive visual, auditory, tactile, olfactory, and gustatory sensations. Each of these sensory regions is thought to send projections that converge on an "association area," which then enables the association between the different senses and between the senses and movement. According to a highly influential two-part review by Norman Geschwind, entitled, "Disconnexion syndromes in animals and man" (Geschwind 1965a, 1965b), the connections between sensory association areas are not robust in nonhuman animals, limiting their ability to make cross-modal sensory associations. In contrast, humans can readily make such associations, for example, between the sight of a lion and the sounds of its roar. This picture of human versus nonhuman cross-modal abilities based on anatomy led to the idea that human speech and language evolved in parallel with robust cross-modal connections within the neocortex. Geschwind claimed that the "ability to acquire speech has as a prerequisite the ability to form cross-modal associations" (Geschwind 1965a, 1965b).

This view of cross-modal associations as a potentially uniquely human capacity remains present even in more current ideas about the evolution of language. For example, it has been suggested that human language depends on our unique ability to imitate in multiple modalities, which in turn relies on a "substantial change in neural organization, one that affects not only imitation but also communication" (Hauser et al. 2002, p. 1575).

The purpose of this review is twofold: (1) to refute the view that the cross-modal (multisensory, hereafter) associations are mediated solely through association areas and (2) to debunk the view that human communication is uniquely multisensory. To achieve these two goals, I will focus on the multisensory nature of nonhuman primate vocal communication and the many possible roles that one, nonassociation area plays: the auditory cortex.
32.2 MULTISENSORY COMMUNICATION IS THE DEFAULT MODE OF COMMUNICATION
It is widely accepted that human speech is fundamentally a multisensory behavior, with face-to-face communication perceived through both the visual and auditory channels. Such multisensory speech perception is evident even at the earliest stages of human cognitive development (Gogate et al. 2001; Patterson and Werker 2003); integration across the two modalities is ubiquitous and automatic (McGurk and MacDonald 1976); and, at the neural level, audiovisual speech integration occurs at the "earliest" stages of cortical processing (Ghazanfar and Schroeder 2006). Indeed, there are strong arguments suggesting that multisensory speech is the primary mode of speech perception and is not a capacity that is "piggybacked" onto auditory speech perception (Rosenblum 2005). This implies that the perceptual mechanisms, neurophysiology, and evolution of speech perception are based on primitives that are not tied to a single sensory modality (Romanski and Ghazanfar 2009). The essence of these ideas is shared by many investigators in the domain of perception (Liberman and Mattingly 1985; Meltzoff and Moore 1997; Fowler 2004).
32.3 MONKEYS LINK FACIAL EXPRESSIONS TO VOCAL EXPRESSIONS
What is true for human speech is also true for vocal communication in nonhuman primates: vision and audition are inextricably linked. Human and primate vocalizations are produced by coordinated movements of the lungs, larynx (vocal folds), and the supralaryngeal vocal tract (Fitch and Hauser 1995; Ghazanfar and Rendall 2008). The vocal tract consists of the column of air derived from the pharynx, mouth, and nasal cavity. In humans, speech-related vocal tract motion results in the predictable deformation of the face around the oral aperture and other parts of the face (Yehia et al. 1998, 2002; Jiang et al. 2002). For example, human adults automatically link high-pitched sounds to facial postures producing an /i/ sound and low-pitched sounds to faces producing an /a/ sound (Kuhl et al. 1991). In primate vocal production, there is a similar link between acoustic output and facial dynamics. Different macaque monkey vocalizations are produced with unique lip configurations and mandibular positions, and the motion of such articulators influences the acoustics of the signal (Hauser et al. 1993; Hauser and Ybarra 1994). Coo calls, like the /u/ in speech, are produced with the lips protruded, whereas screams, like the /i/ in speech, are produced with the lips retracted (Figure 32.1). Thus, it is likely that many of the facial motion cues that humans use for speechreading are present in other primates as well.
Given that both humans and other extant primates use both facial and vocal expressions as communication signals, it is perhaps not surprising that many primates other than humans recognize the correspondence between the visual and auditory components of vocal signals. Macaque monkeys (Macaca mulatta), capuchins (Cebus apella), and chimpanzees (Pan troglodytes) all recognize auditory–visual correspondences between their various vocalizations (Ghazanfar and Logothetis 2003; Izumi and Kojima 2004; Parr 2004; Evans et al. 2005). For example, rhesus monkeys readily match the facial expressions of "coo" and "threat" calls with their associated vocal components (Ghazanfar and Logothetis 2003). Perhaps more pertinently, rhesus monkeys can also segregate competing voices in a chorus of coos, much as humans might with speech in a cocktail party scenario, and match them to the correct number of individuals seen cooing on a video screen (Jordan et al. 2005). Finally, macaque monkeys use formants (i.e., vocal tract resonances) as acoustic cues to assess age-related body size differences among conspecifics (Ghazanfar et al. 2007). They do so by linking across modalities the body size information embedded in the formant spacing of vocalizations (Fitch 1997) with the visual size of animals that are likely to produce such vocalizations (Ghazanfar et al. 2007).
Taken together, these data suggest that humans are not at all unique in their ability to perceive communication signals across modalities. Indeed, as will be described below, vocal communication is a fully integrated multi-sensori-motor system with numerous similarities between humans
[Figure 32.1 panels: spectrograms with frequency axes spanning 0–16 kHz and time axes spanning 0–0.8 s and 0–0.26 s.]
FIGURE 32.1 Exemplars of facial expressions produced concomitantly with vocalizations. Rhesus monkey coo and scream calls taken at midpoint of expressions with their corresponding spectrograms.
and monkeys and in which the auditory cortex may serve as a key node in a larger neocortical network.
32.4 DYNAMIC FACES MODULATE VOICE PROCESSING IN AUDITORY CORTEX
Traditionally, the linking of vision with audition in the multisensory vocal perception described above would be attributed to the functions of association areas such as the superior temporal sulcus (STS) in the temporal lobe or the principal and intraparietal sulci located in the frontal and parietal lobes, respectively. Although these regions may play important roles (see below), they are certainly not necessary for all types of multisensory behaviors (Ettlinger and Wilson 1990), nor are they the sole regions for multisensory convergence (Ghazanfar and Schroeder 2006; Driver and Noesselt 2008). The auditory cortex, in particular, has many potential sources of visual inputs (Ghazanfar and Schroeder 2006), and this is borne out in the increasing number of studies demonstrating visual modulation of auditory cortical activity (Schroeder and Foxe 2002; Ghazanfar et al. 2005, 2008; Bizley et al. 2007; Kayser et al. 2007, 2008). Here we focus on those auditory cortical studies investigating face/voice integration specifically.
Recordings from both primary and lateral belt auditory cortex reveal that responses to the voice are influenced by the presence of a dynamic face (Ghazanfar et al. 2005, 2008). Monkey subjects viewing unimodal and bimodal versions of two different species-typical vocalizations (coos and grunts) show both enhanced and suppressed local field potential (LFP) responses in the bimodal condition relative to the unimodal auditory condition (Ghazanfar et al. 2005). Consistent with evoked potential studies in humans (Besle et al. 2004; van Wassenhove et al. 2005), the combination of faces and voices led to integrative responses (significantly different from unimodal responses) in the vast majority of auditory cortical sites—both in the primary auditory cortex and the lateral belt auditory cortex. These data demonstrated that LFP signals in the auditory cortex are capable of multisensory integration of facial and vocal signals in monkeys (Ghazanfar et al. 2005), a finding subsequently confirmed at the single-unit level in the lateral belt cortex (Ghazanfar et al. 2008; Figure 32.2a).
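Enhancement and suppression of this kind are typically quantified by comparing the bimodal response with the unimodal responses recorded at the same site. The sketch below is a minimal, generic illustration of such a comparison on per-trial responses; the index, the statistical test, and all variable names are assumptions chosen for illustration and are not the specific analysis pipeline used in the studies cited above.

```python
import numpy as np
from scipy import stats

def integration_index(face_voice, voice, face):
    """Percent change of the mean bimodal response relative to the
    strongest unimodal response; positive values indicate enhancement,
    negative values suppression. Inputs are 1-D arrays of per-trial
    responses (e.g., spike counts or LFP amplitudes)."""
    best_unimodal = max(voice.mean(), face.mean())
    return 100.0 * (face_voice.mean() - best_unimodal) / best_unimodal

def is_integrative(face_voice, voice, alpha=0.05):
    """Crude per-site test of whether the bimodal response differs from
    the unimodal auditory response (two-sample t-test across trials)."""
    _, p = stats.ttest_ind(face_voice, voice)
    return p < alpha

# Toy example of an "enhanced" site (synthetic spike counts).
rng = np.random.default_rng(0)
fv = rng.poisson(30, size=40)   # face + voice trials
v = rng.poisson(20, size=40)    # voice-alone trials
f = rng.poisson(8, size=40)     # face-alone trials
print(integration_index(fv, v, f), is_integrative(fv, v))
```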
[Figure 32.2 panels: "Pr Grunt" (left) and "Gr Grunt" (right); conditions Face + Voice, Voice, Face, and Disk + Voice; y-axes in spikes/s; x-axes spanning −400 to 1200 ms.]
FIGURE 32.2 (See color insert.) Single neuron examples of multisensory integration of Face + Voice stimuli compared with Disk + Voice stimuli in lateral belt area. Left: enhanced response when voices are coupled with faces, but no similar modulation when coupled with disks. Right: similar effects for a suppressed response. x-Axes show time aligned to onset of face (solid line). Dashed lines indicate onset and offset of voice signal. y-Axes depict firing rate of neuron in spikes per second. Shaded regions denote SEM.
The specificity of face/voice integrative responses was tested by replacing the dynamic faces with dynamic disks that mimicked the aperture and displacement of the mouth. In human psychophysical experiments, such artificial dynamic stimuli can still lead to enhanced speech detection, but not to the same degree as a real face (Bernstein et al. 2004; Schwartz et al. 2004). When cortical sites or single units were tested with dynamic disks, far less integration was seen than with real monkey faces (Ghazanfar et al. 2005, 2008; Figure 32.2). This was true primarily for the lateral belt auditory cortex (LFPs and single units) and was observed to a lesser extent in the primary auditory cortex (LFPs only). This suggests that there may be increasingly specific influences of "extra" sensory modalities as one moves away from the primary sensory regions.
Unexpectedly, grunt vocalizations were overrepresented relative to coos in terms of enhanced multisensory LFP responses (Ghazanfar et al. 2005). As coos and grunts are both produced frequently in a variety of affiliative contexts and are broadband spectrally, the differential representation cannot be attributed to experience, valence, or the frequency tuning of neurons. One remaining possibility is that this differential representation reflects a behaviorally relevant distinction, as coos and grunts differ in their direction of expression and range. Coos are generally contact calls rarely directed toward any particular individual. In contrast, grunts are often directed toward individuals in one-on-one situations, often during social approaches, as in baboons and vervet monkeys (Cheney and Seyfarth 1982; Palombit et al. 1999). Given their production at close range and in such contexts, grunts may produce a stronger face/voice association than coo calls. This distinction appeared to be reflected in the pattern of significant multisensory responses in the auditory cortex; that is, the multisensory bias toward grunt calls may be related to the fact that grunts (relative to coos) are often produced during intimate, one-on-one social interactions.
32.5 AUDITORY CORTICAL INTERACTIONS WITH SUPERIOR TEMPORAL SULCUS MEDIATE FACE/VOICE INTEGRATION
The face-specific visual influence on the lateral belt auditory cortex raises the question of its anatomical source. Although there are multiple possible sources of visual input to auditory cortex (Ghazanfar and Schroeder 2006), the STS is likely to be a prominent one, particularly for integrating faces and voices, for the following reasons. First, there are reciprocal connections between the STS and the lateral belt and other parts of the auditory cortex (Barnes and Pandya 1992; Seltzer and Pandya 1994). Second, neurons in the STS are sensitive to both faces and biological motion (Harries and Perrett 1991; Oram and Perrett 1994). Finally, the STS is known to be multisensory (Benevento
et al. 1977; Bruce et al. 1981; Schroeder and Foxe 2002; Barraclough et al. 2005; Chandrasekaran and Ghazanfar 2009).
One way to establish whether the auditory cortex and the STS interact at the functional level is to measure their temporal correlations as a function of stimulus condition. Concurrent recordings of LFPs and spiking activity in the lateral belt of the auditory cortex and the upper bank of the STS revealed that functional interactions between these two regions, in the form of gamma band correlations, increased in strength during presentations of faces and voices together relative to the unimodal conditions (Ghazanfar et al. 2008; Figure 32.3a). Furthermore, these interactions were not solely modulations of response strength, as phase relationships were significantly less variable (tighter) in the multisensory conditions (Figure 32.3b).
The influence of the STS on the auditory cortex was not merely on its gamma oscillations. Spiking activity seems to be modulated, but not "driven," by ongoing activity arising from the STS. Three lines of evidence suggest this scenario. First, visual influences on single neurons were most robust when in the form of dynamic faces and were only apparent when neurons had a significant response to a vocalization (i.e., there were no overt responses to faces alone). Second, these integrative responses were often "face-specific" and had a wide distribution of latencies, which suggested that the face signal was an ongoing signal that influenced auditory responses (Ghazanfar et al. 2008). Finally, this hypothesis of an ongoing signal is supported by the sustained gamma band activity between the auditory cortex and the STS and by a spike-field coherence analysis. This analysis reveals that just before spiking activity in the auditory cortex, there is an increase in gamma band power in the STS (Ghazanfar et al. 2008; Figure 32.3c).
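As a generic illustration of the kind of measurement involved, the sketch below computes gamma-band coherence between two simultaneously recorded LFPs and a simple spike-triggered average of the STS field potential. It is a SciPy-based sketch under assumed data formats (trials × samples arrays, spike times in seconds) and is not the cross-spectrogram or phase-concentration analysis of the studies cited above; all variable names are hypothetical.

```python
import numpy as np
from scipy.signal import coherence

FS = 1000.0  # assumed sampling rate in Hz

def gamma_coherence(lfp_auditory, lfp_sts, fs=FS, band=(40.0, 80.0)):
    """Magnitude-squared coherence between two LFPs, averaged over a
    gamma band. Inputs are trials x samples arrays from the same trials;
    trials are simply concatenated here for brevity."""
    f, cxy = coherence(lfp_auditory.ravel(), lfp_sts.ravel(), fs=fs, nperseg=256)
    in_band = (f >= band[0]) & (f <= band[1])
    return cxy[in_band].mean()

def spike_triggered_lfp(spike_times, lfp, fs=FS, window=0.2):
    """Average STS LFP segment around each auditory cortical spike
    (spike_times in seconds, lfp a 1-D array from the same recording)."""
    half = int(window * fs)
    segments = [lfp[int(t * fs) - half:int(t * fs) + half]
                for t in spike_times
                if half <= int(t * fs) < len(lfp) - half]
    return np.mean(segments, axis=0) if segments else None

# Usage sketch (hypothetical arrays): compare conditions site by site, e.g.,
# gamma_coherence(lfp_aud_face_voice, lfp_sts_face_voice) versus
# gamma_coherence(lfp_aud_voice, lfp_sts_voice).
```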
[Figure 32.3 panels: (a) auditory cortex–STS cross-spectrograms for the Face + Voice, Voice, and Face conditions (roughly 37–198 Hz, −400 to 600 ms, normalized amplitude); (b) normalized phase concentration across frequencies for the Face + Voice, Voice, Face, and Disk + Voice conditions; (c) spike-field cross-spectrograms for Face + Voice, Voice, and Face (roughly 48–198 Hz, −200 to 100 ms, normalized power).]
FIGURE 32.3 (See color insert.) (a) Time–frequency plots (cross-spectrograms) illustrate modulation of functional interactions (as a function of stimulus condition) between lateral belt auditory cortex and STS for a population of cortical sites. x-Axes depict time in milliseconds as a function of onset of auditory signal (solid black line). y-Axes depict frequency of oscillations in Hz. Color bar indicates amplitude of these signals normalized by baseline mean. (b) Population phase concentration from 0 to 300 ms after voice onset. x-Axes depict frequency in Hz. y-Axes depict average normalized phase concentration. Shaded regions denote SEM across all electrode pairs and calls. All values are normalized by baseline mean for different frequency bands. Right panel shows phase concentration across all calls and electrode pairs in gamma band for four conditions. (c) Spike-field cross-spectrogram illustrates relationship between spiking activity of auditory cortical neurons and STS local field potential across population of cortical sites. x-Axes depict time in milliseconds as a function of onset of multisensory response in auditory neuron (solid black line). y-Axes depict frequency in Hz. Color bar denotes cross-spectral power normalized by baseline mean for different frequencies.
Both the auditory cortex and the STS have multiple bands of oscillatory activity generated in response to stimuli, and these bands may mediate different functions (Lakatos et al. 2005; Chandrasekaran and Ghazanfar 2009). Thus, interactions between the auditory cortex and the STS are not limited to spiking activity and high-frequency gamma oscillations. Below 20 Hz, and in response to naturalistic audiovisual stimuli, there are directed interactions from the auditory cortex to the STS, whereas above 20 Hz (but below the gamma range), there are directed interactions from the STS to the auditory cortex (Kayser and Logothetis 2009). Given that different frequency bands in the STS integrate faces and voices in distinct ways (Chandrasekaran and Ghazanfar 2009), it is possible that these lower frequency interactions between the STS and the auditory cortex also represent distinct multisensory processing channels.
Two things should be noted here. The first is that functional interactions between the STS and the auditory cortex are not likely to occur solely during the presentation of faces with voices. Other congruent, behaviorally salient audiovisual events such as looming signals (Maier et al. 2004; Gordon and Rosenblum 2005; Cappe et al. 2009) or other temporally coincident signals may elicit similar functional interactions (Noesselt et al. 2007; Maier et al. 2008). The second is that there are other areas that, consistent with their connectivity and response properties (e.g., sensitivity to faces and voices), could also (and very likely do) have a visual influence on the auditory cortex. These include the ventrolateral prefrontal cortex (Romanski et al. 2005; Sugihara et al. 2006) and the amygdala (Gothard et al. 2007; Kuraoka and Nakamura 2007).
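One generic way to probe the directed, band-limited auditory cortex–STS interactions described above is to band-pass filter the two field potentials and ask whether the recent past of one signal improves prediction of the other (Granger causality). The sketch below uses standard SciPy and statsmodels tools; it is only a schematic of this class of analysis, not the method of Kayser and Logothetis (2009), and the band edges, model order, and variable names are placeholders.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from statsmodels.tsa.stattools import grangercausalitytests

FS = 1000.0  # assumed sampling rate in Hz

def bandpass(x, low, high, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def directed_influence(src, dst, low, high, maxlag=20):
    """p-value of the F-test that `src` Granger-causes `dst` within a band.
    Column order matters: first column is predicted, second is candidate cause."""
    data = np.column_stack([bandpass(dst, low, high), bandpass(src, low, high)])
    result = grangercausalitytests(data, maxlag=maxlag)
    return result[maxlag][0]["ssr_ftest"][1]

# Hypothetical usage, mirroring the asymmetry described in the text:
# p_aud_to_sts = directed_influence(lfp_aud, lfp_sts, 1, 20)    # below 20 Hz
# p_sts_to_aud = directed_influence(lfp_sts, lfp_aud, 20, 40)   # 20-40 Hz
```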
32.6 VIEWING VOCALIZING CONSPECIFICS
Humans and other primates readily link facial expressions with appropriate, congruent vocal expressions. The cues they use to make such matches are not known. One method for investigating such behavioral strategies is the measurement of eye movement patterns. When human subjects are given no task or instruction regarding which acoustic cues to attend to, they consistently look at the eye region more than the mouth when viewing videos of human speakers (Klin et al. 2002). Macaque monkeys exhibit exactly the same strategy. The eye movement patterns of monkeys viewing conspecifics producing vocalizations reveal that monkeys spend most of their time inspecting the eye region relative to the mouth (Ghazanfar et al. 2006; Figure 32.4a). When they do fixate on the mouth, these fixations are tightly correlated with the onset of mouth movements (Figure 32.4b). This, too, is highly reminiscent of human strategies: subjects asked to identify words increased their fixations on the mouth region with the onset of facial motion (Lansing and McConkie 2003).
Somewhat surprisingly, activity in both primary auditory cortex and belt areas is influenced by eye position. When the spatial tuning of primary auditory cortical neurons is measured with the eyes gazing in different directions, ~30% of the neurons are affected by the position of the eyes (Werner-Reiss et al. 2003). Similarly, when LFP-derived current-source density activity was measured from the auditory cortex (both primary auditory cortex and caudal belt regions), eye position significantly modulated auditory-evoked amplitude in about 80% of sites (Fu et al. 2004). These eye-position effects occurred mainly in the upper cortical layers, suggesting that the signal is fed back from another cortical area. A possible source is the frontal eye field in the frontal lobe, the medial portion of which generates relatively long saccades (Robinson and Fuchs 1969) and is interconnected with both the STS (Seltzer and Pandya 1989; Schall et al. 1995) and multiple regions of the auditory cortex (Schall et al. 1995; Hackett et al. 1999; Romanski et al. 1999).
It does not take a huge stretch of the imagination to link these auditory cortical processes to the oculomotor strategy for looking at vocalizing faces. A dynamic, vocalizing face is a complex sequence of sensory events, but one that elicits fairly stereotypical eye movements: we and other primates fixate on the eyes but then saccade to the mouth when it moves before saccading back to the eyes. Is there a simple scenario that could link the proprioceptive eye position effects in the auditory cortex with its face/voice integrative properties (Ghazanfar and Chandrasekaran 2007)? Reframing (ever so slightly) the hypothesis of Schroeder and colleagues (Lakatos et al. 2007; Schroeder et al.
[Figure 32.4 panels: (a) percentage of fixations on the eye versus mouth region under Normal, Mismatch, and Silent audio conditions; (b) fixation onset plotted against mouth movement onset (both relative to video start, in seconds), with and without sound.]
FIGURE 32.4 (a) Average fixation on eye region versus mouth region across three subjects while viewing a 30-s video of vocalizing conspecific. Audio track had no influence on proportion of fixations falling onto mouth or eye region. Error bars represent SEM. (b) We also find that when monkeys do saccade to mouth region, it is tightly correlated with onset of mouth movements (r = .997, p < .00001).
2008), one possibility is that the fixations at the onset of mouth movements send a signal to the auditory cortex, which resets the phase of an ongoing oscillation. This proprioceptive signal thus primes the auditory cortex to amplify or suppress (depending on its timing) a subsequent auditory signal originating from the mouth. Given that mouth movements precede the voiced components of both human (Abry et al. 1996) and monkey vocalizations (Ghazanfar et al. 2005; Chandrasekaran and Ghazanfar 2009), the temporal order of visual to proprioceptive to auditory signals is consistent with this idea. This hypothesis is also supported (although indirectly) by the finding that the sign of face/voice integration in the auditory cortex and the STS is influenced by the timing of mouth movements relative to the onset of the voice (Ghazanfar et al. 2005; Chandrasekaran and Ghazanfar 2009).
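A toy simulation can make the timing logic of this hypothesis concrete: if a fixation-related signal resets an ongoing low-frequency oscillation to its peak, an auditory input arriving near the depolarizing phase is amplified, whereas one arriving near the opposing phase is suppressed. The oscillation frequency, delays, and gain rule below are purely illustrative assumptions, not values taken from the studies discussed.

```python
import numpy as np

OSC_FREQ = 7.0  # Hz; an assumed theta-range ongoing oscillation

def response_gain(reset_time, sound_onset, freq=OSC_FREQ):
    """Gain applied to an auditory input when an ongoing oscillation is
    phase-reset (to its peak) at `reset_time` (s). The input arriving at
    `sound_onset` (s) is scaled by the oscillation's value at that lag:
    positive values amplify, negative values suppress."""
    lag = sound_onset - reset_time
    return np.cos(2.0 * np.pi * freq * lag)

# Mouth movement (and the fixation it attracts) precedes the voice.
reset = 0.0  # fixation-triggered phase reset at t = 0
for delay in (0.02, 0.07, 0.14):  # assumed visual-to-voice delays (s)
    gain = response_gain(reset, reset + delay)
    print(f"voice at +{delay * 1000:.0f} ms -> gain {gain:+.2f}")
```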
32.7 SOMATOSENSORY FEEDBACK DURING VOCAL COMMUNICATION
Numerous lines of both physiological and anatomical evidence demonstrate that at least some regions of the auditory cortex respond to touch as well as sound (Schroeder and Foxe 2002; Fu et al. 2003; Kayser et al. 2005; Hackett et al. 2007a, 2007b; Lakatos et al. 2007; Smiley et al. 2007). Yet the sense of touch is not something we normally associate with vocal communication. It can, however, influence what we hear under certain circumstances. For example, kinesthetic feedback from one's own speech movements also integrates with heard speech (Sams et al. 2005). More directly, if a robotic device is used to artificially deform the facial skin of subjects in a way that mimics the deformation seen during speech production, then subjects actually hear speech differently (Ito et al. 2009). Surprisingly, perception varies systematically with speechlike patterns of skin deformation, implicating a robust somatosensory influence on auditory processes under normal conditions (Ito et al. 2009).
The somatosensory system's influence on the auditory system may also occur during vocal learning. When a mechanical load is applied to the jaw, causing a slight protrusion, as subjects repeat words ("saw," "say," "sass," and "sane"), it can alter somatosensory feedback without changing the acoustics of the words (Tremblay et al. 2003). Measuring adaptation in the jaw trajectory after many trials revealed that subjects learn to change their jaw trajectories so that they are similar to the preload trajectory—despite not hearing anything different. This strongly implicates a role for somatosensory feedback that parallels the role for auditory feedback in guiding vocal production (Jones and Munhall 2003, 2005). Indeed, the very same learning effects are observed in deaf subjects when they turn their hearing aids off (Nasir and Ostry 2008).
Although the substrates for these somatosensory–auditory effects have not been explored, interactions between the somatosensory system and the auditory cortex seem a likely source of the phenomena described above, for the following reasons. First, many auditory cortical fields respond to, or are modulated by, tactile inputs (Schroeder et al. 2001; Fu et al. 2003; Kayser et al. 2005). Second, there are intercortical connections between somatosensory areas and the auditory cortex (Cappe and Barone 2005; de la Mothe et al. 2006; Smiley et al. 2007). Third, the caudomedial auditory area (CM), where many auditory–tactile responses seem to converge, is directly connected to somatosensory areas in the retroinsular cortex and the granular insula (de la Mothe et al. 2006; Smiley et al. 2007). Oddly enough, a parallel influence of audition on somatosensory areas has also been reported: neurons in the "somatosensory" insula readily and selectively respond to vocalizations (Beiser 1998; Remedios et al. 2009). Finally, the tactile receptive fields of neurons in auditory cortical area CM are confined to the upper body, primarily the face and neck regions (areas consisting of, or covering, the vocal tract) (Fu et al. 2003), and the primary somatosensory cortical (area 3b) representation of the tongue (a vocal tract articulator) projects to auditory areas in the lower bank of the lateral sulcus (Iyengar et al. 2007). All of these facts lend further credibility to the putative role of somatosensory–auditory interactions during vocal production and perception.
Like humans, other primates also adjust their vocal output according to what they hear. For example, macaques, marmosets (Callithrix jacchus), and cotton-top tamarins (Saguinus oedipus) adjust the loudness, timing, and acoustic structure of their vocalizations depending on background noise levels and patterns (Sinnott et al. 1975; Brumm et al. 2004; Egnor and Hauser 2006; Egnor et al. 2006, 2007). The specific number of syllables and temporal modulations in heard conspecific calls can also differentially trigger vocal production in tamarins (Ghazanfar et al. 2001, 2002). Thus, auditory feedback is also very important for nonhuman primates, and altering such feedback can influence neurons in the auditory cortex (Eliades and Wang 2008). At this time, however, no experiments have been conducted to investigate whether somatosensory feedback plays a role in shaping vocal output. The neurophysiological and neuroanatomical data described above suggest that it is not unreasonable to think that it does.
32.8 EMERGENCE OF MULTISENSORY SYSTEMS FOR COMMUNICATION
The behavioral and neurobiological data and speculation described above raise the question of how such an integrated system might emerge ontogenetically. Although there are numerous studies on the development of multisensory processes in humans (see Lewkowicz and Lickliter 1994 for review), there are only a handful of reports for primates (Gunderson 1983; Gunderson et al. 1990; Adachi et al. 2006; Batterson et al. 2008; Zangehenpour et al. 2008). Given that monkeys and humans develop at different rates, it is important to know how this might influence the behavior and neural circuitry underlying multisensory communication. Furthermore, there is only one neurobiological study of multisensory integration in the developing primate (Wallace and Stein 2001). This study suggests that although neurons in the newborn macaque monkey may respond to more than one modality, they are unable to integrate them—that is, they do not produce enhanced responses to bimodal stimulation as they do in adult monkeys. Taken together, these observations suggest that an interaction between developmental timing (heterochrony) and social experience may shape the neural circuits underlying both human and nonhuman primate vocal communication.
Three lines of evidence demonstrate that the rate of neural development in Old World monkeys is faster than in humans and that, as a result, they are neurologically precocial relative to human infants. First, in terms of overall brain size at birth, Old World monkeys are among the most precocial of all mammals (Sacher and Staffeldt 1974), possessing ~65% of their adult brain size at birth compared to only ~25% for human infants (Sacher and Staffeldt 1974; Malkova et al. 2006). Second, fiber pathways in the developing monkey brain are more heavily myelinated than in the human brain at the same postnatal age (Gibson 1991), suggesting that postnatal myelination in the rhesus monkey brain is about three to four times faster than in the human brain (Gibson 1991; Malkova et al. 2006).
All sensorimotor tracts are heavily myelinated by 2 to 3 months after birth in rhesus monkeys, but not until 8 to 12 months after birth in human infants. Finally, at the behavioral level, the differential patterns of brain growth in the two species lead to differential timing in the emergence of species-specific motor, socioemotional, and cognitive abilities (Antinucci 1989; Konner 1991).
The heterochrony of neural and behavioral development across different primate species raises the possibility that the development of multisensory integration may be different in monkeys relative to humans. In particular, Turkewitz and Kenny (1982) suggested that the neural limitations imposed by the relatively slow rate of neural development in human infants may actually be advantageous because these limitations may provide them with greater functional plasticity. This, in turn, may make human infants initially more sensitive to a broader range of sensory stimulation and to the relations among multisensory inputs. This theoretical observation has received empirical support from studies showing that infants go through a process of "perceptual narrowing" in their processing of unisensory as well as multisensory information; that is, where they initially exhibit broad sensory tuning, they later exhibit narrower tuning. For example, 4- to 6-month-old human infants can match rhesus monkey faces and voices, but 8- to 10-month-old infants no longer do so (Lewkowicz and Ghazanfar 2006). These findings suggest that as human infants acquire increasingly greater experience with conspecific human faces and vocalizations—but none with heterospecific faces and vocalizations—their sensory tuning (and their neural systems) narrows to match their early experience.
If a relatively immature state of neural development leaves a developing organism more "open" to the effects of early sensory experience, then it stands to reason that the more advanced state of neural development in monkeys might result in a different outcome. In support of this, a study of infant vervet monkeys that was identical in design to the human infant study of cross-species multisensory matching (Lewkowicz and Ghazanfar 2006) revealed that, unlike human infants, they exhibit no evidence of perceptual narrowing (Zangehenpour et al. 2008). That is, the infant vervet monkeys could match the faces and voices of rhesus monkeys despite having had no prior experience with macaque monkeys, and they continued to do so well beyond the ages at which such matching ability declines in human infants (Zangehenpour et al. 2008). The reason for this lack of perceptual narrowing may lie in the precocial neurological development of this Old World monkey species.
These comparative developmental data reveal that although monkeys and humans may appear to share similarities at the behavioral and neural levels, their different developmental trajectories are likely to give rise to important differences. It is important to keep this in mind when making claims about homologies at either of these levels.
32.9 CONCLUSIONS
The overwhelming evidence from the studies reviewed here, and from numerous other studies in different domains of neuroscience, converges on the idea that the neocortex is fundamentally multisensory (Ghazanfar and Schroeder 2006). This is not terribly surprising given that the sensory experiences of humans and other animals are profoundly multimodal. This does not mean, however, that every cortical area is uniformly multisensory. Indeed, I hope that the role of the auditory cortex in vocal communication reviewed above illustrates that cortical areas may be weighted differently by "extra"-modal inputs depending on the task at hand and its context.
ACKNOWLEDGMENTS
The author gratefully acknowledges the scientific contributions of, and numerous discussions with, the following people: Chand Chandrasekaran, Kari Hoffman, David Lewkowicz, Joost Maier, and Hjalmar Turesson. This work was supported by NIH R01NS054898 and an NSF CAREER Award (BCS-0547760).
REFERENCES
Abry, C., M.-T. Lallouache, and M.-A. Cathiard. 1996. How can coarticulation models account for speech sensitivity in audio-visual desynchronization? In Speechreading by humans and machines: Models, systems and applications, ed. D. Stork and M. Henneke, 247–255. Berlin: Springer-Verlag.
Adachi, I., H. Kuwahata, K. Fujita, M. Tomonaga, and T. Matsuzawa. 2006. Japanese macaques form a cross-modal representation of their own species in their first year of life. Primates 47: 350–354.
Antinucci, F. 1989. Systematic comparison of early sensorimotor development. In Cognitive structure and development in nonhuman primates, ed. F. Antinucci, 67–85. Hillsdale, NJ: Lawrence Erlbaum Associates.
Barnes, C. L., and D. N. Pandya. 1992. Efferent cortical connections of multimodal cortex of the superior temporal sulcus in the rhesus monkey. Journal of Comparative Neurology 318: 222–244.
Barraclough, N. E., D. Xiao, C. I. Baker, M. W. Oram, and D. I. Perrett. 2005. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive Neuroscience 17: 377–391.
Batterson, V. G., S. A. Rose, A. Yonas, K. S. Grant, and G. P. Sackett. 2008. The effect of experience on the development of tactual–visual transfer in pigtailed macaque monkeys. Developmental Psychobiology 50: 88–96.
Beiser, A. 1998. Processing of twitter-call fundamental frequencies in insula and auditory cortex of squirrel monkeys. Experimental Brain Research 122: 139–148.
Benevento, L. A., J. Fallon, B. J. Davis, and M. Rezak. 1977. Auditory–visual interactions in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental Neurology 57: 849–872.
Bernstein, L. E., E. T. Auer, and S. Takayanagi. 2004. Auditory speech detection in noise enhanced by lipreading. Speech Communication 44: 5–18.
Besle, J., A. Fort, C. Delpuech, and M. H. Giard. 2004. Bimodal speech: Early suppressive visual effects in human auditory cortex. European Journal of Neuroscience 20: 2225–2234.
Bizley, J. K., F. R. Nodal, V. M. Bajo, I. Nelken, and A. J. King. 2007. Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cerebral Cortex 17: 2172–2189.
Bruce, C., R. Desimone, and C. G. Gross. 1981. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology 46: 369–384.
Brumm, H., K. Voss, I. Kollmer, and D. Todt. 2004. Acoustic communication in noise: Regulation of call characteristics in a New World monkey. Journal of Experimental Biology 207: 443–448.
Cappe, C., and P. Barone. 2005. Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. European Journal of Neuroscience 22: 2886–2902.
Cappe, C., G. Thut, V. Romei, and M. M. Murray. 2009. Selective integration of auditory–visual looming cues by humans. Neuropsychologia 47: 1045–1052.
Chandrasekaran, C., and A. A. Ghazanfar. 2009. Different neural frequency bands integrate faces and voices differently in the superior temporal sulcus. Journal of Neurophysiology 101: 773–788.
Cheney, D. L., and R. M. Seyfarth. 1982. How vervet monkeys perceive their grunts—Field playback experiments. Animal Behaviour 30: 739–751.
De La Mothe, L. A., S. Blumell, Y. Kajikawa, and T. A. Hackett. 2006. Cortical connections of the auditory cortex in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology 496: 27–71.
Driver, J., and T. Noesselt. 2008. Multisensory interplay reveals crossmodal influences on 'sensory-specific' brain regions, neural responses, and judgments. Neuron 57: 11–23.
Egnor, S. E. R., and M. D. Hauser. 2006. Noise-induced vocal modulation in cotton-top tamarins (Saguinus oedipus). American Journal of Primatology 68: 1183–1190.
Egnor, S. E. R., C. G. Iguina, and M. D. Hauser. 2006. Perturbation of auditory feedback causes systematic perturbation in vocal structure in adult cotton-top tamarins. Journal of Experimental Biology 209: 3652–3663.
Egnor, S. E. R., J. G. Wickelgren, and M. D. Hauser. 2007. Tracking silence: Adjusting vocal production to avoid acoustic interference. Journal of Comparative Physiology A–Neuroethology Sensory Neural and Behavioral Physiology 193: 477–483.
Eliades, S. J., and X. Q. Wang. 2008. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature 453: 1102–1107.
Ettlinger, G., and W. A. Wilson. 1990. Cross-modal performance: Behavioural processes, phylogenetic considerations and neural mechanisms. Behavioural Brain Research 40: 169–192.
Evans, T. A., S. Howell, and G. C. Westergaard. 2005. Auditory–visual cross-modal perception of communicative stimuli in tufted capuchin monkeys (Cebus apella). Journal of Experimental Psychology—Animal Behavior Processes 31: 399–406.
Fitch, W. T. 1997. Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. Journal of the Acoustical Society of America 102: 1213–1222.
Fitch, W. T., and M. D. Hauser. 1995. Vocal production in nonhuman primates—Acoustics, physiology, and functional constraints on honest advertisement. American Journal of Primatology 37: 191–219.
Fowler, C. A. 2004. Speech as a supramodal or amodal phenomenon. In The handbook of multisensory processes, ed. G. A. Calvert, C. Spence, and B. E. Stein, 189–201. Cambridge, MA: MIT Press.
Fu, K. M. G., T. A. Johnston, A. S. Shah, L. Arnold, J. Smiley, T. A. Hackett, P. E. Garraghty, and C. E. Schroeder. 2003. Auditory cortical neurons respond to somatosensory stimulation. Journal of Neuroscience 23: 7510–7515.
Fu, K. M. G., A. S. Shah, M. N. O'Connell, T. McGinnis, H. Eckholdt, P. Lakatos, J. Smiley, and C. E. Schroeder. 2004. Timing and laminar profile of eye-position effects on auditory responses in primate auditory cortex. Journal of Neurophysiology 92: 3522–3531.
Geschwind, N. 1965a. Disconnexion syndromes in animals and man, Part I. Brain 88: 237–294.
Geschwind, N. 1965b. Disconnexion syndromes in animals and man, Part II. Brain 88: 585–644.
Ghazanfar, A. A., C. Chandrasekaran, and N. K. Logothetis. 2008. Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of Neuroscience 28: 4457–4469.
Ghazanfar, A. A., and C. Chandrasekaran. 2007. Paving the way forward: Integrating the senses through phase-resetting of cortical oscillations. Neuron 53: 162–164.
Ghazanfar, A. A., J. I. Flombaum, C. T. Miller, and M. D. Hauser. 2001. The units of perception in the antiphonal calling behavior of cotton-top tamarins (Saguinus oedipus): Playback experiments with long calls. Journal of Comparative Physiology A–Neuroethology Sensory Neural and Behavioral Physiology 187: 27–35.
Ghazanfar, A. A., and N. K. Logothetis. 2003. Facial expressions linked to monkey calls. Nature 423: 937–938.
Ghazanfar, A. A., J. X. Maier, K. L. Hoffman, and N. K. Logothetis. 2005. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience 25: 5004–5012.
Ghazanfar, A. A., K. Nielsen, and N. K. Logothetis. 2006. Eye movements of monkeys viewing vocalizing conspecifics. Cognition 101: 515–529.
Ghazanfar, A. A., and D. Rendall. 2008. Evolution of human vocal production. Current Biology 18: R457–R460.
Ghazanfar, A. A., and C. E. Schroeder. 2006. Is neocortex essentially multisensory? Trends in Cognitive Sciences 10: 278–285.
Ghazanfar, A. A., D. Smith-Rohrberg, A. A. Pollen, and M. D. Hauser. 2002. Temporal cues in the antiphonal long-calling behaviour of cottontop tamarins. Animal Behaviour 64: 427–438.
Ghazanfar, A. A., H. K. Turesson, J. X. Maier, R. Van Dinther, R. D. Patterson, and N. K. Logothetis. 2007. Vocal tract resonances as indexical cues in rhesus monkeys. Current Biology 17: 425–430.
Gibson, K. R. 1991. Myelination and behavioral development: A comparative perspective on questions of neoteny, altriciality and intelligence. In Brain maturation and cognitive development: Comparative and cross-cultural perspectives, ed. K. R. Gibson and A. C. Petersen, 29–63. New York: Aldine de Gruyter.
Gogate, L. J., A. S. Walker-Andrews, and L. E. Bahrick. 2001. The intersensory origins of word comprehension: An ecological–dynamic systems view. Developmental Science 4: 1–18.
Gordon, M. S., and L. D. Rosenblum. 2005. Effects of intrastimulus modality change on audiovisual time-to-arrival judgments. Perception and Psychophysics 67: 580–594.
Gothard, K. M., F. P. Battaglia, C. A. Erickson, K. M. Spitler, and D. G. Amaral. 2007. Neural responses to facial expression and face identity in the monkey amygdala. Journal of Neurophysiology 97: 1671–1683.
Gunderson, V. M. 1983. Development of cross-modal recognition in infant pigtail monkeys (Macaca nemestrina). Developmental Psychology 19: 398–404.
Gunderson, V. M., S. A. Rose, and K. S. Grant-Webster. 1990. Cross-modal transfer in high-risk and low-risk infant pigtailed macaque monkeys. Developmental Psychology 26: 576–581.
Hackett, T. A., L. A. de La Mothe, I. Ulbert, G. Karmos, J. Smiley, and C. E. Schroeder. 2007a. Multisensory convergence in auditory cortex: II. Thalamocortical connections of the caudal superior temporal plane. Journal of Comparative Neurology 502: 924–952.
Hackett, T. A., J. F. Smiley, I. Ulbert, G. Karmos, P. Lakatos, L. A. de La Mothe, and C. E. Schroeder. 2007b. Sources of somatosensory input to the caudal belt areas of auditory cortex. Perception 36: 1419–1430.
Hackett, T. A., I. Stepniewska, and J. H. Kaas. 1999. Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Research 817: 45–58.
Harries, M. H., and D. I. Perrett. 1991. Visual processing of faces in temporal cortex—Physiological evidence for a modular organization and possible anatomical correlates. Journal of Cognitive Neuroscience 3: 9–24.
Hauser, M. D., N. Chomsky, and W. T. Fitch. 2002. The faculty of language: What is it, who has it, and how did it evolve? Science 298: 1569–1579.
Hauser, M. D., C. S. Evans, and P. Marler. 1993. The role of articulation in the production of rhesus monkey, Macaca mulatta, vocalizations. Animal Behaviour 45: 423–433.
Hauser, M. D., and M. S. Ybarra. 1994. The role of lip configuration in monkey vocalizations—Experiments using xylocaine as a nerve block. Brain and Language 46: 232–244.
Ito, T., M. Tiede, and D. J. Ostry. 2009. Somatosensory function in speech perception. Proceedings of the National Academy of Sciences of the United States of America 106: 1245–1248.
Iyengar, S., H. Qi, N. Jain, and J. H. Kaas. 2007. Cortical and thalamic connections of the representations of the teeth and tongue in somatosensory cortex of New World monkeys. Journal of Comparative Neurology 501: 95–120.
Izumi, A., and S. Kojima. 2004. Matching vocalizations to vocalizing faces in a chimpanzee (Pan troglodytes). Animal Cognition 7: 179–184.
Jiang, J. T., A. Alwan, P. A. Keating, E. T. Auer, and L. E. Bernstein. 2002. On the relationship between face movements, tongue movements, and speech acoustics. EURASIP Journal of Applied Signal Processing 1174–1188.
Jones, J. A., and K. G. Munhall. 2003. Learning to produce speech with an altered vocal tract: The role of auditory feedback. Journal of the Acoustical Society of America 113: 532–543.
Jones, J. A., and K. G. Munhall. 2005. Remapping auditory–motor representations in voice production. Current Biology 15: 1768–1772.
Jordan, K. E., E. M. Brannon, N. K. Logothetis, and A. A. Ghazanfar. 2005. Monkeys match the number of voices they hear with the number of faces they see. Current Biology 15: 1034–1038.
Kayser, C., and N. K. Logothetis. 2009. Directed interactions between auditory and superior temporal cortices and their role in sensory integration. Frontiers in Integrative Neuroscience 3: 7. doi: 10.3389/neuro.07.007.2009.
Kayser, C., C. I. Petkov, M. Augath, and N. K. Logothetis. 2005. Integration of touch and sound in auditory cortex. Neuron 48: 373–384.
Kayser, C., C. I. Petkov, M. Augath, and N. K. Logothetis. 2007. Functional imaging reveals visual modulation of specific fields in auditory cortex. Journal of Neuroscience 27: 1824–1835.
Kayser, C., C. I. Petkov, and N. K. Logothetis. 2008. Visual modulation of neurons in auditory cortex. Cerebral Cortex 18: 1560–1574.
Klin, A., W. Jones, R. Schultz, F. Volkmar, and D. Cohen. 2002. Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Archives of General Psychiatry 59: 809–816.
Konner, M. 1991. Universals of behavioral development in relation to brain myelination. In Brain maturation and cognitive development: Comparative and cross-cultural perspectives, ed. K. R. Gibson and A. C. Petersen, 181–223. New York: Aldine de Gruyter.
Kuhl, P. K., K. A. Williams, and A. N. Meltzoff. 1991. Cross-modal speech perception in adults and infants using nonspeech auditory stimuli. Journal of Experimental Psychology: Human Perception and Performance 17: 829–840.
Kuraoka, K., and K. Nakamura. 2007. Responses of single neurons in monkey amygdala to facial and vocal emotions. Journal of Neurophysiology 97: 1379–1387.
Lakatos, P., C.-M. Chen, M. N. O'Connell, A. Mills, and C. E. Schroeder. 2007. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53: 279–292.
Lakatos, P., A. S. Shah, K. H. Knuth, I. Ulbert, G. Karmos, and C. E. Schroeder. 2005. An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology 94: 1904–1911.
Lansing, I. R., and G. W. McConkie. 2003. Word identification and eye fixation locations in visual and visual-plus-auditory presentations of spoken sentences. Perception and Psychophysics 65: 536–552.
Lewkowicz, D. J., and A. A. Ghazanfar. 2006. The decline of cross-species intersensory perception in human infants. Proceedings of the National Academy of Sciences of the United States of America 103: 6771–6774.
Lewkowicz, D. J., and R. Lickliter. 1994. The development of intersensory perception: Comparative perspectives. Hillsdale, NJ: Lawrence Erlbaum Associates.
Liberman, A. M., and I. Mattingly. 1985. The motor theory of speech perception revised. Cognition 21: 1–36.
Maier, J. X., C. Chandrasekaran, and A. A. Ghazanfar. 2008. Integration of bimodal looming signals through neuronal coherence in the temporal lobe. Current Biology 18: 963–968.
Maier, J. X., J. G. Neuhoff, N. K. Logothetis, and A. A. Ghazanfar. 2004. Multisensory integration of looming signals by rhesus monkeys. Neuron 43: 177–181.
Malkova, L., E. Heuer, and R. C. Saunders. 2006. Longitudinal magnetic resonance imaging study of rhesus monkey brain development. European Journal of Neuroscience 24: 3204–3212.
McGurk, H., and J. MacDonald. 1976. Hearing lips and seeing voices. Nature 264: 229–239.
Meltzoff, A. N., and M. Moore. 1997. Explaining facial imitation: A theoretical model. Early Development and Parenting 6: 179–192.
Nasir, S. M., and D. J. Ostry. 2008. Speech motor learning in profoundly deaf adults. Nature Neuroscience 11: 1217–1222.
Noesselt, T., J. W. Rieger, M. A. Schoenfeld, M. Kanowski, H. Hinrichs, H.-J. Heinze, and J. Driver. 2007. Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience 27: 11431–11441.
Oram, M. W., and D. I. Perrett. 1994. Responses of anterior superior temporal polysensory (STPa) neurons to biological motion stimuli. Journal of Cognitive Neuroscience 6: 99–116.
Palombit, R. A., D. L. Cheney, and R. M. Seyfarth. 1999. Male grunts as mediators of social interaction with females in wild chacma baboons (Papio cynocephalus ursinus). Behaviour 136: 221–242.
Parr, L. A. 2004. Perceptual biases for multimodal cues in chimpanzee (Pan troglodytes) affect recognition. Animal Cognition 7: 171–178.
Patterson, M. L., and J. F. Werker. 2003. Two-month-old infants match phonetic information in lips and voice. Developmental Science 6: 191–196.
Remedios, R., N. K. Logothetis, and C. Kayser. 2009. An auditory region in the primate insular cortex responding preferentially to vocal communication sounds. Journal of Neuroscience 29: 1034–1045.
Robinson, D. A., and A. F. Fuchs. 1969. Eye movements evoked by stimulation of frontal eye fields. Journal of Neurophysiology 32: 637–648.
Romanski, L. M., B. B. Averbeck, and M. Diltz. 2005. Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. Journal of Neurophysiology 93: 734–747.
Romanski, L. M., J. F. Bates, and P. S. Goldman-Rakic. 1999. Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology 403: 141–157.
Romanski, L. M., and A. A. Ghazanfar. 2009. The primate frontal and temporal lobes and their role in multisensory vocal communication. In Primate neuroethology, ed. M. L. Platt and A. A. Ghazanfar. Oxford: Oxford Univ. Press.
Rosenblum, L. D. 2005. Primacy of multimodal speech perception. In Handbook of speech perception, ed. D. B. Pisoni and R. E. Remez, 51–78. Malden, MA: Blackwell.
Sacher, G. A., and E. F. Staffeldt. 1974. Relation of gestation time to brain weight for placental mammals: Implications for the theory of vertebrate growth. American Naturalist 108: 593–615.
Sams, M., R. Mottonen, and T. Sihvonen. 2005. Seeing and hearing others and oneself talk. Cognitive Brain Research 23: 429–435.
Schall, J. D., A. Morel, D. J. King, and J. Bullier. 1995. Topography of visual cortex connections with frontal eye field in macaque: Convergence and segregation of processing streams. Journal of Neuroscience 15: 4464–4487.
Schroeder, C. E., and J. J. Foxe. 2002. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Cognitive Brain Research 14: 187–198.
Schroeder, C. E., P. Lakatos, Y. Kajikawa, S. Partan, and A. Puce. 2008. Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences 12: 106–113.
Schroeder, C. E., R. W. Lindsley, C. Specht, A. Marcovici, J. F. Smiley, and D. C. Javitt. 2001. Somatosensory input to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85: 1322–1327.
Schwartz, J.-L., F. Berthommier, and C. Savariaux. 2004. Seeing to hear better: Evidence for early audio-visual interactions in speech identification. Cognition 93: B69–B78.
Seltzer, B., and D. N. Pandya. 1989. Frontal-lobe connections of the superior temporal sulcus in the rhesus monkey. Journal of Comparative Neurology 281: 97–113.
Seltzer, B., and D. N. Pandya. 1994. Parietal, temporal, and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology 343: 445–463.
Sinnott, J. M., W. C. Stebbins, and D. B. Moody. 1975. Regulation of voice amplitude by monkey. Journal of the Acoustical Society of America 58: 412–414.
Smiley, J. F., T. A. Hackett, I. Ulbert, G. Karmos, P. Lakatos, D. C. Javitt, and C. E. Schroeder. 2007. Multisensory convergence in auditory cortex: I. Cortical connections of the caudal superior temporal plane in macaque monkeys. Journal of Comparative Neurology 502: 894–923.
Sugihara, T., M. D. Diltz, B. B. Averbeck, and L. M. Romanski. 2006. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26: 11138–11147.
Tremblay, S., D. M. Shiller, and D. J. Ostry. 2003. Somatosensory basis of speech production. Nature 423: 866–869.
Turkewitz, G., and P. A. Kenny. 1982. Limitations on input as a basis for neural organization and perceptual development: A preliminary theoretical statement. Developmental Psychobiology 15: 357–368.
Van Wassenhove, V., K. W. Grant, and D. Poeppel. 2005. Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the United States of America 102: 1181–1186.
Wallace, M. T., and B. E. Stein. 2001. Sensory and multisensory responses in the newborn monkey superior colliculus. Journal of Neuroscience 21: 8886–8894.
Werner-Reiss, U., K. A. Kelly, A. S. Trause, A. M. Underhill, and J. M. Groh. 2003. Eye position affects activity in primary auditory cortex of primates. Current Biology 13: 554–562.
Yehia, H., P. Rubin, and E. Vatikiotis-Bateson. 1998. Quantitative association of vocal-tract and facial behavior. Speech Communication 26: 23–43.
Yehia, H. C., T. Kuratate, and E. Vatikiotis-Bateson. 2002. Linking facial animation, head motion and speech acoustics. Journal of Phonetics 30: 555–568.
Zangehenpour, S., A. A. Ghazanfar, D. J. Lewkowicz, and R. J. Zatorre. 2008. Heterochrony and cross-species intersensory matching by infant vervet monkeys. PLoS ONE 4: e4302.
33 Convergence of Auditory, Visual, and Somatosensory Information in Ventral Prefrontal Cortex
Lizabeth M. Romanski
CONTENTS
33.1 Introduction
33.2 Anatomical Innervation of Ventral Prefrontal Cortex
33.2.1 Visual Projections to Ventral Prefrontal Cortex
33.2.2 Auditory Projections to Prefrontal Cortex
33.2.3 Somatosensory Connections with Prefrontal Cortex
33.3 Physiological Responses in VLPFC Neurons
33.3.1 Visual Responses
33.3.2 Auditory Responses and Function in Prefrontal Cortex
33.3.3 Prefrontal Responses to Vocalizations
33.3.4 Somatosensory Responses
33.3.5 Multisensory Responses
33.3.6 Functional Considerations
References
33.1 INTRODUCTION
Our ability to recognize and integrate auditory and visual stimuli is the basis for many cognitive processes and is especially critical for meaningful communication. Although many brain regions contribute to the recognition and integration of sensory signals, the frontal lobes both receive a multitude of afferents from sensory association areas and have influence over a wide region of the nervous system to govern behavior. Furthermore, the frontal lobes are special in that they have been associated with language processes, working memory, planning, and reasoning, which all depend on the recognition and integration of a vast network of signals. Research has also shown that somatosensory afferents reach the frontal lobe and that, in specific regions, single cells encode somatosensory signals. In this chapter we will focus on the ventrolateral prefrontal cortex (VLPFC), also known as the inferior convexity in some studies, and describe the connectivity of VLPFC with auditory, visual, and somatosensory cortical areas. This connectivity provides the circuitry for prefrontal responses to these stimuli, which will also be described on the basis of previous research. The potential function of combined auditory, visual, and somatosensory inputs will be described with regard to communication and object recognition.
33.2 ANATOMICAL INNERVATION OF VENTRAL PREFRONTAL CORTEX
The prefrontal cortex receives a widespread array of afferents from cortical and subcortical areas. These include sensory, motor, and association cortices and thalamic nuclei. The extensive innervation of the frontal lobe is nonetheless organized, and particular circuits have been investigated and carefully characterized, leading to a better understanding of frontal lobe function based on this connectivity. Although many areas of the frontal lobe receive converging inputs, we will focus on the multisensory innervation of the VLPFC.
33.2.1 Visual Projections to Ventral Prefrontal Cortex
Much of what we know about the cellular functions of the primate prefrontal cortex is based on the processing of visual information. Thus, it is not surprising that many studies have examined projections from visual association cortex to the primate prefrontal cortex. With regard to the frontal lobe, early anatomical studies by Helen Barbas, Deepak Pandya, and their colleagues (Barbas 1988; Barbas and Mesulam 1981; Barbas and Pandya 1989; Chavis and Pandya 1976) examined the innervation of the entire prefrontal mantle by visual association areas. These studies denoted some specificity in the innervation of dorsal, ventral, and medial prefrontal cortices. Barbas was among the first to note that basoventral prefrontal cortices are more strongly connected with extrastriate ventral visual areas, which have been implicated in pattern recognition and feature discrimination, whereas medial and dorsal prefrontal cortices are more densely connected with medial and dorsolateral occipital and parietal areas, which are associated with visuospatial functions (Barbas 1988). This dissociation was echoed by Bullier and colleagues (1996), who found some segregation of inputs to PFC when paired injections of tracers were placed into temporal and parietal visual processing regions. In their study, the visual temporal cortex projected mainly to area 45, located ventrolaterally in the PFC, whereas the parietal cortex sent projections to both ventrolateral PFC (area 45) and dorsolateral PFC (DLPFC; areas 8a and 46) (Schall et al. 1995; Bullier et al. 1996). Tracing and lesion studies by Ungerleider et al. (1989) showed that area TE projected specifically to three ventral prefrontal targets: the anterior limb of the arcuate sulcus (area 45), the inferior convexity just ventral to the principal sulcus (areas 46v and 12), and the lateral orbital cortex (areas 11 and 12o). These projections travel via the uncinate fasciculus (Ungerleider et al. 1989). The selective connectivity of ventrolateral PFC areas 12 and 45, which contain object- and face-selective neurons (O'Scalaidhe et al. 1997, 1999; Wilson et al. 1993), with inferotemporal areas TE and TEO was specifically documented by Webster and colleagues (1994). Comparison of TE and TEO connectivity in their study revealed a number of important differences, including the finding that it is mainly area TE that projects to ventrolateral PFC and orbitofrontal areas 11, 12, and 13. These orbital regions have also been associated with visual object functions.
33.2.2 Auditory Projections to Prefrontal Cortex
In early anatomical studies, lesion/degeneration techniques were used to reveal projections from the caudal superior temporal gyrus (STG) to the periprincipalis, periarcuate, and inferior convexity regions of the frontal lobe and from the middle and rostral STG to rostral principalis and orbital regions (Pandya et al. 1969; Pandya and Kuypers 1969; Jones and Powell 1970; Chavis and Pandya 1976). Studies with anterograde and retrograde tracers that were aimed at determining the overall connectivity between the temporal and frontal lobes brought additional specificity (Pandya and Sanides 1973; Galaburda and Pandya 1983; Barbas 1988; Barbas and Mesulam 1981; Barbas and Pandya 1989; Cipolloni and Pandya 1989). Studies of the periprincipalis and arcuate region showed that the anterior and middle aspects of the principal sulcus, including areas 9, 10, and 46, were connected with the middle and caudal STG (Barbas and Mesulam 1985; Petrides and Pandya 1988), whereas area 8 receives projections mostly from the caudal STG (Barbas and Mesulam 1981; Petrides
and Pandya 1988). Later studies confirmed the connections of the posterior STG with area 46 and dorsal area 8, and of the middle STG with rostral–dorsal areas 46 and 10, area 9, and area 12 (Petrides and Pandya 1988; Barbas 1992). Connections of ventrolateral prefrontal areas with auditory association cortex have been considered by several groups. Cytoarchitectonic analysis of the VLPFC suggested that the region labeled by Walker as area 12 in the macaque monkey has characteristics similar to those of human area 47, and it was thus renamed area 47/12 in the macaque by Petrides and Pandya (1988). Analysis of the connections of areas 45 and 47/12 in the VLPFC has shown that they receive innervation from the STG, the inferotemporal cortex, and multisensory regions within the superior temporal sulcus. Combining physiological recording with anatomical tract tracing, Romanski and colleagues (1999) analyzed the connections of physiologically defined areas of the belt and parabelt auditory cortex and determined that the projections to prefrontal cortex are topographically arranged so that rostral and ventral prefrontal cortex receives projections from the anterior auditory association cortex (areas AL and anterior parabelt), whereas caudal prefrontal regions are innervated by the posterior auditory cortex (areas CL and caudal parabelt; Figure 33.1). Together with recent auditory physiological recordings from the lateral belt (Tian et al. 2001) and from the prefrontal cortex (Romanski and Goldman-Rakic 2002; Romanski et al. 2005), these studies suggest that separate auditory streams originate in the anterior and posterior auditory cortex and target anterior-ventrolateral object and dorsolateral spatial domains in the frontal lobe, respectively (Romanski 2007), similar to the organization of the visual system.
FIGURE 33.1 Innervation of prefrontal cortex by auditory belt and parabelt injections. (a) Projections from anterior auditory cortex to ventrolateral prefrontal cortex (VLPFC) are shown with black arrows, and projections from caudal auditory cortex to dorsolateral prefrontal cortex (DLPFC) are shown in white. (b) Coronal sections through the frontal lobe detail the anatomical connections. Injections placed into anterior auditory belt area AL resulted in projections to rostral area 46, ventrolateral area 12vl, and lateral orbital area 12o (shown in black). Projections from caudal auditory belt area CL and the adjacent parabelt targeted caudal dorsal prefrontal cortex (area 46, area 8a, and part of area 45; shown as white cells and fibers). Projections from area ML included some dorsal and ventral targets and are shown in gray. asd, dorsal ramus of arcuate sulcus; asv, ventral ramus of arcuate sulcus; cs, central sulcus; ls, lateral sulcus; sts, superior temporal sulcus.
Ultimately, this also implies that auditory and visual afferents target similar regions of dorsolateral and ventrolateral prefrontal cortex (Price 2008). The convergence of auditory and visual ventral stream inputs onto the same VLPFC domain implies that they may be integrated and combined to serve a similar function, that of object recognition.
33.2.3 Somatosensory Connections with Prefrontal Cortex
Previous studies have noted connections of the principal sulcus and inferior convexity with somatosensory cortical areas (Barbas and Mesulam 1985), most notably SII and area 7b (Cavada and Goldman-Rakic 1989; Preuss and Goldman-Rakic 1989; Carmichael and Price 1995). Injections that included the ventral bank of the principal sulcus and the anterior part of area 12 resulted in strong labeling of perisylvian somatic cortex, including the second somatosensory area (SII) and insular cortex (Preuss and Goldman-Rakic 1989). Anterograde studies have confirmed this, showing that area SII projects to the inferior convexity and principal sulcus region of the macaque frontal lobe (Cipolloni and Pandya 1999). This region of the PFC overlaps with the projection fields of auditory association cortex and visual extrastriate cortex.
33.3 PHYSIOLOGICAL RESPONSES IN VLPFC NEURONS
33.3.1 Visual Responses
Wilson et al. (1993) published a groundbreaking study revealing a physiological dissociation between dorsal and ventral prefrontal cortex (Figure 33.2). In this study, the authors showed that DLPFC cells responded in a spatial working memory task, with single cells exhibiting selective cue and delay activity for discrete eccentric locations. In the same animals, electrode penetrations into VLPFC regions, which included the expanse of the inferior convexity (lateral area 47/12, orbital area 12, and area 45), revealed single-unit responses to pictures of objects and faces.
FIGURE 33.2 Lateral brain schematic of visual pathways in the nonhuman primate showing dorsal–spatial and ventral–object visual streams that terminate in DLPFC and VLPFC, respectively. Wilson et al. (1993) showed that neurons in DLPFC (black) respond during perception and memory of visuospatial information, whereas neurons in VLPFC (gray) responded to object features including color, form, and type of visual stimulus. Later studies by O’Scalaidhe et al. (1997, 1999) described “face cells” that were localized to the gray region of VLPFC in areas 12 and 45.
These VLPFC cells did not respond in the spatial working memory task but did respond in an object-fixation task and an object-conditional association task. Further electrophysiological and neuroimaging studies have demonstrated face selectivity in this same area of VLPFC (O’Scalaidhe et al. 1997, 1999; Tsao et al. 2008), confirming this functional domain separation. Although these studies were the first to demonstrate an electrophysiological dissociation between DLPFC and VLPFC, they were not the first to suggest a functional difference or to show the preference for object over spatial processing in the ventral prefrontal cortex. An earlier study by Mishkin and Manning (1978) showed that lesions of the VLPFC in nonhuman primates interfere with the processing of nonspatial information, including color and form. These ventral prefrontal lesions produced a severe and lasting impairment in the performance of three nonspatial tasks, whereas lesions of the principal sulcus had only a transient effect (Mishkin and Manning 1978). Just a few years earlier, Passingham (1975) had also suggested a dissociation between dorsal and ventral PFC. In that study, rhesus monkeys were trained on a delayed color matching task and a delayed spatial alternation task. Lesions of the VLPFC resulted in an impairment only in the delayed color matching task, whereas lesions of the DLPFC impaired only the delayed spatial alternation task. These results, like the Wilson et al. study two decades later, demonstrated a double dissociation of dorsal and ventral PFC and suggested a role for the VLPFC in the processing of object features and recognition. Further analysis of the properties of cells in the VLPFC was carried out by Joaquin Fuster and colleagues. In their electrophysiological analysis of ventral prefrontal neurons, they showed that single cells are responsive to simple and complex visual stimuli presented at the fovea (Pigarev et al. 1979; Rosenkilde et al. 1981). The foveal receptive field properties of these cells had first been shown in studies by Suzuki and Azuma (1977), who examined receptive field properties of neurons across the expanse of lateral prefrontal cortex. The receptive fields of neurons in DLPFC were found to lie outside the fovea and to favor the contralateral visual field, whereas neurons below the principal sulcus in areas 12/47 and 45 were driven best by visual stimuli presented at the fovea (Suzuki and Azuma 1977). Hoshi et al. (2000) examined the spatial distribution of location-selective and shape-selective neurons during cue, delay, and response periods, and found more location-selective neurons in the posterior part of the lateral PFC, whereas more shape-selective neurons were found in the anterior part, corresponding to area 12/47. Ninokura et al. (2004) found that cells that responded selectively to the physical properties (color and shape) of objects were localized to the VLPFC. These various studies fostered the notion that visual neurons in VLPFC are tuned to nonspatial features, including color, shape, and type of object, and have receptive fields representing areas in and around the fovea. Finally, studies from Goldman-Rakic and colleagues further demonstrated that neurons in the VLPFC are not only responsive to object features, but that some neurons are highly specialized and face-selective (Wilson et al. 1993; O’Scalaidhe et al. 1997, 1999).
The face-selective neurons were found in several discrete regions, including an anterior location that appears to be area 12/47 and a posterior, periarcuate location within area 45; some penetrations into the orbital cortex also yielded face cells. These single-unit responses were further corroborated with functional magnetic resonance imaging (fMRI) data by Tsao and colleagues (2008). In their fMRI study, they showed that three loci within the VLPFC of macaques were selectively activated by faces (Tsao et al. 2008; Figure 33.3). These three locations correspond roughly to the same anterior, posterior, and ventral/orbital locations that O’Scalaidhe et al. (1997, 1999) mapped as face-responsive in their single-unit recording studies. Demonstration of visual responsiveness and face selectivity by both methods substantiates the notion that the VLPFC is involved in object and face processing.
33.3.2 Auditory Responses and Function in Prefrontal Cortex
The frontal lobe has long been linked with complex auditory function through its association with language functions and Broca’s area. What we hear and say seems to be important to frontal lobe neurons.
FIGURE 33.3 (See color insert.) Activation of macaque prefrontal cortex by faces in the study of Tsao et al. (2008). Shown here are two coronal sections showing “face patches” in VLPFC (activations are yellow), delineated with white arrows. (Reprinted from Tsao, D. Y. et al., Nat. Neurosci., 11, 877–879, 2008. With permission.)
In the human brain, the posterior aspects of Broca’s area are thought to be especially involved in the phonetic and motor control of speech, whereas more anterior regions have been shown to be activated during semantic processing, comprehension, and auditory working memory (Zatorre et al. 1992; Paulesu et al. 1993; Buckner et al. 1995; Demb et al. 1995; Fiez et al. 1996; Stromswold et al. 1996; Cohen et al. 1997; Gabrieli et al. 1998; Stevens et al. 1998; Price 1998; Posner et al. 1999; Gelfand and Bookheimer 2003). Examination of prefrontal auditory function in nonhuman primates has not received as much attention as visual prefrontal function. A few studies have investigated the effects of large prefrontal lesions on behavioral performance in auditory discrimination or in mnemonic processing of complex acoustic stimuli. In each of these four studies, relatively large lesions of the lateral PFC were shown to cause an impairment in an auditory go/no-go task for food reward (Weiskrantz and Mishkin 1958; Gross and Weiskrantz 1962; Gross 1963; Goldman and Rosvold 1970). This was taken as evidence of the PFC’s involvement in modality-independent processing, especially in tasks requiring inhibitory control (Weiskrantz and Mishkin 1958). Despite the localization of language function in the human brain to ventral frontal lobe regions and the demonstration that lesions of lateral PFC in nonhuman primates interfere with auditory discrimination, single-cell responses to acoustic stimuli have only sporadically been noted in the frontal lobes of Old and New World monkeys (Benevento et al. 1977; Bodner et al. 1996; Newman and Lindsley 1976; Tanila et al. 1992, 1993; Wollberg and Sela 1980). A close look at these studies, however, reveals that few of them sampled neurons in ventrolateral and orbitofrontal regions. Most recordings in the past were confined to the dorsolateral surface of the frontal lobe, where projections from secondary and tertiary auditory cortices are sparse. Only one early study recorded from the lateral orbital region of the macaque cortex and found both auditory and visual responses to simple visual flashes and broadband auditory clicks (Benevento et al. 1977). Furthermore, none of these studies tested neurons systematically with naturalistic, species-relevant acoustic stimuli. Recent approaches to frontal lobe auditory function have utilized naturalistic stimuli, including species-specific vocalizations, and have extended the area of investigation to orbital and ventral PFC regions.
33.3.3 Prefrontal Responses to Vocalizations
After establishing the areas of the prefrontal cortex that receive dense afferents from early auditory cortical regions (Romanski et al. 1999a, 1999b), Romanski and Goldman-Rakic revealed a discrete
auditory responsive region in the macaque VLPFC (Romanski and Goldman-Rakic 2002). This VLPFC region contains neurons that respond to complex acoustic stimuli, including species-specific vocalizations, and lies adjacent to the object- and face-selective region described previously (O’Scalaidhe et al. 1997, 1999; Wilson et al. 1993). Although VLPFC auditory neurons have not been thoroughly tested for directional selectivity, further examination has suggested that they encode complex auditory features and thus respond to complex stimuli on the basis of shared acoustic features (Romanski et al. 2005; Averbeck and Romanski 2006). Use of a large library of rhesus macaque vocalizations to test auditory selectivity in prefrontal neurons has shown that VLPFC neurons are robustly responsive to species-specific vocalizations (Romanski et al. 2005). A cluster analysis of these vocalization responses did not show clustering of responses to vocalizations with similar behavioral functions (e.g., food calls) but demonstrated that neurons tend to respond to multiple vocalizations with similar acoustic morphology (Romanski et al. 2005; Figure 33.4). Neuroimaging in rhesus monkeys has revealed a small ventral prefrontal locus that was active during presentation of complex acoustic stimuli, including vocalizations (Poremba and Mishkin 2007). Additional electrophysiological recording studies by Cohen and colleagues have suggested that prefrontal auditory neurons may also participate in the categorization of species-specific vocalizations (Gifford et al. 2005). These combined data are consistent with a role for VLPFC auditory neurons in a ventral auditory processing stream that analyzes the features of auditory objects, including vocalizations. Evidence for object-based auditory processing in the ventral frontal lobe of the human brain comes from neuroimaging studies that have detected activation in the VLPFC not only by speech stimuli but also by nonspeech and music stimuli (Belin et al. 2000; Binder et al. 2000; Scott et al. 2000; Zatorre et al. 2004) in auditory recognition tasks and voice recognition tasks (Fecteau et al. 2005). The localization of an auditory object processing stream in the human brain to the very same ventral prefrontal region identified in a nonhuman primate suggests a functional similarity between this area and human language-processing regions located in the inferior frontal gyrus (Deacon 1992; Romanski and Goldman-Rakic 2002).
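To illustrate the logic of such a response-based cluster analysis, the sketch below groups call types by the similarity of the firing-rate profiles they evoke across a set of neurons. All firing rates here are simulated and the three-cluster cut is arbitrary; this is a generic illustration of the approach, not a reconstruction of the analysis in Romanski et al. (2005).

```python
# Illustrative only: cluster call types by the similarity of the (simulated)
# population firing-rate profiles they evoke. Calls that drive similar profiles
# across neurons end up in the same cluster.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

call_types = ["coo", "grunt", "gecker", "girney", "warble", "harmonic arch",
              "bark", "shrill bark", "submissive scream", "copulation scream"]

rng = np.random.default_rng(0)
# Rows: 40 hypothetical neurons; columns: call types; entries: mean rates (spikes/s).
rates = rng.gamma(shape=2.0, scale=5.0, size=(40, len(call_types)))

# Distance between two calls = 1 - Pearson correlation of their response profiles.
call_distances = pdist(rates.T, metric="correlation")
tree = linkage(call_distances, method="average")
cluster_labels = fcluster(tree, t=3, criterion="maxclust")

for label, call in sorted(zip(cluster_labels, call_types)):
    print(f"cluster {label}: {call}")
```

With real recordings, the question of interest would be whether the resulting clusters track acoustic morphology rather than behavioral call category, as reported for VLPFC neurons.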
33.3.4 Somatosensory Responses
Fewer studies have examined the responses of prefrontal neurons to somatosensory stimuli. This may be partly attributable to the lack of an obvious association between the frontal lobes and a known human function for somatosensory stimuli, as there is for language and audition. One group, however, has demonstrated responses to somatosensory stimuli in single lateral prefrontal neurons. Recordings in the prefrontal cortex were made while macaque monkeys performed a somatosensory discrimination task (Romo et al. 1999). Neurons were found whose discharge rates varied before and during the delay period between the two stimuli as a monotonic function of the base stimulus frequency (Figure 33.5). These cells were localized specifically to the VLPFC, also known as the inferior convexity (Romo et al. 1999), within the same general ventral prefrontal regions where object-, face-, and auditory-responsive neurons have been recorded. The feature-based encoding of these cells supports their role in an object-based ventral stream function. In addition to this demonstration of prefrontal neuronal function in a somatosensory task, an early lesion study noted an impairment in a somatosensory alternation task after large lateral prefrontal lesions but not after parietal lesions (Ettlinger and Wegener 1958). In human neuroimaging studies, it has been shown that the ventral frontal lobe is activated by somatosensory stimulation (Hagen et al. 2002); in that study, two discrete ventral frontal brain regions, the posterior inferior frontal gyrus and the orbitofrontal cortex, were responsive to somatosensory stimulation. Additional neuroimaging studies have examined frontal lobe activation during haptic shape perception and discrimination: a recent fMRI study found several frontal lobe sites that showed haptic shape selectivity (Miquee et al. 2008) and sites engaged during visuo-haptic processing (Stilla and Sathian 2008).
FIGURE 33.4 A vocalization-responsive cell in VLPFC. (a) Responses to 10 vocalization exemplars (coo, grunt, gecker, girney, warble, harmonic arch, bark, shrill bark, submissive scream, and copulation scream) are shown in raster/spike density plots. The strongest responses were to the submissive scream and copulation scream vocalizations, which are similar in their acoustic features, as shown in the spectrograms in panel (b). (c) A cluster analysis of the mean firing rate to these calls shows that calls with similar acoustic features tend to evoke similar neuronal responses. (Modified from Romanski, L. M. et al., J. Neurophysiol., 93, 734–747, 2005.)
FIGURE 33.5 Single-neuron spike density functions from six different neurons. Dark bars above each plot indicate times during which the neuron’s firing rate carried significant (P < .01) monotonic signal about base stimulus. (a, c, e) Positive monotonic neurons. (b, d, f) Negative monotonic neurons. (g) Total number of recorded neurons (during fixed 3-s delay period runs) carrying a significant signal about the base stimulus, as a function of time relative to beginning of delay period. Individual neurons may participate in more than one bin. Base stimulus period is shaded gray, and minimum and maximum number of neurons, during the first, middle, and last seconds of delay period, respectively, are indicated with arrows. (Reprinted by permission from Macmillan Publishers Ltd., Romo, R. et al., Nature, 399, 470–473, 1999. With permission.)
Most interesting is the demonstration of vibrotactile working memory activation of human VLPFC areas 47/12 and 45 by Kostopoulos et al. (2007). In their fMRI study, the authors not only demonstrated activity of the VLPFC during a vibrotactile working memory task but also showed functional connectivity with the secondary somatosensory cortex, which was also active in this vibrotactile delayed discrimination task. The activated area, area 47 in the human brain, is analogous to monkey area 12/47, where face and vocalization responses have been recorded (O’Scalaidhe et al. 1997, 1999; Romanski and Goldman-Rakic 2002; Romanski et al. 2005). Together, the anatomical, electrophysiological, and neuroimaging data suggest that somatosensory stimuli may converge in the same VLPFC regions where auditory- and visual-responsive neurons are found and may combine with these inputs to participate in object recognition.
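As an aside on how the monotonic encoding summarized in Figure 33.5 can be quantified, the sketch below labels simulated neurons as positively or negatively monotonic by regressing firing rate against the base stimulus frequency. The frequencies, rates, and significance criterion are invented for illustration and do not reproduce the exact procedure of Romo et al. (1999).

```python
# Hypothetical sketch: classify monotonic frequency tuning by linear regression of
# firing rate on the vibrotactile base frequency. All data below are simulated.
import numpy as np
from scipy import stats

base_freqs = np.array([10, 14, 18, 22, 26, 30, 34])   # base stimulus frequencies (Hz)

rng = np.random.default_rng(1)
# One positively tuned and one negatively tuned "neuron", 5 trials per frequency.
pos_rates = np.repeat(2.0 * base_freqs, 5) + rng.normal(0, 5, base_freqs.size * 5)
neg_rates = np.repeat(80 - 1.5 * base_freqs, 5) + rng.normal(0, 5, base_freqs.size * 5)
trial_freqs = np.repeat(base_freqs, 5)

def classify_monotonic(rates, freqs, alpha=0.01):
    """Return 'positive', 'negative', or 'non-monotonic' based on the regression slope."""
    slope, _, _, p_value, _ = stats.linregress(freqs, rates)
    if p_value < alpha:
        return "positive" if slope > 0 else "negative"
    return "non-monotonic"

print(classify_monotonic(pos_rates, trial_freqs))   # expected: positive
print(classify_monotonic(neg_rates, trial_freqs))   # expected: negative
```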
33.3.5 Multisensory Responses
The anatomical, physiological, and behavioral data described above show that the ventral frontal lobe receives afferents carrying information about auditory, visual, and somatosensory stimuli. Furthermore, physiological studies indicate that VLPFC neurons prefer complex information and are activated by stimuli carrying social communication information, that is, faces and vocalizations. Although only one group has examined somatosensory responses in the prefrontal cortex thus far, several imaging studies have shown activation of the ventral frontal lobe with haptic stimulation, which also holds some importance for social communication and object recognition. Although many human neuroimaging studies have posited a role for the frontal lobes in the integration of auditory and visual speech or communication information (Gilbert and Fiez 2004; Hickok et al. 2003; Jones and Callan 2003; Homae et al. 2002), few studies have addressed the cellular mechanisms underlying frontal lobe multisensory integration. An early study by Benevento et al. (1977) made intracellular electrophysiological recordings in the lateral orbital cortex (area 12 orbital) and found single cells that were responsive to simple auditory and visual stimuli. Fuster and colleagues recorded from the lateral frontal cortex during an audiovisual matching task (Fuster et al. 2000; Bodner et al. 1996); in this task, prefrontal cortex cells responded selectively to tones, and most of them also responded to colors according to the task rule (Fuster et al. 2000). Gaffan and Harrison (1991) demonstrated the importance of ventral prefrontal cortex in sensory integration by showing that lesions disrupt the performance of crossmodal matching involving auditory and visual objects. Importantly, Rao et al. (1997) described the integration of object and location information in single prefrontal neurons. A recent study by Romanski and colleagues documented multisensory responses to combined auditory and visual stimuli in the VLPFC. In this study, rhesus monkeys were presented with movies of familiar monkeys vocalizing while single neurons were recorded from the VLPFC (Sugihara et al. 2006). These movies were separated into audio and video streams, and neural responses to the unimodal stimuli were compared with responses to the combined audiovisual stimulus. Interestingly, about half of the neurons recorded in the VLPFC were multisensory in that they responded to both unimodal auditory and visual stimuli or responded differently to simultaneously presented audiovisual stimuli than to either unimodal stimulus (Sugihara et al. 2006). As has been shown in the superior colliculus, the STS, and auditory cortex, prefrontal neurons exhibited enhancement or suppression (Figure 33.6), and, as in the STS, suppression was observed more commonly than enhancement. Multisensory responses were stimulus-dependent in that not all combinations of face–vocalization stimuli elicited a multisensory response. Hence, this estimate of the proportion of multisensory neurons is most likely a lower bound: if the stimulus battery tested were large enough, we would expect more neurons to be shown to be multisensory rather than, by default, unimodal visual. It was also interesting that face/voice stimuli evoked multisensory responses more frequently than nonface/nonvoice combinations, as in auditory cortex (Ghazanfar et al. 2008) and the STS (Barraclough et al. 2005).
This adds support to the notion that VLPFC is part of a circuit that is specialized for integrating face and voice information rather than for integrating all forms of auditory and visual stimuli generically. Specialization for the integration of communication-relevant audiovisual stimuli in the frontal lobe, and particularly in the VLPFC, is also apparent in the human brain. An fMRI study has shown that area 47 in the human brain is active during the simultaneous presentation of gestures and speech (Xu et al. 2009). In this study, Braun and colleagues found overlapping activation in area 47 when subjects viewed gestures or listened to a spoken phrase that fit the gesture. The region of activation in this human study is homologous to the area recorded by Sugihara et al., suggesting that this region of the VLPFC is specialized for the multisensory integration of communication-relevant auditory and visual information, namely, gestures (including facial gestures) and vocal sounds. Thus, there is evidence that auditory, visual, and somatosensory information reaches the VLPFC and converges within areas 12/47 and 45 (Figure 33.7).
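One simple way to express the enhancement and suppression effects described above is to compare the audiovisual response with the strongest unimodal response, as in the generic index sketched below. The spike rates are invented, and this index is only one common convention; it is not necessarily the exact criterion applied by Sugihara et al. (2006).

```python
# Minimal sketch (with invented spike rates) of a generic multisensory index:
# percent change of the audiovisual response relative to the best unimodal response.
import numpy as np

def multisensory_index(aud, vis, av):
    """Positive values indicate enhancement; negative values indicate suppression."""
    best_unimodal = max(np.mean(aud), np.mean(vis))
    return 100.0 * (np.mean(av) - best_unimodal) / best_unimodal

# Example mean rates (spikes/s) for two hypothetical VLPFC neurons.
enhanced = multisensory_index(aud=[12, 15, 11], vis=[20, 22, 19], av=[30, 33, 29])
suppressed = multisensory_index(aud=[18, 20, 17], vis=[25, 27, 24], av=[14, 13, 15])

print(f"enhancement:  {enhanced:+.1f}%")    # positive -> multisensory enhancement
print(f"suppression: {suppressed:+.1f}%")   # negative -> multisensory suppression
```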
FIGURE 33.6 Multisensory neuronal responses in prefrontal cortex. Responses of two single units are shown in (a) and (b) as raster/spike density plots to auditory vocalization alone (Aud), face alone (Vis), and both presented simultaneously (AV). A bar graph of mean response to these stimuli is shown at right depicting auditory (dark gray), visual (white), and multisensory (light gray) responses. Cell in panel (a) exhibited multisensory enhancement and cell in panel (b) showed multisensory suppression.
Furthermore, this information appears to be related most to communication. Although Romo et al. (1999) showed evidence of somatosensory processing related to touch, the innervation of ventral prefrontal cortex includes afferents from the face region of SII (Preuss and Goldman-Rakic 1989). This somatosensory face information arrives at ventral prefrontal regions that receive information about face identity, features, and expression from areas TE and TPO (Webster et al. 1994; O’Scalaidhe et al. 1997, 1999), in addition to auditory inputs that carry information regarding species-specific vocalizations (Romanski et al. 2005).
FIGURE 33.7 Auditory, visual, and somatosensory convergence in VLPFC shown on a lateral brain schematic of the macaque frontal lobe. The VLPFC locations of the vocalization-responsive region (dark gray; Romanski and Goldman-Rakic 2002), the visual object- and face-responsive region (light gray; O’Scalaidhe et al. 1997), the somatosensory-responsive region (dashed circle; Romo et al. 1999), and audiovisual-responsive cells (black dots; Sugihara et al. 2006) are all depicted on the prefrontal cortex of the macaque monkey in which they were recorded. as, arcuate sulcus; ls, lateral sulcus; ps, principal sulcus; sts, superior temporal sulcus. (Data from Sugihara, T. et al., J. Neurosci., 26, 11138–11147, 2006.)
33.3.6 Functional Considerations
Although a number of studies have examined the responses of prefrontal neurons to face, vocalization, and somatosensory stimuli during passive fixation tasks, it is expected that the VLPFC utilizes these stimuli in more complex processes. There is no doubt that the context of a task will affect the firing of VLPFC neurons. Nonetheless, face and vocalization stimuli differ from typical simple sensory stimuli in that they already carry semantic meaning and emotional valence and need no additional task context to make them relevant. A face or vocalization, even when presented in a passive task, will be associated with previous experiences, emotions, and meanings that will evoke responses in a number of brain areas that project to the VLPFC, whereas simple sensory stimuli have no innate associations and depend only on task contingencies to give them relevance. Thus, responses to face, voice, and other communication-relevant stimuli in prefrontal neurons are the sum total of experience with these stimuli in addition to any task or contextual information presented. The combination of somatosensory face or touch information, visual face information, and vocalization information could play a number of roles. First, the general process of conjunction allows auditory, visual, and/or somatosensory stimuli to be combined for many known and, as yet, unknown functions. Thus, the VLPFC may serve a general purpose in allowing complex stimuli from any of these modalities to be integrated. This may be especially relevant for the frontal lobe when the information is to be remembered or operated on in some way. A second function, more directly suited to the process of communication, would be feedback control of articulation. Auditory information that is coded phonologically and mouth or face movements perceived via somatosensory input would be integrated, and orofacial movements could then be adjusted to alter the production of sounds via a speech/vocalization output circuit. The posterior part of the inferior frontal gyrus (Broca’s area) has been shown, via lesion analysis and neuroimaging, to play a role in the production of this phonetic code, or articulatory stream, whereas the anterior inferior frontal gyrus may integrate auditory, somatosensory, and visual perceptual information to produce this stream (Papoutsi et al. 2009). Somatosensory feedback regarding the positioning of the mouth and face would play an important role in the control of articulation. The visual face and auditory vocalization information available to these neurons could provide further information from a speaker that warrants a reply, or could provide information about a hand or face during a gesture. Thus, a third function for the combination of auditory, visual, and somatosensory information would be the perception, memory, and execution of gestures that accompany speech and vocalizations. The VLPFC may also be part of a larger circuit that has been called the mirror neuron system, which is purported to be involved in the perception and execution of gestures, as occurs in imitation (Rizzolatti and Craighero 2004); the VLPFC has reciprocal connections with many parts of the mirror neuron circuit. Finally, convergence of auditory, visual, and haptic information could also be used in face or object recognition, especially when one sense is not optimal and additional information from other sensory modalities is needed to confirm identification.
The convergence of these sensory modalities and others may play additional functional roles during a variety of complex cognitive functions.
REFERENCES
Averbeck, B. B., and L. M. Romanski. 2006. Probabilistic encoding of vocalizations in macaque ventral lateral prefrontal cortex. Journal of Neuroscience 26: 11023–11033. Barbas, H. 1988. Anatomic organization of basoventral and mediodorsal visual recipient prefrontal regions in the rhesus monkey. Journal of Comparative Neurology 276: 313–342.
Barbas, H. 1992. Architecture and cortical connections of the prefrontal cortex in the rhesus monkey. Advances in Neurology 57: 91–115. Barbas, H., and D. N. Pandya. 1989. Architecture and intrinsic connections of the prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology 286: 353–375. Barbas, H., and M. M. Mesulam. 1981. Organization of afferent input to subdivisions of area 8 in the rhesus monkey. Journal of Comparative Neurology 200: 407–431. Barbas, H., and M. M. Mesulam. 1985. Cortical afferent input to the principalis region of the rhesus monkey. Neuroscience 15: 619–637. Barraclough, N. E., D. Xiao, C. I. Baker, M. W. Oram, and D. I. Perrett. 2005. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. Journal of Cognitive Neuroscience 17: 377–391. Belin, P., R. J. Zatorre, P. Lafaille, P. Ahad, and B. Pike. 2000. Voice-selective areas in human auditory cortex. Nature 403: 309–312. Benevento, L. A., J. Fallon, B. J. Davis, and M. Rezak. 1977. Auditory–visual interaction in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental Neurology 57: 849–872. Binder, J. R., J. A. Frost, T. A. Hammeke, P. S. Bellgowan, J. A. Springer, J. N. Kaufman, and E. T. Possing. 2000. Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex 10: 512–528. Bodner, M., J. Kroger, and J. M. Fuster. 1996. Auditory memory cells in dorsolateral prefrontal cortex. Neuroreport 7: 1905–1908. Buckner, R. L., M. E. Raichle, and S. E. Petersen. 1995. Dissociation of human prefrontal cortical areas across different speech production tasks and gender groups. Journal of Neurophysiology 74: 2163–2173. Bullier, J., J. D. Schall, and A. Morel. 1996. Functional streams in occipito-frontal connections in the monkey. Behavioural Brain Research 76: 89–97. Carmichael, S. T., and J. L. Price. 1995. Sensory and premotor connections of the orbital and medial prefrontal cortex of macaque monkeys. Journal of Comparative Neurology 363: 642–664. Cavada, C., and P. S. Goldman-Rakic. 1989. Posterior parietal cortex in rhesus monkey: II. Evidence for segregated corticocortical networks linking sensory and limbic areas with the frontal lobe. Journal of Comparative Neurology 287: 422–445. Chavis, D. A., and D. N. Pandya. 1976. Further observations on corticofrontal connections in the rhesus monkey. Brain Research 117: 369–386. Cipolloni, P. B., and D. N. Pandya. 1989. Connectional analysis of the ipsilateral and contralateral afferent neurons of the superior temporal region in the rhesus monkey. Journal of Comparative Neurology 281: 567–585. Cipolloni, P. B., and D. N. Pandya. 1999. Cortical connections of the frontoparietal opercular areas in the rhesus monkey. Journal of Comparative Neurology 403: 431–458. Cohen, J. D., W. M. Perlstein, T. S. Braver, L. E. Nystrom, D. C. Noll, J. Jonides, and E. E. Smith. 1997. Temporal dynamics of brain activation during a working memory task. Nature 386: 604–608. Cohen, Y. E., F. Theunissen, B. E. Russ, and P. Gill. 2007. Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. Journal of Neurophysiology 97: 1470–1484. Deacon, T. W. 1992. Cortical connections of the inferior arcuate sulcus cortex in the macaque brain. Brain Research 573: 8–26. Demb, J. B., J. E. Desmond, A. D. Wagner, C. J. Vaidya, G. H. Glover, and J. D. Gabrieli. 1995. 
Semantic encoding and retrieval in the left inferior prefrontal cortex: A functional MRI study of task difficulty and process specificity. Journal of Neuroscience 15: 5870–5878. Ettlinger, G., and J. Wegener. 1958. Somaesthetic alternation, discrimination and orientation after frontal and parietal lesions in monkeys. The Quarterly Journal of Experimental Psychology 10: 177–186. Fecteau, S., J. L. Armony, Y. Joanette, and P. Belin. 2005. Sensitivity to voice in human prefrontal cortex. Journal of Neurophysiology 94: 2251–2254. Fiez, J. A., E. A. Raife, D. A. Balota, J. P. Schwarz, M. E. Raichle, and S. E. Petersen. 1996. A positron emission tomography study of the short-term maintenance of verbal information. Journal of Neuroscience 16: 808–822. Fuster, J. M., M. Bodner, and J. K. Kroger. 2000. Cross-modal and cross-temporal association in neurons of frontal cortex. Nature 405: 347–351. Gabrieli, J. D. E., R. A. Poldrack, and J. E. Desmond. 1998. The role of left prefrontal cortex in language and memory. Proceedings of the National Academy of Sciences 95: 906–913. Gaffan, D., and S. Harrison. 1991. Auditory–visual associations, hemispheric specialization and temporalfrontal interaction in the rhesus monkey. Brain 114: 2133–2144.
Galaburda, A. M., and D. N. Pandya. 1983. The intrinsic architectonic and connectional organization of the superior temporal region of the rhesus monkey. Journal of Comparative Neurology 221: 169–184. Gelfand, J. R., and S. Y. Bookheimer. 2003. Dissociating neural mechanisms of temporal sequencing and processing phonemes. Neuron 38: 831–842. Ghazanfar, A. A., C. Chandrasekaran, and N. K. Logothetis. 2008. Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of Neuroscience 28: 4457–4469. Gilbert, A. M., and J. A. Fiez. 2004. Integrating rewards and cognition in the frontal cortex. Cognitive, Affective and Behavioral Neuroscience 4: 540–552. Goldman, P. S., and H. E. Rosvold. 1970. Localization of function within the dorsolateral prefrontal cortex of the rhesus monkey. Experimental Neurology 27: 291–304. Gross, C. G. 1963. A comparison of the effects of partial and total lateral frontal lesions on test performance by monkeys. Journal of Comparative Physiological Psychology 56: 41–47. Gross, C. G., and L. Weiskrantz. 1962. Evidence for dissociation of impairment on auditory discrimination and delayed response following lateral frontal lesions in monkeys. Experimental Neurology 5: 453–476. Hagen, M. C., D. H. Zald, T. A. Thornton, and J. V. Pardo. 2002. Somatosensory processing in the human inferior prefrontal cortex. Journal of Neurophysiology 88: 1400–1406. Hickok, G., B. Buchsbaum, C. Humphries, and T. Muftuler. 2003. Auditory–motor interaction revealed by fMRI: Speech, music, and working memory in area spt. Journal of Cognitive Neuroscience 15: 673–682. Homae, F., R. Hashimoto, K. Nakajima, Y. Miyashita, and K. L. Sakai. 2002. From perception to sentence comprehension: The convergence of auditory and visual information of language in the left inferior frontal cortex. NeuroImage 16: 883–900. Hoshi, E., K. Shima, and J. Tanji. 2000. Neuronal activity in the primate prefrontal cortex in the process of motor selection based on two behavioral rules. Journal of Neurophysiology 83: 2355–2373. Gifford III, G. W., K. A. Maclean, M. D. Hauser, and Y. E. Cohen. 2005. The neurophysiology of functionally meaningful categories: Macaque ventrolateral prefrontal cortex plays a critical role in spontaneous categorization of species-specific vocalizations. Journal of Cognitive Neuroscience 17: 1471–1482. Jones, E. G., and T. P. Powell. 1970. An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain 93: 793–820. Jones, J. A., and D. E. Callan. 2003. Brain activity during audiovisual speech perception: An fMRI study of the McGurk effect. Neuroreport 14: 1129–1133. Kostopoulos, P., M. C. Albanese, and M. Petrides. 2007. Ventrolateral prefrontal cortex and tactile memory disambiguation in the human brain. Proceedings of the National Academy of Sciences of the United States of America 104: 10223–10228. Miquee, A., C. Xerri, C. Rainville, J. L. Anton, B. Nazarian, M. Roth, and Y. Zennou-Azogui. 2008. Neuronal substrates of haptic shape encoding and matching: A functional magnetic resonance imaging study. Neuroscience 152: 29–39. Mishkin, M., and F. J. Manning. 1978. Non-spatial memory after selective prefrontal lesions in monkeys. Brain Research 143: 313–323. Newman, J. D., and D. F. Lindsley. 1976. Single unit analysis of auditory processing in squirrel monkey frontal cortex. Experimental Brain Research 25: 169–181. Ninokura, Y., H. Mushiake, and J. Tanji. 2004. 
Integration of temporal order and object information in the monkey lateral prefrontal cortex. Journal of Neurophysiology 91: 555–560. O’Scalaidhe, S. P. O., F. A. W. Wilson, and P. G. R. Goldman-Rakic. 1999. Face-selective neurons during passive viewing and working memory performance of rhesus monkeys: Evidence for intrinsic specialization of neuronal coding. Cerebral Cortex 9: 459–475.1 O’Scalaidhe, S. P., F. A. Wilson, and P. S. Goldman-Rakic. 1997. Areal segregation of face-processing neurons in prefrontal cortex. Science 278: 1135–1138. Pandya, D. N., and F. Sanides. 1973. Architectonic parcellation of the temporal operculum in rhesus monkey and its projection pattern. Zeitschrift fuer Anatomie und Entwicklungsgeschichte 139: 127–161. Pandya, D. N., and H. G. Kuypers. 1969. Cortico-cortical connections in the rhesus monkey. Brain Research 13: 13–36. Pandya, D. N., M. Hallett, and S. K. Kmukherjee. 1969. Intra- and interhemispheric connections of the neocortical auditory system in the rhesus monkey. Brain Research 14: 49–65. Papoutsi, M., J. A. de Zwart, J. M. Jansma, M. J. Pickering, J. A. Bednar, and B. Horwitz. 2009. From phonemes to articulatory codes: An fMRI study of the role of Broca’s area in speech production. Cerebral Cortex 19: 2156–2165.
Passingham, R. 1975. Delayed matching after selective prefrontal lesions in monkeys (Macaca mulatta). Brain Research 92: 89–102. Paulesu, E., C. D. Frith, and R. S. J. Frackowiak. 1993. The neural correlates of the verbal component of working memory. Nature 362: 342–5.32 Petrides, M., and D. N. Pandya. 1988. Association fiber pathways to the frontal cortex from the superior temporal region in the rhesus monkey. Journal of Comparative Neurology 273: 52–66. Pigarev, I. N., G. Rizzolatti, and C. Schandolara. 1979. Neurons responding to visual stimuli in the frontal lobe of macaque monkeys. Neuroscience Letters 12: 207–212. Poremba, A., and M. Mishkin. 2007. Exploring the extent and function of higher-order auditory cortex in rhesus monkeys. Hearing Research 229: 14–23. Posner, M. I., Y. G. Abdullaev, B. D. McCandliss, and S. C. Sereno. 1999. Neuroanatomy, circuitry and plasticity of word reading. Neuroreport 10: R12–R23. Preuss, T. M., and P. S. Goldman-Rakic. 1989. Connections of the ventral granular frontal cortex of macaques with perisylvian premotor and somatosensory areas: Anatomical evidence for somatic representation in primate frontal association cortex. Journal of Comparative Neurology 282: 293–316. Price, C. J. 1998. The functional anatomy of word comprehension and production. Trends in Cognitive Sciences 2: 281–288. Price, J. L. 2008. Multisensory convergence in the orbital and ventrolateral prefrontal cortex. Chemosensory Perception 1: 103–109. Rao, S. C., G. Rainer, and E. K. Miller. 1997. Integration of what and where in the primate prefrontal cortex. Science 276: 821–824. Rizzolatti, G., and L. Craighero. 2004. The mirror–neuron system. Annual Review of Neuroscience 27: 169–192. Romanski, L. M. 2007. Representation and integration of auditory and visual stimuli in the primate ventral lateral prefrontal cortex. Cerebral Cortex 17 S1: i61–i69. Romanski, L. M., and P. S. Goldman-Rakic. 2002. An auditory domain in primate prefrontal cortex. Nature Neuroscience 5: 15–16. Romanski, L. M., B. B. Averbeck, and M. Diltz. 2005. Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. Journal of Neurophysiology 93: 734–747. Romanski, L. M., B. Tian, J. Fritz, M. Mishkin, P. S. Goldman-Rakic, and J. P. Rauschecker. 1999b. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience 2: 1131–1136. Romanski, L. M., J. F. Bates, and P. S. Goldman-Rakic. 1999a. Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology 403: 141–157. Romo, R., C. D. Brody, A. Hernandez, and L. Lemus. 1999. Neuronal correlates of parametric working memory in the prefrontal cortex. Nature 399: 470–473. Rosenkilde, C. E., R. H. Bauer, and J. M. Fuster. 1981. Single cell activity in ventral prefrontal cortex of behaving monkeys. Brain Research 209: 375–394. Schall, J. D., A. Morel, D. J. King, and J. Bullier. 1995. Topography of visual cortex connections with frontal eye field in macaque: Convergence and segregation of processing streams. Journal of Neuroscience 15: 4464–4487. Scott, S. K., C. C. Blank, S. Rosen, and R. J. Wise. 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain 12: 2400–2406. Stevens, A. A., P. S. Goldman-Rakic, J. C. Gore, R. K. Fulbright, and B. E. Wexler. 1998. Cortical dysfunction in schizophrenia during auditory word and tone working memory demonstrated by functional magnetic resonance imaging. 
Archives of General Psychiatry 55: 1097–1103. Stilla, R., and K. Sathian. 2008. Selective visuo-haptic processing of shape and texture. Human Brain Mapping 29: 1123–1138. Stromswold, K., D. Caplan, N. Alpert, and S. Rauch. 1996. Localization of syntactic comprehension by positron emission tomography. Brain & Language 52: 452–473. Sugihara, T., M. D. Diltz, B. B. Averbeck, and L. M. Romanski. 2006. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience 26: 11138–11147. Suzuki, H., and M. Azuma. 1977. Prefrontal neuronal activity during gazing at a light spot in the monkey. Brain Research 126: 497–508. Tanila, H., S. Carlson, I. Linnankoski, and H. Kahila. 1993. Regional distribution of functions in dorsolateral prefrontal cortex of the monkey. Behavioral Brain Research 53: 63–71. Tanila, H., S. Carlson, I. Linnankoski, F. Lindroos, and H. Kahila. 1992. Functional properties of dorsolateral prefrontal cortical neurons in awake monkey. Behavioral Brain Research 47: 169–180.
Tian, B., D. Reser, A. Durham, A. Kustov, and J. P. Rauschecker. 2001. Functional specialization in rhesus monkey auditory cortex. Science 292: 290–293. Tsao, D. Y., N. Schweers, S. Moeller, and W. A. Freiwald. 2008. Patches of face-selective cortex in the macaque frontal lobe. Nature Neuroscience 11: 877–879. Ungerleider, L. G., D. Gaffan, and V. S. Pelak. 1989. Projections from inferior temporal cortex to prefrontal cortex via the uncinate fascicle in rhesus monkeys. Experimental Brain Research 76: 473–484. Webster, M. J., J. Bachevalier, and L. G. Ungerleider. 1994. Connections of inferior temporal areas TEO and TE with parietal and frontal cortex in macaque monkeys. Cerebral Cortex 4: 470–483. Weiskrantz, L., and M. Mishkin. 1958. Effects of temporal and frontal cortical lesions on auditory discrimination in monkeys. Brain 80: 406–414. Wilson, F. A., S. P. O’Scalaidhe, and P. S. Goldman-Rakic. 1993. Dissociation of object and spatial processing domains in primate prefrontal cortex. Science 260: 1955–1958. Wollberg, Z., and J. Sela. 1980. Frontal cortex of the awake squirrel monkey: Responses of single cells to visual and auditory stimuli. Brain Research 198: 216–220. Xu, J., P. J. Gannon, K. Emmorey, J. F. Smith, and A. R. Braun. 2009. Symbolic gestures and spoken language are processed by a common neural system. Proceedings of the National Academy of Sciences of the United States of America 106: 20664–20669. Zatorre, R. J., A. C. Evans, E. Meyer, and A. Gjedde. 1992. Lateralization of phonetic and pitch discrimination in speech processing. Science 256: 846–849. Zatorre, R. J., M. Bouffard, and P. Belin. 2004. Sensitivity to auditory object features in human temporal neocortex. The Journal of Neuroscience 24: 3637–3642.
34 A Multisensory Perspective on Human Auditory Communication
Katharina von Kriegstein
CONTENTS
34.1 Introduction
34.2 The Auditory Perspective on Auditory Communication
34.3 The Visual Perspective on Visual Communication
34.4 The Multisensory Perspective on Auditory Communication
34.4.1 Improving Unisensory Recognition by Multisensory Learning
34.4.1.1 Face Benefit: Auditory Recognition Is Improved after Voice–Face Learning
34.4.1.2 Is the Face Benefit Caused by Greater Attention during Voice–Face Learning?
34.4.1.3 Importance of a Common Cause for Rapid Learning Effects
34.4.2 Auditory–Visual Model for Human Auditory Communication
34.4.2.1 Visual Face Areas Are Behaviorally Relevant for Auditory Recognition
34.4.3 A Multisensory Predictive Coding Framework for Auditory Communication
34.5 Conclusions and Future Directions
References
34.1 INTRODUCTION
We spend a large amount of our time communicating with other people. Much of this communication occurs face to face, where the availability of sensory input from several modalities (e.g., auditory, visual, tactile, olfactory) ensures robust perception of information (e.g., Sumby and Pollack 1954; Gick and Derrick 2009). Robustness, in this case, means that the perception of a communication signal is veridical even when parts of the signal are noisy or occluded (Ay et al. 2007). For example, if the auditory speech signal is noisy, then the concurrent availability of visual speech signals (e.g., lip movements and gestures) improves the perception of the speech information (Sumby and Pollack 1954; Ross et al. 2007). The robustness of face-to-face communication pertains not only to speech recognition (Sumby and Pollack 1954; Ross et al. 2007), but also to other information relevant for successful human interaction, for example, recognition of gender (Smith et al. 2007), emotion (de Gelder and Vroomen 1995; Massaro and Egan 1996), or identity (Schweinberger et al. 2007). Nevertheless, in our daily life there are also often situations in which only a single modality is available, for example, when talking on the phone, listening to the radio, or seeing another person from a distance. Current models assume that perception in these unimodal tasks is based on, and constrained to, the unimodal sensory system.
For example, in this view, solely the auditory system is involved in the initial sensory analysis of the auditory speech signal during a telephone conversation (see, e.g., Belin et al. 2004; Scott 2005; Hickok and Poeppel 2007). Similarly, it is assumed that solely the visual system is involved in the initial sensory analysis of faces (Bruce and Young 1986; Haxby et al. 2000). In this chapter, I will review evidence that these models might need to be extended: perception in human communication may always involve multisensory processing, even when our brains are processing only unimodal input (see, e.g., Hall et al. 2005; Pitcher et al. 2008; von Kriegstein et al. 2008b). This involvement of multisensory processing might contribute to the robustness of perception. I will start with a brief overview of mechanisms and models for auditory speech and visual face processing from a modality-specific perspective. This will be followed by a summary and discussion of recent behavioral and functional neuroimaging experiments in human auditory communication that challenge the modality-specific view; they show that an interaction between auditory and visual sensory processing can increase robustness and performance in auditory-only communication. I conclude with a view of how these findings can be explained by a model that unifies unimodal and multimodal recognition.
34.2 THE AUDITORY PERSPECTIVE ON AUDITORY COMMUNICATION
Given good listening conditions, auditory speech perception leads to reliable comprehension of what is said. Auditory speech additionally reveals information about many other things, for example, the identity (Sheffert et al. 2002), social and geographical background (Clopper and Pisoni 2004; Thomas and Reaser 2004), or the emotional state of the speaker (Johnson et al. 1986; Scherer 1986). Although all of this information is relevant for successful human communication, in this chapter I will focus mostly on two aspects: (1) recognition of what is said (speech recognition) and (2) recognition of who is talking (speaker recognition). Large areas in the human brain are dedicated to processing auditory speech. It is still a matter of debate how speech-specific these areas are (Price et al. 2005; Nelken and Bar-Yosef 2008; von Kriegstein et al. 2007; Bizley et al. 2009). Basic perceptual features of speech and nonspeech sounds are pitch and timbre. Voice pitch is related to the vibration rate of the glottal folds. This information is processed relatively early in the auditory hierarchy, that is, in the brainstem (inferior colliculus) and close to the primary auditory cortex in Heschl’s gyrus (Griffiths et al. 2001; Patterson et al. 2002; Penagos et al. 2004; von Kriegstein et al. 2010). Timbre is an umbrella term, operationally defined as the perceptual difference between two sounds having the same pitch, duration, and intensity (American Standards Association 1960). It comprises such acoustic features as the spectral envelope (i.e., the shape of the power spectrum) and the amplitude envelope (i.e., the shape of the amplitude waveform) of the sound (Grey 1977; Iverson and Krumhansl 1993; McAdams et al. 1995). The difference between the two speech sounds /a/ and /o/ (spoken with the same voice pitch, intensity, and duration) is based on the different positions of the articulators (lips, tongue, etc.), which affect the timbre of the sound. Moreover, an /a/ spoken by two different speakers (with the same voice pitch) also differs in timbre, for example, because the two speakers have vocal tracts of distinct sizes. In contrast to pitch, differential responses to timbre in nonspeech and speech sounds have been reported farther away from primary auditory cortex, in the superior temporal gyrus (STG) and sulcus (STS) (Menon et al. 2002; Warren et al. 2005). Posterior STG/STS contains regions that are more involved in processing certain aspects of the timbre of speech sounds (i.e., those reflecting the size of the speaker) than similar aspects of timbre in nonspeech sounds (i.e., those reflecting the size of the musical instrument or animal; von Kriegstein et al. 2007). Bilateral posterior STG/STS has also been implicated in mapping acoustic signals onto speech sounds and speech sound categories, that is, phonemes (for review, see Hickok and Poeppel 2007; Obleser and Eisner 2008). Speech processing is left-lateralized if the experimental design emphasizes understanding what is said, for example, if speech recognition tasks are contrasted with speaker recognition tasks (but using the same auditory speech input) (Leff et al. 2008; Scott et al. 2000; von Kriegstein et al. 2003, 2008a).
2000; von Kriegstein et al. 2003, 2008a). In contrast, right temporal lobe regions [temporal lobe voice areas (TVA)] are more involved in extracting the nonlinguistic voice properties of the speech signal such as speaker identity (Belin et al. 2000, 2002; von Kriegstein et al. 2003; von Kriegstein and Giraud 2004). This left–right dichotomy is also supported by lesion studies that typically find speech processing deficits after left-hemispheric lesions. In contrast, acquired phonagnosia, that is, a deficiency in recognizing identity by voice, has been reported with right parietal and temporal lobe damage (Van Lancker and Canter 1982; Van Lancker et al. 1989; Neuner and Schweinberger 2000; Lang et al. 2009). Whether the left–right dichotomy is only relative is still a matter of debate (Hickok and Poeppel 2000). For example, although speech recognition can be impaired after left hemispheric lesions (Boatman et al. 1995), it can also be impaired after right hemispheric lesions in adverse listening conditions (Boatman et al. 2006). The functional view of hemispheric specialization might boil down to a specialization of different regions for different time windows in the speech input. There is evidence that the right hemisphere samples over longer time windows than the left hemisphere (Poeppel 2003; Boemio et al. 2005; Giraud et al. 2007; Abrams et al. 2008; Overath et al. 2008). This implies that the relative specialization of the left hemisphere for speech processing is a result of the highly variable nature of the acoustic input required for speech recognition. In contrast, the relative specialization of the right hemisphere for speaker processing might be a result of the relatively constant nature of speaker parameters, which also enable us to identify others by voice (Lavner et al. 2000; Sheffert et al. 2002). In addition to temporal lobe areas, there is evidence that motor regions (i.e., primary motor and premotor cortex) play a role in the sensory analysis of speech sounds at the level of phonemes and syllables (Liberman and Mattingly 1985; Watkins et al. 2003; D’Ausilio et al. 2009); however, whether this involvement reflects a necessary sensory mechanism or other mechanisms necessary for spoken language comprehension is still being debated (Hickok and Poeppel 2007; Scott et al. 2009). At a higher level, one of the overarching goals of the sensory analysis of speech signals is to understand spoken language or to recognize who is talking. The former involves a range of processing steps from connecting speech sounds to words and sentences, to grammatical rules and semantic processing. These processing steps involve an extended network of brain areas (see, e.g., Vigneau et al. 2006; Price 2000; Marslen-Wilson and Tyler 2007). For example, prefrontal areas (BA 44/45) have been implicated in relatively complex language tasks such as syntax or working memory (Friederici 2002; Hickok and Poeppel 2007). Furthermore, semantic analysis might involve several temporal lobe areas as well as an associative system comprising many widely distributed brain regions (Martin and Caramazza 2003; Barsalou 2008). One example of such semantic analysis is the involvement of specific areas in the motor cortex for action words (Pulvermuller et al. 2006; Hauk et al. 2008). Moreover, the recognition of who is talking involves processing steps beyond sensory analysis of speaker characteristics and voice identification, for example, associating a specific face or name with the voice.
This is thought to involve several extra-auditory areas, for example, supramodal areas coding for person identity or visual areas that are involved in face identity processing (Ellis et al. 1997; Gainotti et al. 2003; Tsukiura et al. 2006; von Kriegstein and Giraud 2006; Campanella and Belin 2007).
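The acoustic terms used above (voice pitch, spectral envelope, amplitude envelope) can be made concrete with a short sketch. The following Python snippet is purely illustrative and is not taken from the chapter or the cited studies; the function names, the synthetic test signal, and all parameter values (frequency ranges, smoothing windows, band counts) are my own assumptions for a rough demonstration.

```python
# Minimal sketch (illustrative only): rough estimates of voice pitch (F0),
# the amplitude envelope, and a coarse spectral envelope for a mono signal.
import numpy as np
from scipy.signal import hilbert
from scipy.ndimage import uniform_filter1d

def pitch_autocorr(x, sr, fmin=75.0, fmax=400.0):
    """Crude F0 estimate from the autocorrelation peak within a plausible voice range."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def amplitude_envelope(x, sr, smooth_ms=10.0):
    """Slowly varying amplitude envelope via the magnitude of the analytic signal."""
    env = np.abs(hilbert(x))
    return uniform_filter1d(env, max(1, int(sr * smooth_ms / 1000.0)))

def spectral_envelope(x, sr, n_bands=40):
    """Very coarse spectral envelope: band-averaged magnitude spectrum."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    bands = np.array_split(spec, n_bands)
    return np.array([b.mean() for b in bands])

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    # Stand-in for a voiced vowel: a 120 Hz harmonic complex (purely illustrative).
    x = sum(a * np.sin(2 * np.pi * 120 * k * t) for k, a in enumerate([1.0, 0.6, 0.3, 0.15], 1))
    print("estimated F0 (Hz):", round(pitch_autocorr(x[:2048], sr), 1))
    print("amplitude envelope (first samples):", amplitude_envelope(x, sr)[:3])
    print("spectral envelope (first bands):", spectral_envelope(x, sr)[:5].round(2))
```

In this toy example, the autocorrelation peak recovers the pitch of the harmonic complex, while the band-averaged spectrum and the smoothed analytic-signal magnitude correspond, very roughly, to the spectral and amplitude envelopes that contribute to timbre.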
34.3 THE VISUAL PERSPECTIVE ON VISUAL COMMUNICATION
A substantial amount of communication information is transmitted in the visual domain. A particularly important visual input is the face. Dynamic face information complements auditory information about what is said (e.g., lip movements; Chandrasekaran et al. 2009). Moreover, the face provides a reliable means for recognizing people as well as the emotion of the speaker. I will focus here on face information although there are, of course, other types of visual information that play important roles in communication. These include hand gestures during face-to-face communication or text and emoticons in e-mail communication.
Similarly to the auditory modality, face processing is assumed to be separated into processing of variable aspects of the face (e.g., expression, speech-related orofacial movements) and processing of the more invariant aspects of the face (i.e., face identity) (Bruce and Young 1986; Burton et al. 1999; Haxby et al. 2000). This distinction was initially based on behavioral studies showing that face movement or expression processing can be impaired separately from face-identity processing (Bruce and Young 1986; Young et al. 1993). For example, the patient LM cannot speech-read from moving faces but has intact face recognition (Campbell et al. 1997). In contrast, prosopagnosics, that is, people who have a deficiency in recognizing identity from the face, are thought to be unimpaired in the recognition of dynamic aspects of the human face (Humphreys et al. 1993; Lander et al. 2004; Duchaine et al. 2003). The prevalent model of face processing assumes that aspects relevant for identity recognition are processed in the fusiform face area (FFA) in the ventrotemporal cortex (Sergent et al. 1992; Kanwisher et al. 1997; Haxby et al. 2000; Bouvier and Engel 2006). Recognition of face expression and face movement involves the mid/posterior STS (Puce et al. 1998; Pelphrey et al. 2005; Thompson et al. 2007). However, not all studies are in support of two entirely separate routes in processing face identity and face dynamics, and the extent of specialization of the two areas for dynamic versus invariant aspects of faces is still under debate (O’Toole et al. 2002; Calder and Young 2005; Thompson et al. 2005; Fox et al. 2009). Visual and visual association cortices have been described as the “core system” of face perception. In contrast, the “extended system” of face perception involves several nonvisual brain regions, for example, the amygdala and anterior temporal lobe for processing social significance, emotion, and person identity (Baron-Cohen et al. 2000; Haxby et al. 2000; Neuner and Schweinberger 2000; Haxby et al. 2002; Kleinhans et al. 2009). Furthermore, in the model developed by Haxby and colleagues, the extended system also comprises auditory cortices that are activated in response to lipreading from faces (Calvert et al. 1997; Haxby et al. 2000).
34.4 THE MULTISENSORY PERSPECTIVE ON AUDITORY COMMUNICATION
There is growing evidence that sensory input of one modality can lead to neuronal responses or altered processing in sensory areas of another modality (Calvert et al. 1997; Sathian et al. 1997; Zangaladze et al. 1999; Hall et al. 2005; von Kriegstein et al. 2005; Besle et al. 2008). For example, lipreading from visual-only videos of faces is associated with responses in auditory cortices (i.e., Heschl’s gyrus and planum temporale) even if no auditory input is available (Hall et al. 2005; Pekkola et al. 2005; Besle et al. 2008). Furthermore, if both the auditory and visual information of a speaker’s face are available, then the neuronal dynamics in auditory sensory cortices are modulated by the visual information (van Wassenhove et al. 2005; Arnal et al. 2009). In these studies, the amount of modulation was dependent on how predictable the visual information was in relation to the auditory signal (van Wassenhove et al. 2005; Arnal et al. 2009). For example, a visual /p/, a speech sound that can easily be distinguished visually from other speech sounds, led to faster auditory responses to the auditory stimulus than the more difficult-to-distinguish /k/. Because visual information about the facial movements precedes auditory information in time (Chandrasekaran et al. 2009), these altered responses to auditory stimuli could reflect the transmission of predictive visual information to the auditory cortices. This predictive information could be used to improve recognition by resolving auditory ambiguities. In this view, the alteration of responses in auditory cortices would provide a benefit for processing of the auditory stimulus. Such a mechanism might be responsible for the robustness of perception in multisensory situations. The above studies have shown that input from one modality (e.g., visual) can influence responses in the cortices of another input modality (e.g., auditory) and thereby might improve behavioral performance in multisensory situations. What, however, happens in unisensory situations, when only one input modality is available? Does it have any behavioral relevance that the input modality (e.g., auditory) influences the activity in sensory cortices of another modality (e.g., visual) that does not
receive any direct sensory (i.e., visual) input? Recent research suggests that it does. For example, activation of visual areas has been shown to improve recognition of speech information in auditory-only situations, such as when talking on the phone (von Kriegstein and Giraud 2006; von Kriegstein et al. 2006; von Kriegstein et al. 2008b). These findings show that, after a brief period of audiovisual learning, activation of visual association cortices (i.e., the FFA and the face-movement sensitive STS) is correlated with behavioral benefits for auditory-only recognition. Such findings are at odds with the above-described unisensory perspective on auditory-only communication, because they imply that not only auditory sensory but also visual sensory areas are instrumental for auditory-only tasks. In the following, I will review these behavioral and neuroimaging findings in detail and discuss the implications for models of auditory-only perception in human communication.
34.4.1 Improving Unisensory Recognition by Multisensory Learning
34.4.1.1 Face Benefit: Auditory Recognition Is Improved after Voice–Face Learning
Recent studies show that a brief period of prior face-to-face communication improves our ability to identify a particular speaker by his/her voice in auditory-only conditions (Sheffert and Olson 2004; von Kriegstein et al. 2006; von Kriegstein et al. 2008b) as well as our ability to understand what this speaker is saying (von Kriegstein et al. 2008b). One of the earliest indications for such beneficial speaker-specific effects of prior face-to-face communication came from a behavioral study on recognition of speakers in auditory-only situations (Sheffert and Olson 2004). In this study, subjects were first trained to associate names with five speakers’ voices. Training was done in two groups. One group (audiovisual) learned via previously recorded auditory–visual videos of the speaker’s voice and face. The other group (auditory-only) learned by listening to the auditory track of the same videos (the face was not visible). After training, both groups were tested on recognizing the speakers by voice in auditory-only conditions. The results showed that the audiovisual learning was more effective than the auditory-only learning. This beneficial effect of multisensory learning on voice recognition has been reproduced in two further studies involving several control conditions (von Kriegstein and Giraud 2006; von Kriegstein et al. 2008b; see Figure 34.1 for an example design from one of the studies). Translated into everyday life, these findings would imply that auditory-only recognition of voices of, for example, TV presenters, is easier than the recognition of voices of radio speakers (whom one has never seen speaking), given the same acoustic quality and total amount of exposure. Furthermore, it was shown that not only voice recognition is improved after audiovisual learning, but also recognizing what is said (von Kriegstein et al. 2008b); previous voice–face video training improved the recognition of words in auditory-only sentences more than a matched control training that did not involve faces (Figure 34.1). As in the previous studies, the sentences during the word recognition task were spoken by the same trained speakers but were different from the training sentences. In the following, we will term the behavioral improvement on auditory tasks after a voice–face training, relative to a matched control training, the “face benefit” (Figure 34.1; von Kriegstein et al. 2008b). Besides speaker-specific face benefits, there are speaker-independent face benefits in human auditory communication. For example, learning of foreign phonemes has been shown to be more efficient when training is performed with audiovisual videos of a talking face, in contrast to training without dynamic face information (Hardison 2003; Hazan et al. 2005; Hirata and Kelly 2010). For the visually well-distinguishable phonemes /b/ and /p/, the face benefit is higher than for the visually less salient speech sounds /l/ and /r/ (Hazan et al. 2005). This face benefit generalizes from the training speaker to other speakers, that is, listening to and watching the language teacher will also improve phoneme discrimination in auditory-only conditions for other speakers speaking that language. The studies on speaker-independent face benefits in phoneme recognition use relatively long training sessions (e.g., ca. 7 h in total for learning the consonants b, v, and p with videos of five speakers) before testing for differences between the auditory–visual and auditory-only training conditions with phonemes spoken by a different set of speakers.
[Figure 34.1 layout: voice–face training (speakers Daniel, Nico, Peter) vs. voice–occupation training (Ingo, Jan, Martin), less than 2 min per speaker, followed by auditory-only test blocks with speech or speaker recognition tasks; mean accuracy 94% (speech) and 82% (speaker) after voice–face training vs. 92% and 77% after voice–occupation training, i.e., face benefits of 2% and 5%, respectively.]
FIGURE 34.1 Example of an experimental design. Subjects were first trained on voices and names of six different speakers. For three of these speakers, training was done with a voice–face video of the speaking person (voice–face training). For the three other speakers, training was done with the voice and a symbol for the speaker’s occupation. In the subsequent test session, subjects performed a speech or speaker recognition task on blocks of sentences spoken by previously trained speakers. Results show mean % correct recognition over subjects. Face benefit is calculated as the difference in performance after voice–face vs. voice–occupation training. (Adapted from von Kriegstein, K. et al., Proc. Natl. Acad. Sci. U.S.A. 105, 6747–6752, 2008b.)
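The face benefit described in the caption is simply a per-subject difference score. A minimal sketch of that computation, with entirely made-up accuracies (the subject count and all values are hypothetical, not data from the cited studies), together with a standard paired test, could look as follows.

```python
# Minimal sketch (illustrative numbers only): the "face benefit" as the
# per-subject difference between accuracy after voice-face training and
# accuracy after the matched control (voice-occupation) training.
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical % correct for 8 subjects on auditory-only speaker recognition.
acc_voice_face = np.array([84, 80, 86, 79, 90, 83, 81, 88], dtype=float)
acc_voice_occupation = np.array([78, 77, 80, 76, 84, 79, 77, 83], dtype=float)

face_benefit = acc_voice_face - acc_voice_occupation    # one value per subject
t, p = ttest_rel(acc_voice_face, acc_voice_occupation)  # paired comparison across subjects

print(f"mean face benefit: {face_benefit.mean():.1f} percentage points")
print(f"paired t-test: t = {t:.2f}, p = {p:.3f}")
```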
In contrast, the speaker-specific face benefits have been shown to develop very quickly. For example, Sheffert and Olson trained their subjects with ca. 50 words from each of five speakers. Further studies showed that less than 2 min of training per speaker already resulted in a significant face benefit [i.e., 9% for speaker recognition in the study of von Kriegstein and Giraud 2006, and ca. 5% (speaker)/2% (speech) in the report of von Kriegstein et al. 2008b]. Note, however, that the brief exposure times required for speaker-specific face benefits seem to have a lower limit. Speaker recognition ability has been investigated after presentation of only one sentence (mean duration 15 syllables/ca. 900 ms) and after three repetitions of a sentence (45 syllables/ca. 2.7 s) (Cook and Wilding 1997, 2001). For the one-sentence condition, voice recognition was actually worse after voice–face exposure (in contrast to voice-only exposure). For the three-sentence condition, voice recognition was the same after voice–face exposure (in contrast to voice-only exposure). Thus, the beneficial effect of voice–face training for voice recognition in auditory-only conditions seems to occur somewhere between 3 s and 2 min of training (Cook and Wilding 1997, 2001; von Kriegstein and Giraud 2006; von Kriegstein et al. 2008b).
34.4.1.2 Is the Face Benefit Caused by Greater Attention during Voice–Face Learning?
One simple explanation for the face benefits could be that seeing people talking is much more exciting and attention-grabbing than just listening to the audio track, even if it is additionally accompanied by a visual symbol (Figure 34.1). This increase in attention during training with videos
may lead to better performance during test in auditory-only conditions. However, there is strong evidence against this possibility. First, in the Sheffert and Olson (2004) study, subjects additionally performed an old/new recognition test on words spoken by the familiarized speakers or nonfamiliar speakers. If subjects paid more attention during the voice–face training (in contrast to the voice-only training), they should remember words from the voice–face training condition better (than those from the voice-only training). However, there was no such difference in word memory for the two training conditions. Second, in the von Kriegstein and Giraud (2006) study, subjects were additionally trained to recognize ringtones of cell phones. In one condition, subjects were trained with videos of hands operating cell phones. In the control condition, subjects were trained with the brand names of cell phones. Subsequently, ringtone recognition was tested in an auditory-only condition. If training with videos was more attention-grabbing, then one would expect better recognition of ringtones after training with videos than after training with brand names. However, there was no such benefit for ringtone recognition. Third, probably the most compelling argument against an attentional effect is that the face benefits for speech and speaker recognition are behaviorally dissociable. This dissociability was shown in a study on developmental prosopagnosic subjects and controls (von Kriegstein et al. 2008b). Developmental prosopagnosia is a lifelong inability to recognize other people by their face (McConachie 1976; Behrmann and Avidan 2005; Duchaine and Nakayama 2005; Gruter et al. 2007). The perception of facial dynamics has been shown to be unimpaired in this condition (Lander et al. 2004). In our study (von Kriegstein et al. 2008b), we trained prosopagnosics and control subjects to associate six speakers’ voices with their names (see Figure 34.1). Training was done in two conditions. In one condition (voice–face), subjects learned via previously recorded auditory–visual videos of the speaker’s voice and face. In the control condition (voice–symbol), subjects learned by listening to the auditory track of the same videos and seeing a visual symbol for the occupation of the person. After training, all subjects were tested on two tasks in auditory-only conditions. In one task, subjects recognized the speakers by voice (speaker recognition); in the other task, subjects recognized what was said (speech recognition). If the improvement in auditory-only conditions by prior voice–face training (i.e., the face benefit) depends on attention, one would expect that both groups have similar face benefits on the two tasks. This was not the case. Although prosopagnosics had a normal face benefit for speech recognition (as compared to controls), they had no face benefit for speaker recognition (unlike controls). This means that the face benefit in speech recognition can be normal, whereas the face benefit in speaker recognition can be selectively impaired. It suggests that the face benefits in speech and speaker recognition rely on two distinct and specific mechanisms instead of one common attentional mechanism. I will explain what these mechanisms might be in terms of brain processes in Section 34.4.2.
34.4.1.3 Importance of a Common Cause for Rapid Learning Effects
Multisensory learning improves unisensory recognition not only for human communication signals. For example, Seitz et al.
(2006) trained subjects to detect visual motion within a visual dot pattern. There were two training conditions. In one condition, the visual motion was accompanied by moving sounds (audiovisual training). The other condition was a visual-only motion detection training. The audiovisual training resulted in better performance (as compared to the visual-only training) on motion detection in the visual-only test. The audiovisual training benefit occurred only if dots and sounds moved in the same direction but not if they moved in opposing directions (Kim et al. 2008; reviewed by Shams and Seitz 2008). These findings are compatible with the view that multisensory training is beneficial for unisensory tasks if information in each modality is based on a common cause, which has physically highly predictable consequences in the sensorium (von Kriegstein and Giraud 2006). For example, when an object is moving it produces consequences in the auditory and visual domain, which are not arbitrarily related, because they are caused by the same movement. Similarly if a foreign phoneme is learned in multisensory situations, then the vocal tract movements of the speaker cause the acoustic properties of the speech sound. They are not arbitrarily related either, because a certain speech sound is, at least in ecologically valid situations, caused
by a specific vocal tract movement. This common cause results in similar and tightly correlated dynamics in the visual and auditory modality (Chandrasekaran et al. 2009). Not only movement but also shape and other material properties have an expression in the visual and auditory modalities (Lakatos et al. 1997; Smith et al. 2005). For example, voices give information about the physical characteristics of the speaker, such as body size, because the length of the vocal tract influences the timbre of the voice and is correlated with body size (Smith et al. 2005). In contrast, other auditory–visual events can be arbitrarily related. Ringtones and cell phones, for example, relate to a unique ecologically valid multimodal source, but their association is arbitrary. The visual appearance of the cell phone does not physically cause the characteristics of the ringtone and vice versa. We assume that the rapid acquisition of face benefits and multisensory learning benefits occurs if the brain can exploit already existing knowledge about the relationship between auditory and visual modalities (von Kriegstein and Giraud 2006; von Kriegstein et al. 2008b). This would explain why there are rapid learning benefits when auditory and visual information is tightly correlated, whereas there are no such rapid learning benefits when they are arbitrarily related (Seitz et al. 2006; von Kriegstein and Giraud 2006; Kim et al. 2008).
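The notion of a common cause producing tightly correlated dynamics can be illustrated with simulated trajectories: a shared articulatory cause yields a visual mouth trajectory and a slightly delayed auditory envelope whose cross-correlation is high, whereas an arbitrary pairing (e.g., a ringtone) shows no such structure. The signals, delay, and noise levels in the sketch below are invented for illustration and are not taken from the cited studies.

```python
# Minimal sketch (simulated data): correlated audiovisual dynamics from a
# common cause vs. an arbitrarily paired signal.
import numpy as np

rng = np.random.default_rng(0)
fps, dur = 100, 10                                   # 100 Hz sampling, 10 s
t = np.arange(fps * dur) / fps

articulation = np.clip(np.sin(2 * np.pi * 3 * t) + 0.3 * rng.standard_normal(t.size), 0, None)
mouth_opening = articulation + 0.1 * rng.standard_normal(t.size)                # visual consequence
audio_envelope = np.roll(articulation, 15) + 0.3 * rng.standard_normal(t.size)  # same cause, ~150 ms later
ringtone_envelope = np.clip(rng.standard_normal(t.size), 0, None)               # arbitrary pairing

def max_xcorr(a, b, max_lag=50):
    """Peak normalized cross-correlation and its lag in samples (positive lag: b lags a)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    lags = list(range(-max_lag, max_lag + 1))
    r = [np.mean(a * np.roll(b, -lag)) for lag in lags]
    k = int(np.argmax(r))
    return r[k], lags[k]

print("common cause  :", max_xcorr(mouth_opening, audio_envelope))
print("arbitrary pair:", max_xcorr(mouth_opening, ringtone_envelope))
```

With a common cause, the peak correlation is high and occurs at the imposed lag (the "visual" trajectory leading the "auditory" envelope); for the arbitrary pairing it hovers near zero at no systematic lag.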
34.4.2 Auditory–Visual Model for Human Auditory Communication
There are several types of mechanisms that have been suggested to account for the behavioral benefits in unisensory conditions after multisensory learning. The conventional view (“auditory-only model”) would assume that the brain uses auditory-only processing capabilities for the sensory analysis of auditory information. In this case, the face benefits (or other multisensory learning benefits) could be explained by an increase in effectiveness of sensory processing in auditory areas. Such a mechanism has been suggested previously (Seitz and Dinse 2007; Shams and Seitz 2008), but to my knowledge has not been tested in detail yet. In contrast, the “audiovisual model” assumes that the brain uses previously learned audiovisual speaker-specific information to improve recognition in auditory-only conditions (von Kriegstein et al. 2008b). In this view, even without visual input, face-processing areas use encoded knowledge about the visual orofacial kinetics of talking and simulate a speaker to make predictions about the trajectory of what is heard (Figure 34.2). This visual online simulation places helpful constraints on auditory perception to improve recognition by resolving auditory ambiguities. This model implies that (1) visual face processing areas are involved in auditory-only tasks and that (2) this involvement is behaviorally relevant. There is neuroimaging evidence in support of the audiovisual model, and I will review this evidence in the following.
34.4.2.1 Visual Face Areas Are Behaviorally Relevant for Auditory Recognition
Recent neuroimaging studies show that face-sensitive areas (STS and FFA) are involved in the recognition of auditory communication signals (von Kriegstein et al. 2005; von Kriegstein and Giraud 2006; von Kriegstein et al. 2006, 2008b). They suggest that the FFA is behaviorally relevant for auditory-only speaker recognition, and that the face-movement sensitive STS is behaviorally relevant for auditory-only speech recognition.
34.4.2.1.1 FFA and Speaker Recognition
Several studies focused on the blood oxygen level dependent (BOLD) responses in the FFA during speaker recognition in auditory-only conditions. The FFA is more activated if subjects perform (1) a speaker task (in contrast to a speech task) for personally familiar speakers (in contrast to nonfamiliar speakers) (von Kriegstein et al. 2005, 2006); (2) a speaker task after voice–face learning (in contrast to a speaker task before voice–face learning) (von Kriegstein and Giraud 2006); (3) a speaker task after voice–face learning (in contrast to a speaker task after a matched control learning) (von Kriegstein and Giraud 2006; von Kriegstein et al. 2008b); (4) a speaker task in contrast to a speech task after voice–face learning (in contrast to the same contrast after a matched control learning) (von Kriegstein et al. 2008b). In summary, FFA activation during auditory-only speaker tasks is increased after prior voice–face experience and is task-specific.
[Figure 34.2 schematic, panels (a) and (b): vocal speech, facial speech, voice identity, and face identity are extracted by auditory and visual structural analysis and feed into speech recognition and person recognition, with candidate areas including left (fronto)temporal cortex, (left) posterior STS, right mid/anterior STS, and the right FFA.]
FIGURE 34.2 Audiovisual model for human communication. Schematic for processing of human communication signals during speech and speaker recognition. (a) Audiovisual input enters auditory and visual preprocessing areas. These feed into two distinct networks, which process speech and speaker information. This panel schematically depicts potential mechanisms during voice–face training (see Figure 34.1) as well as areas potentially involved in this process. (b) Auditory-only input enters auditory preprocessing areas. For speech recognition, facial and vocal speech areas interact while engaging concurrently with higher levels of speech processing. Similarly, for speaker recognition, face and voice identity areas interact while engaging concurrently with higher levels of speaker identity processing. This panel schematically depicts potential mechanisms during auditory testing after voice–face training (see Figure 34.1) as well as areas potentially involved in this process. Note that interactions between boxes do not imply direct anatomical connections and that boxes may represent more than one area, in particular for higher levels of speech and speaker recognition.
Activation of the FFA is higher if subjects are asked to recognize the speaker in contrast to recognizing what is said, even if the stimulus input for the two tasks is exactly the same. Figure 34.3 shows an example of FFA activity during speaker recognition before and after voice–face and voice–name learning. Note that, in contrast to the increased FFA activation after voice–face learning, the auditory voice region in the right temporal lobe (TVA) shows a similar activation increase for the two training conditions (Figure 34.3). This could be taken as an indication against the view that face benefits can be explained by an increased effectiveness of auditory-only processing. Not only does the level of activation change after a brief voice–face learning, but also the functional connectivity of the FFA to other brain areas. When subjects recognize previously heard voices of nonfamiliar people, the FFA is functionally connected to a frontoparietal network (von Kriegstein and Giraud 2006). This pattern is similar to the connectivity pattern of the FFA when subjects are instructed to vividly imagine faces without any meaningful sensory input besides the task instructions (Ishai et al. 2002; Mechelli et al. 2004). The connectivity changes dramatically after a brief voice–face training. After training, the functional connectivity of the FFA to the frontoparietal network is decreased. In contrast, connectivity between FFA and auditory voice-sensitive areas (TVA) increases (von Kriegstein and Giraud 2006). A similar pattern of connectivity between FFA and TVA can also be found during recognition of personally familiar speakers’ voices (von Kriegstein et al. 2005). The change in connectivity suggests that the FFA activation after voice–face training results from a different mechanism than before training or during task-instructed imagery. The more direct connectivity between FFA and TVA after voice–face learning is compatible with the hypothesis that auditory and visual areas already interact at the stage of sensory analysis, as suggested by the audiovisual model (von Kriegstein and Giraud 2006).
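As a toy illustration of what "functional connectivity" between FFA and TVA means in its simplest form, the sketch below correlates two simulated region-of-interest time courses before and after learning. The cited studies used model-based fMRI analyses rather than this simplified calculation, and all numbers and variable names are hypothetical.

```python
# Toy illustration (simulated data, not the analyses of the cited studies):
# functional connectivity as the correlation between two ROI time courses,
# compared before vs. after voice-face learning.
import numpy as np

rng = np.random.default_rng(1)
n_scans = 200
shared_before = rng.standard_normal(n_scans)
shared_after = rng.standard_normal(n_scans)

# Before learning: the FFA and TVA time courses share little signal.
ffa_before = 0.2 * shared_before + rng.standard_normal(n_scans)
tva_before = 0.2 * shared_before + rng.standard_normal(n_scans)
# After learning: a larger shared component stands in for increased coupling.
ffa_after = 0.8 * shared_after + rng.standard_normal(n_scans)
tva_after = 0.8 * shared_after + rng.standard_normal(n_scans)

def fc(a, b):
    """Functional connectivity estimated as the Pearson correlation of two ROI time courses."""
    return np.corrcoef(a, b)[0, 1]

print(f"FFA-TVA connectivity before learning: r = {fc(ffa_before, tva_before):.2f}")
print(f"FFA-TVA connectivity after learning:  r = {fc(ffa_after, tva_after):.2f}")
```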
[Figure 34.3: BOLD signal change in the right TVA (left panel) and the fusiform face area (right panel) during auditory-only speaker recognition, shown before and after voice–face and voice–name training.]
FIGURE 34.3 Blood oxygen level dependent (BOLD) responses in voice- (left panel) and face-sensitive (right panel) areas before and after different types of audiovisual training. In this study, control training involved learning of voice–name associations (instead of the voice–occupation symbol associations displayed in Figure 34.1). Note that the increase in activation in auditory voice areas (TVA) is similar for both training conditions. In contrast, responses in the fusiform face area increase only after voice–face training, not after voice–name training. Signal change here refers to a contrast between speaker recognition and ringtone recognition (for details, see von Kriegstein and Giraud 2006).
34.4.2.1.2 Face-Movement Sensitive Posterior STS and Speech Recognition
The face-movement sensitive posterior STS is also activated after a brief voice–face training (in contrast to a matched control training), but only if the subjects’ task is to recognize what has been said (in contrast to a speaker recognition task) (von Kriegstein et al. 2008b). In this study, the face-movement sensitive posterior STS was localized with visual stimuli only (Figure 34.4, blue) and has been shown to be distinct from STS areas that are involved in speech recognition in general (Figure 34.4, green).
34.4.2.1.3 FFA and Face-Movement Sensitive STS Play Distinct Roles in Auditory Recognition
The task-specificity of the activation in FFA and face-movement sensitive STS suggests that these two regions serve different roles in auditory speech and speaker perception. If these roles are within the domain of sensory analysis, then one would expect that the amount of activation correlates positively with performance on auditory recognition tasks. Recent research confirms this (von Kriegstein et al. 2006; von Kriegstein et al. 2008b). It was found that subjects who profit most from the prior voice–face training when they perform auditory-only tasks have a high activation level in visual face-sensitive areas. Specifically, the activation level of the face-movement sensitive STS correlates positively with the across-subjects face benefits in speech recognition (Figure 34.4, red), whereas the activity in the FFA correlates positively with the across-subject face benefits in speaker recognition (von Kriegstein et al. 2008b). Furthermore, the behavioral dissociation of face benefits for speech and speaker recognition in auditory-only conditions (see Section 34.4.1.2) is paralleled by a neuroanatomical dissociation. In von Kriegstein et al.’s (2008b) study, both prosopagnosics and controls had a positive correlation of the face benefit in speech recognition with the amount of STS activation. Controls also had a positive correlation of the face benefit in speaker recognition with the amount of FFA activation. In contrast, in prosopagnosics there was no positive correlation of the face benefit in speaker recognition with the amount of FFA activation.
[Figure 34.4 panels: sections at y = –45 and y = –51 showing the contrasts speech > object and speech > speaker, the speech task (face benefit) correlation, and the visual face area localizer.]
FIGURE 34.4 (See color insert.) Face-sensitive left STS (blue) is located in regions of STS that are distinct from those that are responsive to auditory speech (green). Positive correlation of activity in STS with face benefit in speech task (red) overlaps with the face area (overlap in purple) but not with the auditory area (green) (for more details on specific contrasts used, see von Kriegstein et al. 2008b). y, MNI coordinate in anterior–posterior direction.
The behavioral and neuroanatomical dissociation is in accord with the audiovisual model (Figure 34.2). Speech and speaker recognition largely rest on two different sets of audiovisual correlations. Speech recognition is based predominantly on fast time-varying acoustic cues produced by the varying vocal tract shape (Fant 1960), and much of this is visible on the speaker’s face (Yehia et al. 1998). Conversely, speaker recognition uses predominantly very slowly varying properties of the speech signal, such as the acoustic correlates of vocal tract length (Lavner et al. 2000). If the brain uses encoded visual information for processing auditory-only speech, the behavioral improvement that is induced by voice–face training (i.e., the face benefit) must be dissociable for speech and speaker recognition (von Kriegstein et al. 2008b).
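The across-subject brain–behavior correlations described in this section reduce to a simple calculation. The sketch below uses invented numbers solely to show its form; the subject count, activation values, and face benefits are hypothetical.

```python
# Minimal sketch (made-up numbers): across-subject correlation between an ROI
# activation measure (e.g., an FFA contrast estimate) and the face benefit.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical values for 10 subjects.
ffa_activation = np.array([0.1, 0.4, 0.2, 0.8, 0.5, 0.9, 0.3, 0.7, 0.6, 0.2])
face_benefit_speaker = np.array([1.0, 3.5, 2.0, 6.0, 4.0, 7.5, 2.5, 5.5, 5.0, 1.5])  # percentage points

r, p = pearsonr(ffa_activation, face_benefit_speaker)
print(f"across-subject correlation: r = {r:.2f}, p = {p:.3f}")
```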
34.4.3 A Multisensory Predictive Coding Framework for Auditory Communication
Which computational mechanism may underlie the audiovisual model? The model posits that an internal simulation of facial features is instrumental in performing recognition tasks on auditory-only speech. In this view, the brain internally represents a multisensory environment that enables robust sensory perception in auditory-only conditions. This is comparable to external face simulations used to improve speech recognition, especially in the hearing-impaired; speech recognition during telephone conversations can be improved by external video simulations of an artificial
“talking face” (Siciliano et al. 2002). Such external simulation helps hearing-impaired listeners to understand what is said. This creation of an artificial talking face uses a phoneme recognizer and a face synthesizer to recreate the facial movements based on the auditory input. The audiovisual model for auditory communication predicts that the human brain routinely uses a similar mechanism: Auditory-only speech processing and speaker recognition is improved by internal simulation of a talking face. How can such a model be explained in computational modeling terms? Recent theoretical neuroscientific work has suggested that recognition can be modeled using a predictive coding framework (Friston 2005). This framework assumes that efficient online recognition of sensory signals is accomplished by a cortical hierarchy that is tuned to the prediction of sensory signals. It is assumed that high levels of the hierarchy (i.e., further away from the sensory input) provide predictions about the representation of information at a lower level of the hierarchy (i.e., closer to the sensory input). Each level contains a forward or generative model for the causes of the sensory input and uses this model to generate predictions and constraints for the interpretation of the sensory input. Higher levels send predictions to the lower level, whereas the lower level sends prediction errors to the higher level. One prerequisite to make such a mechanism useful is that the brain learns regularities within the environment to efficiently predict the future sensory input. Furthermore, these regularities should be adaptable to allow for changes in the regularities of the environment. Therefore predictive coding theories have been formulated in a Bayesian framework. In this framework, predictions are based on previous sensory evidence and can have varying degrees of certainty. In visual and sensory–motor processing, “internal forward models” have been used to explain how the brain encodes complex sensory data by relatively few parameters (Wolpert et al. 1995; Knill et al. 1998; Rao and Ballard 1999; Bar 2007; Deneve et al. 2007). Although predictive coding theories usually emphasize interaction between high and low levels, a similar interaction might occur between sensory modalities. For example, the brain might use audiovisual forward models, which encode the physical, causal relationship between a person talking and its consequences for the visual and auditory input (von Kriegstein et al. 2008b). Critically, these models encode the causal dependencies between the visual and auditory trajectories. Perception is based on the “inversion” of models, that is, the brain identifies causes (e.g., Mr. Smith says “Hello”) that explain the observed audiovisual input best. The changes in behavioral performance after a brief voice–face experience suggest that the human brain can quickly and efficiently learn “a new person” by adjusting key parameters in existing internal audiovisual forward models. Once parameters for an individual person are learned, auditory speech processing is improved because the brain learned parameters of an audiovisual forward model with strong dependencies between internal auditory and visual trajectories. The use of these models is reflected in an increased activation of face-processing areas during auditory tasks. The audiovisual speaker model enables the system to simulate visual trajectories (via the auditory trajectories) when there is no visual input. 
The talking face simulation works best if the learned coupling between auditory and visual input is strong and veridical. The visual simulation is fed back to auditory areas, thereby improving auditory recognition by providing additional constraints. This mechanism can be used iteratively until the inversion of the audiovisual forward model converges on a percept. In summary, this scheme suggests that forward models encode and exploit dependencies in the environment and are used to improve recognition in unisensory conditions by simulating the causes of the sensory input. Note that this chapter focuses on the visual part of this simulation process. It is currently unclear whether motor processes also play a role in this online simulation and whether the simulation proposed here is related to simulation accounts underlying the motor theory of speech perception (Fischer and Zwaan 2008; D’Ausilio et al. 2009).
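As a rough computational illustration of this scheme, and not of the authors' actual model, the sketch below inverts a toy linear audiovisual forward model by iterative gradient descent on prediction error: a hidden articulatory cause generates an auditory trajectory and a simulated facial trajectory, and a smoothness constraint on the simulated face regularizes the auditory interpretation even though no visual input is observed. The weights, noise levels, and smoothness penalty are all illustrative assumptions.

```python
# Toy sketch (illustrative only): iterative inversion of a linear audiovisual
# forward model. A hidden articulatory cause c(t) generates an auditory
# trajectory (w_a * c) and a visual mouth trajectory (w_v * c). With auditory
# input only, the model still "simulates" the face, and a smoothness constraint
# on that simulated facial movement regularizes the estimate of the cause.
import numpy as np

rng = np.random.default_rng(2)
T = 200
t = np.linspace(0, 2 * np.pi, T)
true_cause = np.sin(3 * t) * np.exp(-0.2 * t)              # slowly varying articulation

w_a, w_v = 1.5, 0.8                                        # learned generative weights
audio = w_a * true_cause + 0.6 * rng.standard_normal(T)    # noisy auditory input
# (no visual input is observed in the auditory-only condition)

def invert(audio, w_a, w_v, lam=2.0, lr=0.05, n_iter=500):
    """Iteratively reduce auditory prediction error while keeping the
    simulated facial trajectory (w_v * c) smooth."""
    c = np.zeros_like(audio)                               # current estimate of the cause
    for _ in range(n_iter):
        audio_err = audio - w_a * c                        # auditory prediction error
        visual_sim = w_v * c                               # simulated talking-face trajectory
        curvature = np.gradient(np.gradient(visual_sim))   # roughness of the simulated face
        # Gradient step: explain the audio while smoothing the implied facial movement.
        c += lr * (w_a * audio_err + lam * w_v * curvature)
    return c, w_v * c

cause_av, face_sim = invert(audio, w_a, w_v)               # audiovisual forward model
cause_a, _ = invert(audio, w_a, w_v, lam=0.0)              # auditory-only baseline

print("error with audiovisual constraint:", round(float(np.mean((cause_av - true_cause) ** 2)), 4))
print("error, auditory-only baseline:    ", round(float(np.mean((cause_a - true_cause) ** 2)), 4))
```

The point is only the division of labor suggested in the text: auditory prediction errors drive the estimate of the cause, while the implied facial trajectory supplies an additional constraint and is itself produced as a by-product; the cited accounts frame this more generally in Bayesian terms.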
34.5 CONCLUSIONS AND FUTURE DIRECTIONS
In contrast to a modality-specific view on unimodal perception, recent research suggests that not only auditory areas but also visual face-sensitive areas are behaviorally relevant for the sensory
analysis of auditory communication signals (von Kriegstein et al. 2005; von Kriegstein and Giraud 2006; von Kriegstein et al. 2006; von Kriegstein et al. 2008b). Speech recognition is supported by selective recruitment of the face-sensitive STS, which is known to be involved in orofacial movement processing (Puce et al. 1998; Thompson et al. 2007). Speaker recognition is supported by selective recruitment of the FFA, which is involved in face-identity processing (Eger et al. 2004; Rotshtein et al. 2005; Bouvier and Engel 2006). These findings challenge auditory-only models for speech processing, because they imply that during large parts of ecologically valid social interactions, not only auditory but also visual areas are involved in solving auditory tasks. For example, during a phone conversation with personally familiar people (e.g., friends or colleagues), face-sensitive areas will be employed to optimally understand what the person is saying and to identify the other by his/her voice. The same applies to less familiar people, given a brief prior face-to-face interaction. The results have been explained by an audiovisual model couched in a predictive coding framework (von Kriegstein and Giraud 2006; von Kriegstein et al. 2008b). This model assumes that the brain routinely simulates talking faces in response to auditory input and that this internal audiovisual simulation is used to actively predict and thereby constrain the possible interpretations of the auditory signal. This mechanism leads to improved recognition in situations where only the auditory modality is available. Whether the audiovisual simulation scheme is a general principle of how unisensory tasks are performed when one or more of the usual input modalities are missing is unclear. I assume that the same principle also applies to other voice–face information that is correlated in the auditory and visual domains, such as recognition of emotion from voice and face (de Gelder and Vroomen 1995; Massaro and Egan 1996). Furthermore, the principle might even be applicable to noncommunication sensory signals with a (veridical or illusory) common cause, such as the recognition of movement trajectories of computer-animated dot patterns and moving sound sources (Seitz et al. 2006). Neuroscientific research has focused on responses in visual sensory areas in auditory-only conditions after a brief voice–face sensory experience. However, visual sensory areas could also play a role for speakers for whom there is no specific voice–face sensory experience. For example, the speaker-independent effect of foreign phoneme training (Hardison 2003; Hazan et al. 2005; Hirata and Kelly 2010) could be based on extrapolating the speaker-specific face model to other speakers. Similar mechanisms might occur during development of the speech perception system in children. The use of internal face models for speech and speaker recognition might be especially important in situations where there is uncertainty about the input. There are multiple sources of uncertainty in human auditory communication. For example, a low level of experience with a second language will likely result in a high level of uncertainty about the trajectory of the speech signal. Furthermore, a high level of background noise will result in a high level of uncertainty about the speech input. The use of an internal face simulation mechanism could increase the robustness of perception in these situations.
REFERENCES
Abrams, D. A., T. Nicol, S. Zecker, and N. Kraus. 2008. Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J Neurosci 28: 3958–3965. American Standards Association. 1960. Acoustical Terminology SI. New York: Association AS. Arnal, L. H., B. Morillon, C. A. Kell, and A. L. Giraud. 2009. Dual neural routing of visual facilitation in speech processing. J Neurosci 29: 13445–13453. Ay, N., J. Flack, and D. C. Krakauer. 2007. Robustness and complexity co-constructed in multimodal signalling networks. Philos Trans R Soc Lond B Biol Sci 362: 441–447. Bar, M. 2007. The proactive brain: Using analogies and associations to generate predictions. Trends Cogn Sci 11: 280–289. Baron-Cohen, S., H. A. Ring, E. T. Bullmore, S. Wheelwright, C. Ashwin, and S. C. R. Williams. 2000. The amygdala theory of autism. Neurosci Biobehav Rev 24: 355–364. Barsalou, L. W. 2008. Grounded cognition. Annu Rev Psychol 59: 617–645.
Behrmann, M., and G. Avidan. 2005. Congenital prosopagnosia: Face-blind from birth. Trends Cogn Sci 9:180–187. Belin, P., S. Fecteau, and C. Bedard. 2004. Thinking the voice: Neural correlates of voice perception. Trends Cogn Sci 8: 129–135. Belin, P., R. J. Zatorre, and P. Ahad. 2002. Human temporal-lobe response to vocal sounds. Brain Res Cogn Brain Res 13: 17–26. Belin, P., R. J. Zatorre, P. Lafaille, P. Ahad, and B. Pike. 2000. Voice-selective areas in human auditory cortex. Nature 403: 309–312. Besle, J., C. Fischer, A. Bidet-Caulet, F. Lecaignard, O. Bertrand, and M. H. Giard. 2008. Visual activation and audiovisual interactions in the auditory cortex during speech perception: Intracranial recordings in humans. J Neurosci 28: 14301–14310. Bizley, J. K., K. M. Walker, B. W. Silverman, A. J. King, and J. W. Schnupp. 2009. Interdependent encoding of pitch, timbre, and spatial location in auditory cortex. J Neurosci 29: 2064–2075. Boatman, D., R. P. Lesser, and B. Gordon. 1995. Auditory speech processing in the left temporal lobe: An electrical interference study. Brain Lang 51: 269–290. Boatman, D. F., R. P. Lesser, N. E. Crone, G. Krauss, F. A. Lenz, and D. L. Miglioretti. 2006. Speech recognition impairments in patients with intractable right temporal lobe epilepsy. Epilepsia 47: 1397–1401. Boemio, A., S. Fromm, A. Braun, and D. Poeppel. 2005. Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nat Neurosci 8: 389–395. Bouvier, S. E., and S. A. Engel. 2006. Behavioral deficits and cortical damage loci in cerebral achromatopsia. Cereb Cortex 16: 183–191. Bruce, V., and A. Young. 1986. Understanding face recognition. Br J Psychol 77: 305–327. Burton, A. M., V. Bruce, and P. J. B. Hancock. 1999. From pixels to people: A model of familiar face recognition. Cogn Sci 23: 1–31. Calder, A. J., and A. W. Young. 2005. Understanding the recognition of facial identity and facial expression. Nat Rev Neurosci 6: 641–651. Calvert, G. A., E. T. Bullmore, M. J. Brammer, R. Campbell, S. C. Williams, P. K. McGuire, P. W. Woodruff, S. D. Iversen, and A. S. David. 1997. Activation of auditory cortex during silent lipreading. Science 276: 593–596. Campanella, S., and P. Belin. 2007. Integrating face and voice in person perception. Trends Cogn Sci 11: 535–543. Campbell, R., J. Zihl, D. Massaro, K. Munhall, and M. M. Cohen. 1997. Speechreading in the akinetopsic patient, L.M. Brain 120 (Pt 10): 1793–1803. Chandrasekaran, C., A. Trubanova, S. Stillittano, A. Caplier, and A. A. Ghazanfar. 2009. The natural statistics of audiovisual speech. PLoS Comput Biol 5: e1000436. Clopper, C. G., and D. B. Pisoni. 2004. Some acoustic cues for the perceptual categorization of American English regional dialects. J Phon 32: 111–140. Cook, S., and J. Wilding. 1997. Earwitness testimony: 2. Voices, faces and context. Appl Cogn Psychol 11: 527–541. Cook, S., and J. Wilding. 2001. Earwitness testimony: Effects of exposure and attention on the face overshadowing effect. Br J Psychol 92: 617–629. D’Ausilio, A., F. Pulvermuller, P. Salmas, I. Bufalari, C. Begliomini, and L. Fadiga. 2009. The motor somatotopy of speech perception. Curr Biol 19: 381–385. de Gelder, B., and J. Vroomen. 1995. The perception of emotions by ear and by eye, 289–311. Los Angeles: Psychology Press. Deneve, S., J. R. Duhamel, and A. Pouget. 2007. Optimal sensorimotor integration in recurrent cortical networks: A neural implementation of Kalman filters. J Neurosci 27: 5744–5756. Duchaine, B., and K. Nakayama. 2005. 
Dissociations of face and object recognition in developmental prosopagnosia. J Cogn Neurosci 17: 249–261. Duchaine, B. C., H. Parker, and K. Nakayama. 2003. Normal recognition of emotion in a prosopagnosic. Perception 32: 827–838. Eger, E., P. G. Schyns, and A. Kleinschmidt. 2004. Scale invariant adaptation in fusiform face-responsive regions. Neuroimage 22: 232–242. Ellis, H. D., D. M. Jones, and N. Mosdell. 1997. Intra- and inter-modal repetition priming of familiar faces and voices. Br J Psychol 88 (Pt 1): 143–156. Fant, G. 1960. Acoustic theory of speech production. Paris: Mouton. Fischer, M. H., and R. A. Zwaan. 2008. Embodied language: A review of the role of the motor system in language comprehension. Q J Exp Psychol (Colchester) 61: 825–850.
Fox, C. J., S. Y. Moon, G. Iaria, and J. J. Barton. 2009. The correlates of subjective perception of identity and expression in the face network: An fMRI adaptation study. Neuroimage 44: 569–580. Friederici, A. D. 2002. Towards a neural basis of auditory sentence processing. Trends Cogn Sci 6: 78–84. Friston, K. 2005. A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci 360: 815–836. Gainotti, G., A. Barbier, and C. Marra. 2003. Slowly progressive defect in recognition of familiar people in a patient with right anterior temporal atrophy. Brain 126: 792–803. Gick, B., and D. Derrick. 2009. Aero-tactile integration in speech perception. Nature 462: 502–504. Giraud, A. L., A. Kleinschmidt, D. Poeppel, T. E. Lund, R. S. Frackowiak, and H. Laufs. 2007. Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56: 1127–1134. Grey, J. M. 1977. Multidimensional perceptual scaling of musical timbres. J Acoust Soc Am 61: 1270–1277. Griffiths, T. D., S. Uppenkamp, I. Johnsrude, O. Josephs, and R. D. Patterson. 2001. Encoding of the temporal regularity of sound in the human brainstem. Nat Neurosci 4: 633–637. Gruter, M., T. Gruter, V. Bell, P. W. Halligan, J. Horst, Sperling et al. 2007. Hereditary prosopagnosia: The first case series. Cortex 43: 734–749. Hall, D. A., C. Fussell, and A. Q. Summerfield. 2005. Reading fluent speech from talking faces: Typical brain networks and individual differences. J Cogn Neurosci 17: 939–953. Hardison, D. M. 2003. Acquisition of second-language speech: Effects of visual cues, context, and talker variability. Appl Psycholinguist 24: 495. Hauk, O., Y. Shtyrov, and F. Pulvermuller. 2008. The time course of action and action–word comprehension in the human brain as revealed by neurophysiology. J Physiol Paris 102: 50–58. Haxby, J. V., E. A. Hoffman, and M. I. Gobbini. 2000. The distributed human neural system for face perception. Trends Cogn Sci 4: 223–233. Haxby, J. V., E. A. Hoffman, and M. I. Gobbini. 2002. Human neural systems for face recognition and social communication. Biol Psychiatry 51: 59–67. Hazan, V., A. Sennema, M. Iba, and A. Faulkner. 2005. Effect of audiovisual perceptual perception and production of training on the consonants by Japanese learners of English. Speech Commun 47: 360–378. Hickok, G., and D. Poeppel. 2000. Towards a functional neuroanatomy of speech perception. Trends Cogn Sci 4: 131–138. Hickok, G., and D. Poeppel. 2007. The cortical organization of speech processing. Nat Rev Neurosci 8: 393–402. Hirata, Y., and S. D. Kelly. 2010. Effects of lips and hands on auditory learning of second language speech sounds. Lang Hear Res 2: 298–310. Humphreys, G. W., N. Donnelly, and M. J. Riddoch. 1993. Expression is computed separately from facial identity, and it is computed separately for moving and static faces: Neuropsychological evidence. Neuropsychologia 31: 173–181. Ishai, A., J. V. Haxby, and L. G. Ungerleider. 2002. Visual imagery of famous faces: Effects of memory and attention revealed by fMRI. Neuroimage 17: 1729–1741. Iverson, P., and C. L. Krumhansl. 1993. Isolating the dynamic attributes of musical timbre. J Acoust Soc Am 94: 2595–2603. Johnson, W. F., R. N. Emde, K. R. Scherer, and M. D. Klinnert. 1986. Recognition of emotion from vocal cues. Arch Gen Psychiatry 43: 280–283. Kanwisher, N., J. McDermott, and M. M. Chun. 1997. The fusiform face area: A module in human extrastriate cortex specialized for face perception. J Neurosci 17: 4302–4311. Kim, R. S., A. R. 
Seitz, and L. Shams. 2008. Benefits of stimulus congruency for multisensory facilitation of visual learning. PLoS ONE 3: e1532. Kleinhans, N. M., L. C. Johnson, T. Richards, R. Mahurin, J. Greenson, G. Dawson et al. 2009. Reduced neural habituation in the amygdala and social impairments in autism spectrum disorders. Am J Psychiatry 166: 467–475. Knill, D., D. Kersten, A. Yuille, and W. Richards. 1998. Introduction: A Bayesian formulation of visual perception. In Perception as Bayesian Inference, 1–21. Cambridge, MA: Cambridge Univ. Press. Lakatos, S., S. McAdams, and R. Causse. 1997. The representation of auditory source characteristics: Simple geometric form. Percept Psychophys 59: 1180–1190. Lander, K., G. Humphreys, and V. Bruce. 2004. Exploring the role of motion in prosopagnosia: Recognizing, learning and matching faces. Neurocase 10: 462–470. Lang, C. J., O. Kneidl, M. Hielscher-Fastabend, and J. G. Heckmann. 2009. Voice recognition in aphasic and non-aphasic stroke patients. J Neurol 256: 1303–1306. Lavner, Y., I. Gath, and J. Rosenhouse. 2000. The effects of acoustic modifications on the identification of familiar voices speaking isolated vowels. Speech Commun 30: 9–26.
Leff, A. P., T. M. Schofield, K. E. Stephan, J. T. Crinion, K. J. Friston, and C. J. Price. 2008. The cortical dynamics of intelligible speech. J Neurosci 28: 13209–13215. Liberman, A. M., and I. G. Mattingly. 1985. The motor theory of speech perception revised. Cognition 21: 1–36. Marslen-Wilson, W. D., and L. K. Tyler. 2007. Morphology, language and the brain: The decompositional substrate for language comprehension. Philos Trans R Soc Lond B Biol Sci 362: 823–836. Martin, A., and A. Caramazza. 2003. Neuropsychological and neuroimaging perspectives on conceptual knowledge: An introduction. Cogn Neuropsychol 20: 195–212. Massaro, D. W., and P. B. Egan. 1996. Perceiving affect from the voice and the face. Psychon Bull Rev 3: 215–221. McAdams, S., S. Winsberg, S. Donnadieu, G. Desoete, and J. Krimphoff. 1995. Perceptual scaling of synthesized musical timbres—Common dimensions, specificities, and latent subject classes. Psychol Res Psychol Forsch 58: 177–192. McConachie, H. R. 1976. Developmental prosopagnosia. A single case report. Cortex 12: 76–82. Mechelli, A., C. J. Price, K. J. Friston, and A. Ishai. 2004. Where bottom-up meets top-down: Neuronal interactions during perception and imagery. Cereb Cortex 14: 1256–1265. Menon, V., D. J. Levitin, B. K. Smith, A. Lembke, B. D. Krasnow, D. Glazer et al. 2002. Neural correlates of timbre change in harmonic sounds. Neuroimage 17: 1742–1754. Nelken, I., and O. Bar-Yosef. 2008. Neurons and objects: The case of auditory cortex. Front Neurosci 2: 107–113. Neuner, F., and S. R. Schweinberger. 2000. Neuropsychological impairments in the recognition of faces, voices, and personal names. Brain Cogn 44: 342–366. O’Toole, A. J., D. A. Roark, and H. Abdi. 2002. Recognizing moving faces: A psychological and neural synthesis. Trends Cogn Sci 6: 261–266. Obleser, J., and F. Eisner. 2008. Pre-lexical abstraction of speech in the auditory cortex. Trends Cogn Sci 13(1): 14–19. Overath, T., S. Kumar, K. von Kriegstein, and T. D. Griffiths. 2008. Encoding of spectral correlation over time in auditory cortex. J Neurosci 28: 13268–13273. Patterson, R. D., S. Uppenkamp, I. S., Johnsrude, and T. D. Griffiths. 2002. The processing of temporal pitch and melody information in auditory cortex. Neuron 36: 767–776. Pekkola, J., V. Ojanen, T. Autti, I. P. Jaaskelainen, R. Mottonen et al. 2005. Primary auditory cortex activation by visual speech: An fMRI study at 3 T. Neuroreport 16: 125–128. Pelphrey, K. A., J. P. Morris, C. R. Michelich, T. Allison, and G. McCarthy. 2005. Functional anatomy of biological motion perception in posterior temporal cortex: An FMRI study of eye, mouth and hand movements. Cereb Cortex 15: 1866–1876. Penagos, H., J. R. Melcher, and A. J. Oxenham. 2004. A neural representation of pitch salience in nonprimary human auditory cortex revealed with functional magnetic resonance imaging. J Neurosci 24: 6810–6815. Pitcher, D., L. Garrido, V. Walsh, and B. C. Duchaine. 2008. Transcranial magnetic stimulation disrupts the perception and embodiment of facial expressions. J Neurosci 28: 8929–8933. Poeppel, D. 2003. The analysis of speech in different temporal integration windows: Cerebral lateralization as ‘asymmetric sampling in time.’ Speech Commun 41: 245–255. Price, C., G. Thierry, and T. Griffiths. 2005. Speech-specific auditory processing: Where is it? Trends Cogn Sci 9: 271–276. Price, C. J. 2000. The anatomy of language: Contributions from functional neuroimaging. J Anat 197(Pt 3): 335–359. Puce, A., T. Allison, S. Bentin, J. C. Gore, and G. McCarthy. 
1998. Temporal cortex activation in humans viewing eye and mouth movements. J Neurosci 18: 2188–2199. Pulvermuller, F., M. Huss, F. Kherif, F. M. D. P. Martin, O. Hauk, and Y. Shtyrov. 2006. Motor cortex maps articulatory features of speech sounds. Proc Natl Acad Sci U S A 103: 7865–7870. Rao, R. P., and D. H. Ballard. 1999. Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nat Neurosci 2: 79–87. Ross, L. A., D. Saint-Amour, V. M. Leavitt, D. C. Javitt, and J. J. Foxe. 2007. Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cereb Cortex 17: 1147–1153. Rotshtein, P., R. N. Henson, A. Treves, J. Driver, and R. J. Dolan. 2005. Morphing Marilyn into Maggie dissociates physical and identity face representations in the brain. Nat Neurosci 8: 107–113. Sathian, K., A. Zangaladze, J. M. Hoffman, and S. T. Grafton. 1997. Feeling with the mind’s eye. Neuroreport 8: 3877–3881.
Scherer, K. R. 1986. Vocal affect expression: A review and a model for future research. Psychol Bull 99: 143–165. Schweinberger, S. R., D. Robertson, and J. M. Kaufmann. 2007. Hearing facial identities. Q J Exp Psychol (Colchester) 60: 1446–1456. Scott, S. K. 2005. Auditory processing—Speech, space and auditory objects. Curr Opin Neurobiol 15: 197–201. Scott, S. K., C. McGettigan, and F. Eisner. 2009. A little more conversation, a little less action—Candidate roles for the motor cortex in speech perception. Nat Rev Neurosci 10: 295–302. Scott, S. K., C. C. Blank, S. Rosen, and R. J. Wise. 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123(Pt 12): 2400–2406. Seitz, A. R., and H. R. Dinse. 2007. A common framework for perceptual learning. Curr Opin Neurobiol 17: 148–153. Seitz, A. R., R. Kim, and L. Shams. 2006. Sound facilitates visual learning. Curr Biol 16: 1422–1427. Sergent, J., S. Ohta, and B. MacDonald. 1992. Functional neuroanatomy of face and object processing. A positron emission tomography study. Brain 115(Pt 1): 15–36. Shams, L., and A. R. Seitz. 2008. Benefits of multisensory learning. Trends Cogn Sci 12: 411–417. Sheffert, S. M., and E. Olson. 2004. Audiovisual speech facilitates voice learning. Percept Psychophys 66: 352–362. Sheffert, S. M., D. B. Pisoni, J. M. Fellowes, and R. E. Remez, 2002. Learning to recognize talkers from natural, sinewave, and reversed speech samples. J Exp Psychol Hum Percept Perform 28: 1447–1469. Siciliano, C., G. Williams, J. Beskow, and A. Faulkner. 2002. Evaluation of a multilingual synthetic talking face as a communication aid for the hearing-impaired. Speech Hear Lang Work Prog 14: 51–61. Smith, D. R. R., R. D. Patterson, R. Turner, H. Kawahara, and T. Irino. 2005. The processing and perception of size information in speech sounds. J Acoust Soc Am 117: 305–318. Smith, E. L., M. Grabowecky, and S. Suzuki. 2007. Auditory–visual crossmodal integration in perception of face gender. Curr Biol 17: 1680–1685. Sumby, W. H., and I. Pollack. 1954. Visual contribution to speech intelligibility in noise. J Acoust Soc Am 26: 212–215. Thomas, E. R., and J. Reaser. 2004. Delimiting perceptual cues used for the ethnic labeling of African American and European American voices. J Socioling 8: 54–87. Thompson, J. C., M. Clarke, T. Stewart, and A. Puce. 2005. Configural processing of biological motion in human superior temporal sulcus. J Neurosci 25: 9059–9066. Thompson, J. C., J. E. Hardee, A. Panayiotou, D. Crewther, and A. Puce. 2007. Common and distinct brain activation to viewing dynamic sequences of face and hand movements. Neuroimage 37: 966–973. Tsukiura, T., H. Mochizuki-Kawai, and T. Fujii. 2006. Dissociable roles of the bilateral anterior temporal lobe in face–name associations: An event-related fMRI study. Neuroimage 30: 617–626. Van Lancker, D. R., and J. G. Canter. 1982. Impairment of voice and face recognition in patients with hemispheric damage. Brain Cogn 1: 185–195. Van Lancker, D. R., J. Kreiman, and J. Cummings. 1989. Voice perception deficits: Neuroanatomical correlates of phonagnosia. J Clin Exp Neuropsychol 11: 665–674. van Wassenhove, V., K. W. Grant, and D. Poeppel. 2005. Visual speech speeds up the neural processing of auditory speech. Proc Natl Acad Sci U S A 102: 1181–1186. Vigneau, M., V. Beaucousin, P. Y. Herve, H. Duffau, F. Crivello, O. Houde et al. 2006. Meta-analyzing left hemisphere language areas: phonology, semantics, and sentence processing. Neuroimage 30: 1414–1432. 
von Kriegstein, K., and A. L. Giraud. 2004. Distinct functional substrates along the right superior temporal sulcus for the processing of voices. Neuroimage 22: 948–955. von Kriegstein, K., and A. L. Giraud. 2006. Implicit multisensory associations influence voice recognition. PLoS Biol 4: e326. von Kriegstein, K., A. Kleinschmidt, and A. L. Giraud. 2006. Voice recognition and cross-modal responses to familiar speakers’ voices in prosopagnosia. Cereb Cortex 16: 1314–1322. von Kriegstein, K., R. D. Patterson, and T. D. Griffiths. 2008a. Task-dependent modulation of medial geniculate body is behaviorally relevant for speech recognition. Curr Biol 18: 1855–1859. von Kriegstein, K., E. Eger, A. Kleinschmidt, and A. L. Giraud. 2003. Modulation of neural responses to speech by directing attention to voices or verbal content. Brain Res Cogn Brain Res 17: 48–55. von Kriegstein, K., A. Kleinschmidt, P. Sterzer, and A. L. Giraud. 2005. Interaction of face and voice areas during speaker recognition. J Cogn Neurosci 17: 367–376. von Kriegstein, K., D. R. Smith, R. D. Patterson, D. T. Ives, and T. D. Griffiths. 2007. Neural representation of auditory size in the human voice and in sounds from other resonant sources. Curr Biol 17: 1123–1128.
von Kriegstein, K., O. Dogan, M. Gruter, A. L. Giraud, C. A. Kell, T. Gruter et al. 2008b. Simulation of talking faces in the human brain improves auditory speech recognition. Proc Natl Acad Sci U S A 105: 6747–6752. von Kriegstein, K., D. R. Smith, R. D. Patterson, S. J. Kiebel, and T. D. Griffiths. 2010. How the human brain recognizes speech in the context of changing speakers. J Neurosci 30: 629–638. Warren, J. D., A. R. Jennings, and T. D. Griffiths. 2005. Analysis of the spectral envelope of sounds by the human brain. Neuroimage 24: 1052–1057. Watkins, K. E., A. P. Strafella, and T. Paus. 2003. Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia 41: 989–994. Wolpert, D. M., Z. Ghahramani, and M. I. Jordan. 1995. An internal model for sensorimotor integration. Science 269: 1880–1882. Yehia, H., P. Rubin, and E. Vatikiotis-Bateson. 1998. Quantitative association of vocal-tract and facial behavior. Speech Commun 26:23–43. Young, A. W., F. Newcombe, E. H. de Haan, M. Small, and D. C. Hay. 1993. Face perception after brain injury. Selective impairments affecting identity and expression. Brain 116 (Pt 4): 941–959. Zangaladze, A., C. M. Epstein, S. T. Grafton, and K. Sathian. 1999. Involvement of visual cortex in tactile discrimination of orientation. Nature 401: 587–590.
Section IX Naturalistic Multisensory Processes: Flavor
35 Multimodal Chemosensory Interactions and Perception of Flavor
John Prescott
CONTENTS
35.1 Introduction
35.2 Chemosensory Interactions and Integration
35.3 Associative Learning and Integration
35.4 Cross-Modal Chemosensory Binding
35.5 Attentional Processes in Binding
35.6 Analysis and Synthesis in Perception of Flavor
35.7 Investigating Cognitive Processes in Flavor Perception
35.8 Hedonic Implications of Chemosensory Integration
References
35.1 INTRODUCTION
Writing in the early nineteenth century, the gastronomic pioneer Brillat-Savarin was “tempted to believe that smell and taste are in fact but a single sense, whose laboratory is the mouth and whose chimney is the nose” (Brillat-Savarin 1825). Much of the subsequent history of perception research in the chemical senses has, in contrast, been characterized by a focus on discrete sensory channels and their underlying anatomy and physiology. Recently, however, there has been renewed interest in examining flavor as a functional perceptual system. This interest has grown to some extent out of the realization that in our everyday food experiences we respond, perceptually and hedonically, not to discrete tastes, odors, and tactile sensations, but to flavors constructed from a synthesis of these sensory signals (Prescott 2004b). This renewed focus on flavor is very much in line with the ecological approach to perception advocated by Gibson (1966). Gibson argued that the primary purpose of perception is to seek out objects in our environment, particularly those that are biologically important. On this view, the physiological origin of sensory information is less important than whether that information can be used to identify objects. Effectively, then, the key to successful perception is that sensory information is interpreted as qualities that belong to the object itself. Within this context, flavor can be seen as a functionally distinct sense that is cognitively “constructed” from the integration of distinct physiologically defined sensory systems (such as olfaction and gustation) that are “functionally united when anatomically separated” (Gibson 1966, p. 137) in order to identify and respond to objects that are important to our survival, namely, foods.
35.2 CHEMOSENSORY INTERACTIONS AND INTEGRATION
Cross-modal sensory integration is frequently inferred from the influence of one modality on responses to another. Commonly, this is an enhanced (sometimes supra-additive) response to information from one sensory system due to concurrent input from another modality (Calvert et al. 1999). For example, in a noisy environment, speech comprehension is improved if we see the speaker’s lip movements (Sumby and Pollack 1954). Even information that is irrelevant to a task enhances neural response to task-relevant stimuli and augments behavioral performance (Stein et al. 1988). There is similar evidence that tastes and odors, when encoded together as a flavor, interact to modify the perception of one another. The most obvious expression of odor–taste interactions is the widely observed attribution to odors of qualities more usually associated with basic tastes (Burdach et al. 1984). When asked to describe the odor of caramel or vanilla, most people will use the term “sweet-smelling”; similarly, “sour” is used for the odor of vinegar (Stevenson and Boakes 2004). In one descriptive analysis of characteristics of a wide range of odors (Dravnieks 1985), 65% of assessors gave “sweetness” as an appropriate descriptor for the odor of vanillin, whereas 33% described the odor of hexanoic acid as being sour. These descriptions appear to have many of the qualities of synesthesia, in which a stimulus in one sensory modality reliably elicits a consistent corresponding stimulus in another modality (Martino and Marks 2001; Stevenson et al. 1998). Whereas synesthesia is a relatively uncommon event in other modalities, the possession of taste properties by odors is almost universal, particularly in the case of commonly consumed foods. In fact, for some odors, taste qualities may represent the most consistent description used. Stevenson and Boakes (2004) reported data showing that, over repeated testing, ratings of taste descriptors for odors (e.g., sweetness of banana odor) were at least as reliable as ratings of the primary quality of the odor (i.e., banana). This commonplace phenomenon could be dismissed as merely imprecise language (since highly specific odor descriptors are elusive) or even metaphor, given that the odor name is likely to refer to an object, which might also be sweet or sour. However, there are measurable consequences of such odor taste qualities, in that these odors, when added to tastants in solution, can modify the taste intensity. The most frequent finding is the ability of food odors such as strawberry or vanilla to enhance the sweetness of sucrose solutions (Frank and Byram 1988; Frank et al. 1989). This phenomenon is both taste- and odor-specific. For example, the sweet-smelling odor of strawberry will enhance a sweet taste, but the odor of bacon will not. Conversely, a nonsweet taste, for example, saltiness, will not be enhanced by strawberry (Frank and Byram 1988). Stevenson et al. (1999) showed that the smelled sweetness of an odorant was the best predictor of that odorant’s ability to enhance a sweet taste when they were presented together in solution. Similarly, the ability of food odors to enhance saltiness in solution has been shown to be highly correlated with the extent to which the foods themselves were judged to be salty (Lawrence et al. 2009). Subsequently, these findings were extended by studies showing that odors added to tastants can also suppress taste intensity.
Prescott (1999) found that odors judged to be low in smelled sweetness (peanut butter, oolong tea) suppressed sweetness when added to sucrose in solution, in contrast to raspberry odor, which enhanced it. Stevenson et al. (1999) reported that sweet-smelling caramel odor not only enhanced the sweetness of sucrose in solution but also suppressed the sourness of a citric acid solution. Importantly, this latter effect parallels the pattern of interactions seen with binary taste mixtures, in that the addition of sucrose would similarly suppress the sourness of citric acid. Such findings provide evidence that odor taste properties reflect a genuine perceptual phenomenon. The ability of odors possessing smelled taste qualities to influence tastes has also been demonstrated in paradigms using measures other than intensity ratings. Dalton et al. (2000) assessed orthonasal (sniffed) detection thresholds for the odorant benzaldehyde, which has a cherry/almond quality, while subjects held a sweet taste (saccharin) in the mouth. Detection thresholds for the odor were significantly reduced compared with benzaldehyde alone, or in combination with either water
or a nonsweet taste (monosodium glutamate, a savory quality). The most plausible interpretation of these findings is that the smelled sweetness of benzaldehyde and the tasted sweetness of saccharin were being integrated at subthreshold levels. Similar odor threshold effects have also been found using a somewhat different experimental protocol in which both the odorant and tastant were presented together in solution (Delwiche and Heffelfinger 2005). Reciprocal effects of odors on tastes are also found. These include increases in the detection accuracy of a sweet taste at around threshold in the presence of an orthonasally presented congruent odorant (strawberry) as compared to one that was not sweet (ham) (Djordjevic et al. 2004), as well as a similar effect using a priming procedure in which the odorant preceded the taste presentation (Prescott 2004b), a sweet-smelling odor producing a greater change in detectability, relative to no odor, than did a nonsweet odorant. Similar priming effects at suprathreshold levels have been demonstrated behaviorally in a study in which subjects were asked to identify a taste quality sipped from a device that also simultaneously presented an odor—either congruent or incongruent—orthonasally. Tastes were named faster during presentation of congruent odor/taste pairs (sweet-smelling cherry odor/sucrose; sour-smelling grapefruit odor/citric acid) than during incongruent pairs (cherry odor/citric acid; grapefruit odor/sucrose) or neutral/control pairs (either butanol or no odor plus either sucrose or citric acid) (White and Prescott 2007).
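The correlational claims in this section (e.g., Stevenson et al. 1999; Lawrence et al. 2009) amount to asking, across a set of odorants, whether the rated "smelled" taste quality of each odor predicts how much it enhances the corresponding taste in solution. The sketch below is only an illustration of that kind of analysis; the function name and the example values are hypothetical and are not data from the cited studies.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-odorant values (0-100 visual-analogue-style ratings):
# smelled sweetness of each odor sniffed alone, and the sweetness enhancement
# it produces when added to sucrose (mixture rating minus sucrose alone).
smelled_sweetness = [72, 65, 20, 15, 80, 35]
sweetness_enhancement = [9, 7, -3, -4, 11, 1]

print(f"r = {pearson_r(smelled_sweetness, sweetness_enhancement):.2f}")
```

A strong positive correlation across odorants is what the studies above report; the illustrative numbers here were chosen only to make that pattern visible.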
35.3 ASSOCIATIVE LEARNING AND INTEGRATION
The importance of taste-related odor properties for understanding sensory integration in flavors derives principally from the fact that these effects are thought only to arise once the odor and taste have been repeatedly experienced together as a mixture in the mouth, most typically in the context of foods or beverages. This process has been repeatedly demonstrated experimentally. Novel odors that have little or no smelled sweetness, sourness, or bitterness when sniffed take on these qualities when repeatedly paired in solution with sweet, sour, or bitter tastes, respectively (Prescott 1999; Stevenson et al. 1995, 1998, 1999; Yeomans et al. 2006, 2009). Recent studies have expanded these findings beyond associative relationships of odors with tastes. Thus, odors paired with high-fat milks themselves became fattier smelling and, when added to the milks after conditioning, were able to increase the milks’ perceived fattiness (Sundquist et al. 2006). Such acquired perceptual similarity has been seen as an example of a “learned synesthesia,” in which qualities in one sensory system (olfaction) are able to evoke qualities in another (taste) only as a result of their frequent co-occurrence (Stevenson et al. 1998). The nature of the change that the odor undergoes has been explained in terms of increasing congruency (similarity) with the taste, in that they possess qualities in common as a result of coexposure (Frank et al. 1993; Schifferstein and Verlegh 1996). Hence, the sweetness of a taste such as sucrose is seen to be more congruent with the sweet-smelling odor of caramel than it is with the odor of bacon, which typically has no sweet smell. It is only after this coexposure that the odor enhances a (now) congruent taste (Prescott 1999; Stevenson et al. 1999). Thus, Frank et al. (1993) found that the degree of enhancement produced by an odor for a particular taste was significantly correlated with ratings of the perceived similarity or congruency of the odorant and tastant. This suggests, therefore, that whether an odor/taste combination is seen as congruent is dependent on prior association of the components as a combination. Given the associative origin of these effects in the context of foods and beverages, we might expect cross-cultural differences in the extent to which particular odors and tastes are judged as congruent. For example, the odor of pumpkin is likely to smell sweeter in cultures where it is incorporated into desserts (e.g., the United States) than in cultures where it is used in savory dishes. Consistent with this, it has been reported that French and Vietnamese participants vary in their judgments of odor/taste harmony—that is, the extent to which an odor and taste are seen as congruent (Nguyen et al. 2002). One explanatory model for these effects proposes that each experience of an odor always invokes a search of memory for prior encounters with that odor. If, in the initial experience of the odor, it
was paired with a taste, a cross-modal configural stimulus—that is, a flavor—is encoded in memory. Subsequently sniffing the odor alone will evoke the most similar odor memory—the flavor—which includes both the odor and the taste component. Thus, for example, sniffing caramel odor activates memorial representations of caramel flavors, which include a sweet taste component. This results either in perceptions of smelled taste properties such as sweetness or, in the case of a mixture, in a perceptual combination of the memorial odor representation with the physically present taste in solution (Stevenson and Boakes 2004; Stevenson et al. 1998).
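This memory-search account can be made concrete with a small toy model: flavor experiences are stored as configurations of an odor profile plus a taste component, and sniffing an odor retrieves the most similar stored configuration, whose taste component then colors the percept. The sketch below is only an illustration of that verbal model under assumed representations (feature vectors, cosine similarity); it is not an implementation from the chapter, and the stored "memories" are invented.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two odor feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Hypothetical flavor memories: each pairs an odor profile (arbitrary features)
# with the taste experienced alongside it in the mouth.
flavor_memory = [
    {"name": "caramel flavor", "odor": [0.9, 0.1, 0.0], "taste": "sweet"},
    {"name": "vinegar flavor", "odor": [0.1, 0.8, 0.1], "taste": "sour"},
    {"name": "bacon flavor",   "odor": [0.0, 0.2, 0.9], "taste": "salty"},
]

def sniff(odor_profile):
    """Retrieve the most similar stored flavor configuration; its taste
    component is experienced as a 'smelled taste' quality of the odor."""
    best = max(flavor_memory, key=lambda m: cosine(odor_profile, m["odor"]))
    return best["name"], best["taste"]

# Sniffing a caramel-like odor alone evokes the caramel flavor memory,
# so the odor is described as smelling sweet.
print(sniff([0.85, 0.15, 0.05]))   # ('caramel flavor', 'sweet')
```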
35.4 CROSS-MODAL CHEMOSENSORY BINDING
In vision, features of a scene or object such as form, color, and movement are combined to form a coherent perception. The neural processing of form can be shown to be independent of that of color, but our perception is always that the two visual phenomena are bound seamlessly together. To understand flavor perception, it is similarly crucial to know the mechanisms responsible for binding tastes, odors, and tactile sensations into one coherent, cross-modal percept. Studies of interactions between visual, auditory, and somatosensory systems have demonstrated the importance of spatial and/or temporal contiguity in facilitating cross-modal sensory integration (Calvert et al. 1998; Driver and Spence 2000; Spence and Squire 2003). In flavors, the different stimulus elements are associated temporally. However, although both gustatory and somatosensory receptors are spatially located in the mouth, olfactory receptors are not. The question then arises of how odors become bound to taste and touch. Central to this process is the olfactory location illusion, in which the odor components of a food appear to originate in the mouth (Rozin 1982). Thus, we never have a sense that the oranginess of orange juice is being perceived within the nose, even if we are aware that it is an odor. This illusion is both strong and pervasive, despite the fact that we are frequently presented with evidence of the importance of the olfactory component in flavors, for example, through a blocked nose during a head cold. One common manifestation of this phenomenon is the interchangeability of chemosensory terms such as flavor and taste in common usage—that is, we routinely fail to make a distinction between olfactory and taste qualities within flavors. The location illusion itself may depend on both the spatial and temporal contiguity of the discrete sensory inputs. von Bekesy (1964) illustrated the likely importance of temporal factors as potential determinants of odor/taste integration by showing that the perceived location of an odor (mouth vs. nose) and the extent to which an odor and taste were perceived as one sensation or two could be manipulated by varying the time delay between the presentation of the odor and taste. With a time delay of zero (simultaneous presentation), the apparent locus of the odor was the back of the mouth and the odor/taste mixture was perceived as a single entity. When the odor preceded the taste, the sensation was perceived as originating in the nose (see Figure 35.1). Although this report is consistent with models of binding across other sensory modalities, von Bekesy (1964) did not provide sufficient details to judge the reliability of his conclusions. The number of other studies addressing this issue is also very limited. A demonstration that odor-induced taste enhancement can occur whether the odor is presented orthonasally or retronasally, provided that the odor and taste are presented simultaneously (Sakai et al. 2001), does suggest a key role for temporal synchrony in facilitating integration. Pfieffer et al. (2005) manipulated both spatial and temporal contiguity for the odor and taste while assessing the threshold for benzaldehyde odor (almond/cherry) in the presence of a subthreshold sweet taste, but failed to find convincing evidence of an effect of either manipulation.
However, a recent preliminary finding suggests that synchronicity judgments of odor and taste may be less sensitive to onset discrepancies than other multimodal stimulus pairs, including audiovisual stimuli, and odors and tastes each paired with visual stimuli (Kobayakawa et al. 2009). One interpretation of such a finding, if confirmed, together with the data of Sakai et al. (2001), would be that odor–taste binding operates under less stringent requirements for spatiotemporal synchrony than multisensory integration within other sensory systems. In turn, binding under conditions in which there is a tolerance for asynchrony might reflect the high adaptive significance of chemosensory binding.
FIGURE 35.1 Temporal and spatial determinants of odor/taste integration. Combination of smell and taste into a single sensation. A varying time difference between stimuli moves locus of sensation from tip of the nose back to the throat and forward again to tip of the tongue. (Reprinted from von Bekesy, G., J. Appl. Physiol., 19, 369–373, 1964. Copyright, used with permission from The American Physiological Society.)
Alternatively, at least in the case of temporal asynchrony, congruency between the odor and taste may be crucial. Hence, it has been demonstrated that judgments of audiovisual asynchrony are more difficult when the different modalities are bound by a common origin (Spence and Parise 2010). The olfactory location illusion is effectively an equivalent phenomenon to the auditory/visual “ventriloquism effect” in that, like the ventriloquist’s voice, the location of the odor is captured by other sensory inputs. The extent to which either concurrent taste or somatosensation, or both, is chiefly responsible for the capture and referral of olfactory information to the oral cavity is not known. However, the somatosensory system is more strongly implicated since it provides more detailed spatial information than does taste (Lim and Green 2008). Moreover, in neuroimaging studies, odors that are available to bind with tastes—that is, those presented retronasally (via the mouth)—have been shown to activate the mouth area of the primary somatosensory cortex, whereas the same odors presented via the nose do not (Small et al. 2005). This distinction, which occurs even when subjects are unaware of the route of stimulation, suggests a likely neural correlate of the binding process, and supports the idea that somatosensory input is the underlying mechanism. In fact, our taste experiences may themselves be multimodal. Under most circumstances, taste and tactile sensations in the mouth are so well integrated that we cannot begin to disentangle them, and there is growing evidence that our everyday experiences of taste are themselves multisensory, in that they involve somatosensory input (Green 2003; Lim and Green 2008). Taste buds are innervated by somatosensory fibers (Whitehead et al. 1985) and various categories of somatosensory stimuli are also capable of inducing taste sensations. Thus, it has been noted that about 25% of fungiform papillae respond to tactile stimulation by fine wires with a taste quality (Cardello 1981). More recently, tastes have been shown to be elicited by heated and cooled probes placed on areas innervated by cranial nerves VII and IX, which subserve taste (Cruz and Green 2000), and by the application of the prototypical “pure” irritant, capsaicin, to circumvallate papillae (Green and Hayes 2003). Further evidence points to the ability of tactile stimulation to capture taste, presumably by providing superior spatial information and enhancing localization (Delwiche et al. 2000; Lim and Green 2008; Todrank and Bartoshuk 1991). Tactile information may therefore have an important role in binding tastes, perhaps together with odors, both to one another and to a physical stimulus such as a food. The binding of odors to tastes and tactile stimuli may also rely on processing information about the origins of odor stimulation. Orthonasally presented odors are more readily identified and have lower thresholds than the same odors presented retronasally via the mouth (Pierce and Halpern
1996; Voirol and Daget 1986), and there is a strong suggestion that the two routes of stimulation are processed with some independence. Thus, neuroimaging studies show different activation patterns in cortical olfactory areas as a result of route of administration (Small et al. 2005). From an adaptive point of view, this makes sense. Olfaction has been described (Rozin 1982) as the only dual sense because it functions both to detect volatile chemicals in the air (orthonasal sniffing) and to classify objects in the mouth as foods or not, and each of these roles has unique adaptive significance. Since the mouth acts as the gateway to the gut, our chemical senses can be seen as part of a defense system to protect our internal environment—once something is placed in the mouth, there is high survival value in deciding whether consumption is appropriate. Sensory qualities (tastes, retronasal odors, tactile qualities) occurring together in the mouth are therefore bound into a single perception, which identifies a substance as a food (cf. Gibson 1966).
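Returning to von Bekesy's demonstration above, the idea that onset delay governs both whether an odor and taste are experienced as one sensation and where the odor seems to be located can be caricatured as a simple temporal-window rule. The sketch below is a toy illustration only; the window width and the location labels are invented for the example and are not parameters reported in the studies discussed in this section.

```python
def perceive(odor_onset_ms, taste_onset_ms, binding_window_ms=200):
    """Toy temporal-window rule for odor-taste binding.

    Returns (bound, apparent_location): whether the odor and taste are
    experienced as a single flavor, and where the odor seems to come from.
    The 200 ms window and the location labels are illustrative assumptions.
    """
    delay = odor_onset_ms - taste_onset_ms  # negative: odor leads the taste
    if abs(delay) <= binding_window_ms:
        # Near-simultaneous onsets: one percept, referred to the mouth.
        return True, "back of the mouth"
    if delay < 0:
        # Odor well ahead of the taste: experienced as a smell in the nose.
        return False, "nose"
    # Taste well ahead of the odor: odor localized toward the tongue.
    return False, "tongue"

print(perceive(0, 0))      # (True, 'back of the mouth')
print(perceive(-600, 0))   # (False, 'nose')
print(perceive(600, 0))    # (False, 'tongue')
```

The preliminary finding of Kobayakawa et al. (2009) would correspond, in this caricature, to a wider binding window for odor–taste pairs than for audiovisual pairs.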
35.5 ATTENTIONAL PROCESSES IN BINDING
Even though an odor’s sniffed “taste” qualities and its ability to enhance that taste in solution are highly correlated (Stevenson et al. 1999), demonstrating such enhancement (for example, that a sweet-smelling odor can enhance the sweetness of sucrose in solution) appears to operate under some constraints. This became evident from findings that whether an odor enhances a taste depends on task requirements. Thus, Frank et al. (1993) found that although strawberry odor enhanced the sweetness of sucrose in solution when the subjects were asked to judge only sweetness, the enhancement was not evident when other sensory qualities of these mixtures, such as sourness and fruitiness, were rated as well. In addition, the sweetness of the strawberry/sucrose mixtures was suppressed when the subjects rated total intensity of the mixture and then partitioned their responses into sweet, salty, sour, bitter, and/or other tastes. Interestingly, these effects were also noted for some taste mixtures, in which the elements are often judged as similar (e.g., sour/bitter taste mixtures), but not others with dissimilar components (e.g., sweet/bitter mixtures; Frank et al. 1993). Similarly, significantly less sweetness enhancement was found when subjects rated the odor as well as the taste intensity of flavors (sweetness plus strawberry or vanilla) than when they rated sweetness alone (Clark and Lawless 1994). In attempting to explain such effects, Frank and colleagues (Frank 2003; Frank et al. 1993; van der Klaauw and Frank 1996) suggested that, given perceptual similarity between an odor and taste, the conceptual “boundaries” that the subject sets for a given complex stimulus will reflect the task requirements. In the case of an odor/taste mixture in which the elements share a similar quality, combining those elements is essentially optional. This explanation invokes the notion that integration of perceptually similar dimensions is determined by the attentional focus demanded by the task. These effects of instructional sets are analogous to those seen in studies of cross-modal integration of vision and hearing. For example, focusing on the overall similarity of visual or auditory stimulus pairs representing different stimulus dimensions, versus focusing on their component dimensions, can influence whether the pairs are treated as interacting or separable dimensions (Melara et al. 1992). This suggests the possibility that the apparent influence of the number of rating scales on odor/taste interactions results from the impact of these scales on how attention is directed toward the odor and taste. In keeping with this view, van der Klaauw and Frank (1996) were able to eliminate taste enhancement by directing subjects’ attention to the appropriate attributes in a taste/odor mixture, even when they were only required to rate sweetness.
35.6 ANALYSIS AND SYNTHESIS IN PERCEPTION OF FLAVOR
These attentional effects appear to correspond to the differing modes of interaction that occur within sensory modalities. The blending of odors to form entirely new odors is a commonplace occurrence in flavor chemistry and perfumery (at least for odor mixtures with greater than two components; see Laing and Willcox 1983), and hence is referred to as synthetic interaction (analogous to the blending
of light wavelengths). By contrast, the mixing of tastes is typically seen as an analytic process, because individual taste qualities do not fuse to form new qualities and, like simultaneous auditory tones, can be distinguished from one another in mixtures. A further category of interaction, namely, fusion—the notion of sensations combined to form a single percept, rather than combining synthetically to form a new sensation—has also been proposed and applied to flavor perception (McBurney 1986). The notion of fusion in flavor perception implies that the percept remains analyzable into its constituent elements even when otherwise perceived as a whole. Thus, although our initial response is to apple flavor—an effortless combining of all of its sensory qualities into a single percept—we can, if required, switch between a synthetic approach to flavor and an analysis of the flavor elements. Hence, apple flavor can be both a synthetic percept and, with minimal effort, a collection of tastes (sweet; sour), textures (crisp; juicy) and odor notes (lemony; acetone-like; honey) (see Figure 35.2). A more precise way of conceptualizing flavor therefore is that cross-modal sensory signals are combined to produce a percept, rather than combining synthetically—in the way that odors themselves do—to form a new sensation. During normal food consumption, we typically respond to flavors synthetically—an approach reinforced by the olfactory illusion and by the extent to which flavor components are congruent. As noted earlier, this implies a sharing of perceptual qualities, for example, sweetness of a taste and of an odor, derived from prior experience of these qualities together. Conversely, analytic approaches to complex food or other flavor stimuli (e.g., wines) are often used by trained assessors to provide a descriptive profile of discrete sensory qualities, as distinct from an assessment of the overall flavor. Asking assessors to become analytical appears to produce the same inhibitory effects on odor–taste interactions noted in studies by Frank et al. (1993) and others. In one study using both trained descriptive panelists and untrained consumers (Bingham et al. 1990), solutions of the sweet-smelling odorant maltol plus sucrose were rated as sweeter than a solution of sucrose alone by the untrained consumers. In contrast, no such enhancement was found in the ratings of those trained to adopt an analytical approach to the sensory properties of this mixture. In experimental paradigms, whether an odor/taste mixture is perceived analytically or synthetically can be determined by the responses required of the subject. Multiple ratings of appropriate
[Figure 35.2: two panels contrasting “Apple flavor (synthetic)” with “Elements of apple flavor (analytic),” the latter a 0–100 profile of three apples (A, B, and C) on attributes including flouriness, juiciness, crispness, firmness, acid and sweet tastes, honey and stewed flavors, and acetone, grassy, lemon, and stalky odor notes.]
FIGURE 35.2 Synthetic and analytic views of a flavor. In each case, sensory signals are identical, but perception differs—whole flavor of apple versus a collection of sensory qualities on which different apples may vary.
attributes force an analytical approach, whereas a single rating of a sensory quality that can apply to both congruent odors and tastes (e.g., the tasted sweetness of sucrose and the smelled sweetness of strawberry odor) encourages synthesis of the common quality from both sensory modalities. The components of these flavors may not be treated separately when judged in terms of sweetness or other single characteristics. When instructions require separation of the components, however, this can be done—the components of a flavor are evaluated individually, and sweetness enhancement is eliminated. In other words, rating requirements lead to different perceptual approaches (analytical or synthetic) that, in turn, influence the degree of perceptual integration that occurs. A recent study of odor mixtures has indicated that an analytical approach is similarly able to influence the integration of the individual mixture components, as shown in a reduction in the extent to which subjects perceived a unique quality distinct from those of the components (Le Berre et al. 2008).
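One way to see how rating requirements could modulate integration is to treat the rated sweetness of a flavor as the tastant's own sweetness plus a contribution from the odor's congruent smelled sweetness that is gated by how synthetically the stimulus is attended. The sketch below is a speculative toy formalization of this idea, not a model from the chapter; the weighting scheme and parameter values are assumptions made purely for illustration.

```python
def rated_sweetness(tastant_sweetness, odor_smelled_sweetness,
                    congruency, synthetic_attention):
    """Toy account of odor-induced sweetness enhancement.

    tastant_sweetness:      sweetness of the tastant alone (0-100 scale)
    odor_smelled_sweetness: sweetness attributed to the odor when sniffed (0-100)
    congruency:             0-1, strength of prior odor-taste association
    synthetic_attention:    0-1, 1 = whole-flavor (synthetic) rating context,
                            0 = element-by-element (analytic) rating context
    All weights here are illustrative assumptions, not fitted values.
    """
    gain = 0.15 * congruency * synthetic_attention
    return tastant_sweetness + gain * odor_smelled_sweetness

sucrose = 40.0
strawberry_odor = 70.0

# A single sweetness rating of the whole flavor: enhancement appears.
print(rated_sweetness(sucrose, strawberry_odor,
                      congruency=0.9, synthetic_attention=1.0))

# Multiple attribute ratings push the rater toward analysis: enhancement shrinks.
print(rated_sweetness(sucrose, strawberry_odor,
                      congruency=0.9, synthetic_attention=0.2))
```

On this toy reading, the rating instructions do not change the stimulus at all; they change only the attention term, which is enough to make the same odor/taste mixture appear enhanced or not.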
35.7 INVESTIGATING COGNITIVE PROCESSES IN FLAVOR PERCEPTION
Thus, the concept of fusion suggests that flavor perception is highly dependent on both past experience with specific odor/taste combinations (the origin of congruence) and cognitive factors that influence whether the flavor elements are combined or not. The most influential model of visual binding proposes that individual visual features are only loosely associated during early stages of processing, most likely by a common spatial location, but are bound to form a coherent perception as a result of attention directed toward combining these features as aspects of the same object or scene (Treisman 1998, 2006). Similarly, the configural account of odor/taste perceptual learning (Stevenson and Boakes 2004) implies that when attention is directed toward a flavor, it is attended to as a single compound or configuration, rather than a collection of elements. A configural explanation for the ability of an odor to later summate with the taste to produce enhanced sweetness implies an attentional approach that combines the odor and taste, rather than identifying them as separate elements in the flavor. In other words, for a complete binding of flavor features via configural learning, synthesis of the elements via attending to the whole flavor is critical. The limited evidence that exists suggests that the binding and joint encoding of odors, tastes, and tactile sensations is automatic. This is indicated both by the finding that perceptual changes in odors after pairing with tastes appear not to require conscious awareness on the part of the subject of the particular odor–taste contingencies (Stevenson et al. 1998) and by data suggesting that a single coexposure of an odor and taste can result in transfer of the taste properties to the odor (Prescott et al. 2004). Thus, such learning should be sensitive to manipulations in which attention is directed toward the identity of the constituent elements. One approach to examining the role of these factors has been to force subjects to adopt contrasting attentional strategies (analytic vs. synthetic) while either experiencing or judging odor/taste mixtures. If odor/taste interactions can be influenced by the extent to which an analytical or synthetic perceptual approach is taken during rating, then the extent to which the odors and tastes become integrated (as shown by increased perceptual similarity) might similarly be determined by the way in which the components of the flavor are associated during their joint exposure. In turn, any influence of odors on those tastes in solution may similarly be modulated. Hence, an exposure strategy that emphasizes the distinctiveness of the elements in the odor/taste mixture (an analytical perceptual strategy) should inhibit increases in the taste properties of the odor, and the subsequent ability of the odor to influence tastes in solution. In contrast, treating the elements as a synthetic whole is likely to encourage the blurring of the perceptual boundaries, fostering subsequent odor/taste interactions. Consistent with this, pre-exposure of the elements of the specific odor–taste flavor compounds that were later repeatedly associated—in Pavlovian terms, unconditional stimulus or conditional stimulus pre-exposure—eliminated any change in the odors’ perceptual qualities following the pairing (Stevenson and Case 2003). Thus, pre-exposed odors later paired with sweet or sour tastes did
not become sweeter or sourer smelling, whereas taste-paired odors that had not been pre-exposed did. In contrast, initial attempts to disrupt configural integrity by directing attention toward the elemental nature of the compound stimulus during associative pairing were unsuccessful. Neither training subjects to distinguish the individual odor and taste components of flavors prior to learning (Stevenson and Case 2003) nor predisposing subjects to adopt an analytical strategy by requiring intensity ratings of these odor and taste components separately during their joint exposure (Prescott et al. 2004) was initially successful in influencing whether odors paired with a sweet taste became sweeter smelling. This is probably attributable to methodological factors. If odors and tastes are automatically coencoded as a flavor in the absence of task demands that focus attention on the elements, then experimental designs in which odor and taste elements appear together without such an attentional strategy are likely to predispose toward synthesis. Hence, the analytical strategy used by Stevenson and Case (2003) was likely to be ineffective since they asked subjects during the exposure to rate overall liking for the odor–taste compound, an approach that may have encouraged integration of the elements. The analytical manipulation in Prescott et al.’s (2004) study may not have influenced the development of smelled sweetness because it took place after the initial pairing of the sweet taste and odor that occurred before the formal associative process—that is, as the preconditioning measure in the pre–post design. As noted earlier, a second study in Prescott et al.’s (2004) report demonstrated that a single odor–sweet taste coexposure can produce an odor that smells sweet. More recently, it has been demonstrated that when such methodological considerations are addressed, prior analytical training in which attention is explicitly directed toward the individual elements in an odor and sweet taste mixture does inhibit the development of a sweet-smelling odor (Prescott and Murphy 2009; see Figure 35.3a). In this study, subjects only ever received a particular
[Figure 35.3: bar graphs for synthetic and analytic groups; panel (a) plots the mean change in the smelled sweetness of the odor, and panel (b) plots the mean change in the tasted sweetness of sucrose and water solutions.]
FIGURE 35.3 Changes in perceptual characteristics of odors and flavors as a function of odor–taste coexposure and attentional strategy. (a) Mean ratings of smelled sweetness of odors increase after repeated pairing with a sweet taste in solution, but only for a group using a strategy in which odor and taste elements are treated synthetically. In contrast, coexposure to the same odor–taste mixture when odor and taste are attended to analytically as distinct elements produces no such increase in smelled sweetness. (Reprinted from Prescott, J., and Murphy, S., Q. J. Exp. Psychol., 62 (11), 2133–2140, 2009. Copyright, with permission from Taylor & Francis.) (b) Mean ratings of sweetness of a flavor composed of sucrose in solution together with an odor that has previously been conditioned with this taste so that it smells sweet. Despite this, enhancement is evident only in the group that treated the elements synthetically during their association. (Adapted from Prescott, J. et al., Chem. Senses, 29, 331–340, 2004. With permission.)
odor/taste combination under conditions in which they had been trained to respond to the combination in explicitly synthetic or analytical ways. Moreover, the fact that the training used different odor/taste combinations than were later used in the conditioning procedure suggests that an attentional approach (analytical or synthetic) was being induced in the subjects during training that was then applied to new odor/taste combinations during conditioning. The findings from this study have important theoretical implications, in that they are clearly consistent with configural accounts of perceptual odor–taste learning and flavor representation (Stevenson and Boakes 2004; Stevenson et al. 1998). Under conditions where attention is directed toward individual stimulus elements during conditioning, the separate representation of these elements may be incompatible with learning of a configural representation. This explanation is supported by the demonstration that an analytical approach also acted to inhibit a sweet-smelling odor’s ability to enhance a sweet taste when the odor/taste combination was evaluated in solution after repeated pairing (Prescott et al. 2004; see Figure 35.3b). In other words, an analytical attentional strategy can be shown to interfere with either the development of a flavor configuration resulting from associative learning, or the subsequent ability of this configuration to combine with a physically present tastant.
35.8 HEDONIC IMPLICATIONS OF CHEMOSENSORY INTEGRATION
Likes and dislikes naturally arise from the integrated perception of flavor, since we are responding to substances that we have learned to recognize as foods and that are therefore biologically, culturally, and socially valued. Initial (“gut”) responses to foods are almost always hedonic, and these naturally precede accepting or rejecting the food. Hence, perhaps uniquely among multisensory interactions, multisensory integration in the chemical senses is, to a greater or lesser extent, a process that has an inherently hedonic dimension. As with perceptual changes in odors, hedonic properties of flavors arise from associative learning. Although our initial responses to odors may or may not be strongly positively or negatively valenced, tastes evoke emotions that are innate (Prescott 1998; Steiner et al. 2001). Because of this hedonic valence, repeatedly pairing odors with tastes not only produces a transfer of perceptual properties, leading to odor taste properties, but also a change in the hedonic character of the odor, and hence also of the flavor. Thus, repeated pairing of a novel odor that is hedonically neutral with a liked sweet taste typically produces an increase in liking for that odor; conversely, pairing with a bitter taste produces a disliked odor (Baeyens et al. 1990; Zellner et al. 1983). This form of learning, known as evaluative conditioning (EC; Levey and Martin 1975), is procedurally identical to odor–taste perceptual learning. Nevertheless, evaluative and perceptual associative conditioning can be distinguished by the fact that conditioned increases in the taste properties of odors can occur without consistent changes in liking (Stevenson et al. 1995, 1998) and also by the reliance of odor–taste evaluative, but not perceptual, learning on the motivational state of the individual. Hence, EC is reduced or eliminated under conditions of satiation, whereas perceptual learning is unaffected (Yeomans and Mobini 2006). EC, but not perceptual learning, also relies on the relative hedonic value of the tastes. Although even relatively weak bitterness per se is universally negative (Steiner et al. 2001), in adults there is variation in the extent to which sweetness is hedonically positive (Pangborn 1970). However, when this is controlled for by selecting “sweet likers”—commonly defined as those whose hedonic responses tend to increase with increasing sweetener concentration—odors paired with sweet tastes reliably become more liked (Yeomans et al. 2006, 2009). A configural or holistic learning model of the type discussed earlier in relation to perceptual changes in odors paired with tastes also accounts for odor–taste evaluative learning by proposing that the configuration includes a hedonic component “supplied” by the taste, which is evoked when the odor or flavor is experienced (De Houwer et al. 2001). This model is supported for EC by an identical finding for analytical versus synthetic attention as that shown with perceptual learning. That is, training to identify the elemental nature of the odor–taste compound during
learning also eliminates the transfer of hedonic properties from the taste to the odor (Prescott and Murphy 2009), suggesting that the formation of an odor–taste configuration that includes hedonic values has been inhibited. Recent evidence also suggests that, even after learning, the hedonic value of a flavor can be altered by the extent to which an analytical approach is taken to the flavor. Comparisons between acceptability ratings alone and the same ratings followed by a series of analytical ratings of flavor sensory qualities found a reduction of liking in the latter condition (Prescott et al. 2011), suggesting that analytical approaches are inhibitory to liking even once that liking has been established. The explanation for this effect is that, as with the similar effects on perceptual learning reported by Prescott et al. (2004), an analytical attentional strategy is induced by knowledge that the flavor is to be perceptually analyzed, reducing the configuration process responsible for the transfer of hedonic properties. This finding joins a number of others indicating that analytical cognitions are antagonistic toward the expression of likes and dislikes (Nordgren and Dijksterhuis 2008). An additional consequence of EC has been demonstrated in studies that have measured the behavioral effects of pairing an odor with a tastant that may be valued metabolically. A considerable body of animal (Myers and Sclafani 2001) and human (Kern et al. 1993; Prescott 2004a; Yeomans et al. 2008b) literature exists showing that odor–taste pairing leading to learned preferences is highly effective when a tastant that provides valued nutrients is ingested. This process can be shown to be independent of the hedonic value of the tastant—for example, by comparing conditioning of odors using sweet tastants that provide energy (e.g., sucrose) with those that do not (Mobini et al. 2007). As with EC generally, this form of postingestive conditioning is sensitive to motivational state and is maximized when conditioning and evaluation of learning take place under relative hunger (Yeomans and Mobini 2006). It has also been recently demonstrated that a novel flavor paired with ingested monosodium glutamate (MSG) not only increased in rated liking, even when tested without added MSG, but also, relative to a non-MSG control, produced behavioral changes including increases in ad libitum food intake and rated hunger after an initial tasting of the flavor (Yeomans et al. 2008). Finally, one interesting behavioral consequence of odor–taste perceptual integration has been a demonstration that a sweet-smelling odor significantly increased pain tolerance relative to a no-odor control (Prescott and Wilkie 2007). Given that the effect was not seen with an equally pleasant but not sweet-smelling odor, the conclusion drawn was that the odor sweetness was acting in an equivalent manner to sweet tastes, which have been shown to have this same effect on pain (Blass and Hoffmeyer 1991). Although the presumption is that such effects are also the result of the same learned integration that produces the sweet smell and the ability to modify taste perceptions, the crucial demonstration of this has yet to be carried out. It does suggest, however, that the process of elicitation of a flavor representation by an odor may have broad behavioral as well as perceptual and hedonic consequences.
There have been some recent attempts to explore the practical implications of odor–taste learning, opening up opportunities to exploit its consequences. It has been shown, for example, that the enhancement of tastes by congruent odors seen in model systems (i.e., solutions) also occurs in foods, with bitter- and sweet-smelling odors enhancing their respective congruent tastes in milk drinks (Labbe et al. 2006). Also consistent with data derived from model systems was the failure, in these studies, of a sweet-smelling odor to enhance the sweetness of an unfamiliar beverage. Most recently, an examination of the potential for odors from a range of salty foods to enhance saltiness in solution (Lawrence et al. 2009) raised the possibility that such odors could be used to reduce the sodium content of foods without the typical concurrent loss of acceptability (Girgis et al. 2003). Similarly, the finding that odors can take on fatlike properties after associative pairing with fats (Sundquist et al. 2006) might allow odors to partially substitute for actual fat content in foods. These studies therefore point to an exciting prospect, in which research aimed at understanding multisensory processes in flavor perception may lead to applications that ultimately have important public health consequences.
REFERENCES Baeyens, F., P. Eelen, O. Van den Bergh, and G. Crombez. 1990. Flavor–flavor and color–flavor conditioning in humans. Learning and Motivation 21: 434–445. Bingham, A. F., G. G. Birch, C. De Graaf, J. M. Behan, and K. D. Perring. 1990. Sensory studies with sucrose– maltol mixtures. Chemical Senses 15(4): 447–456. Blass, E. M., and L. B. Hoffmeyer. 1991. Sucrose as an analgesic for newborn infants. Pediatrics 87(2): 215–218. Brillat-Savarin, J.-A. 1825. The physiology of taste, 1994 ed. London: Penguin Books. Burdach, K. J., J. H. A. Kroeze, and E. P. Koster. 1984. Nasal, retronasal, and gustatory perception: An experimental comparison. Perception & Psychophysics 36(3): 205–208. Calvert, G. A., M. J. Brammer, E. T. Bullmore, R. Campbell, S. D. Iversen, and A. S. David. 1999. Response amplification in sensory-specific cortices during crossmodal binding. NeuroReport 10: 2619–2623. Calvert, G. A., M. J. Brammer, and S. D. Iversen. 1998. Crossmodal identification. Trends in Cognitive Sciences 2(7): 247–253. Cardello, A. V. 1981. Comparison of taste qualities elicited by tactile, electrical and chemical stimulation of single human taste papillae. Perception & Psychophysics 29: 163–169. Clark, C. C., and H. T. Lawless. 1994. Limiting response alternatives in time–intensity scaling: An examination of the halo-dumping effect. Chemical Senses 19(6): 583–594. Cruz, A., and B. G. Green. 2000. Thermal stimulation of taste. Nature 403: 889–892. Dalton, P., N. Doolittle, H. Nagata, and P. A. S. Breslin. 2000. The merging of the senses: Integration of subthreshold taste and smell. Nature Neuroscience 3: 431–432. De Houwer, J., S. Thomas, and F. Baeyens. 2001. Associative learning of likes and dislikes: A review of 25 years of research on human evaluative conditioning. Psychological Bulletin 127(6): 853–869. Delwiche, J. F., and A. L. Heffelfinger. 2005. Cross-modal additivity of taste and smell. Journal of Sensory Studies 20: 512–525. Delwiche, J. F., M. L. Lera, and P. A. S. Breslin. 2000. Selective removal of a target stimulus localized by taste in humans. Chemical Senses 25: 181–187. Djordjevic, J., R. J. Zatorre, and M. Jones-Gotman. 2004. Effects of perceived and imagined odors on taste detection. Chemical Senses 29: 199–208. Dravnieks, A. 1985. Atlas of odor character profiles. Philadelphia, PA: American Society for Testing and Materials. Driver, J., and C. Spence. 2000. Multisensory perception: Beyond modularity and convergence. Current Biology 10: R731–R735. Frank, R. A. 2003. Response context affects judgments of flavor components in foods and beverages. Food Quality and Preference 14: 139–145. Frank, R. A., and J. Byram. 1988. Taste-smell interactions are tastant and odorant dependent. Chemical Senses 13(3): 445–455. Frank, R. A., K. Ducheny, and S. J. S. Mize. 1989. Strawberry odor, but not red color, enhances the sweetness of sucrose solutions. Chemical Senses 14(3): 371–377. Frank, R. A., N. J. van der Klaauw, and H. N. J. Schifferstein. 1993. Both perceptual and conceptual factors influence taste–odor and taste–taste interactions. Perception & Psychophysics 54(3): 343–354. Gibson, J. J. 1966. The senses considered as perceptual systems. Boston: Houghton Mifflin Company. Girgis, S., B. Neal, J. Prescott et al. 2003. A one-quarter reduction in the salt content of bread can be made without detection. European Journal of Clinical Nutrition 57(4): 616–620. Green, B. G. 2003. Studying taste as a cutaneous sense. Food Quality and Preference 14: 99–109. Green, B. G., and J. E. Hayes. 2003. 
Capsaicin as a probe of the relationship between bitter taste and chemesthesis. Physiology & Behavior 79: 811–821. Kern, D. L., L. McPhee, J. Fisher, S. Johnson, and L. L. Birch. 1993. The postingestive consequences of fat condition preferences for flavors associated with high dietary fat. Physiology & Behavior 54: 71–76. Kobayakawa, T., H. Toda, and N. Gotow. 2009. Synchronicity judgement of gustation and olfaction. Paper presented at the Association for Chemoreception Sciences, Sarasota, FL. Labbe, D., L. Damevin, C. Vaccher, C. Morgenegg, and N. Martin. 2006. Modulation of perceived taste by olfaction in familiar and unfamiliar beverages. Food Quality and Preference 17: 582–589. Laing, D. G., and M. E. Willcox. 1983. Perception of components in binary odour mixtures. Chemical Senses 7(3–4): 249–264. Lawrence, G., C. Salles, C. Septier, J. Busch, and T. Thomas-Danguin. 2009. Odour–taste interactions: A way to enhance saltiness in low-salt content solutions. Food Quality and Preference 20(3): 241–248.
Le Berre, E., T. Thomas-Danguin, N. Beno, G. Coureaud, P. Etievant, and J. Prescott. 2008. Perceptual processing strategy and exposure influence the perception of odor mixtures. Chemical Senses 33: 193–199. Levey, A. B., and I. Martin. 1975. Classical conditioning of human ‘evaluative’ responses. Behavior Research & Therapy 13: 221–226. Lim, J., and B. G. Green. 2008. Tactile interaction with taste localization: Influence of gustatory quality and intensity. Chemical Senses 33: 137–143. Martino, G., and L. E. Marks. 2001. Synesthesia: Strong and weak. Current Directions in Psychological Science 10(2): 61–65. McBurney, D. H. 1986. Taste, smell, and flavor terminology: Taking the confusion out of fusion. In Clinical measurement of taste and smell, ed. H. L. Meiselman and R. S. Rivkin, 117–125. New York: Macmillan. Melara, R. D., L. E. Marks, and K. E. Lesko. 1992. Optional processes in similarity judgments. Perception & Psychophysics 51(2): 123–133. Mobini, S., L. C. Chambers, and M. R. Yeomans. 2007. Interactive effects of flavour–flavour and flavour–consequence learning in development of liking for sweet-paired flavours in humans. Appetite 48: 20–28. Myers, K. P., and A. Sclafani. 2001. Conditioned enhancement of flavor evaluation reinforced by intragastric glucose: I. Intake acceptance and preference analysis. Physiology & Behavior 74: 481–493. Nguyen, D. H., D. Valentin, M. H. Ly, C. Chrea, and F. Sauvageot. 2002. When does smell enhance taste? Effect of culture and odorant/tastant relationship. Paper presented at the European Chemoreception Research Organisation conference, Erlangen, Germany. Nordgren, L. F., and A. P. Dijksterhuis. 2008. The devil is in the deliberation: Thinking too much reduces preference consistency. Journal of Consumer Research 36: 39–46. Pangborn, R. M. 1970. Individual variation in affective responses to taste stimuli. Psychonomic Science 21(2): 125–126. Pfieffer, J. C., T. A. Hollowood, J. Hort, and A. J. Taylor. 2005. Temporal synchrony and integration of subthreshold taste and smell signals. Chemical Senses 30: 539–545. Pierce, J., and B. Halpern. 1996. Orthonasal and retronasal odorant identification based upon vapor phase input from common substances. Chemical Senses 21(5): 529–543. Prescott, J. 1998. Comparisons of taste perceptions and preferences of Japanese and Australian consumers: Overview and implications for cross-cultural sensory research. Food Quality and Preference 9(6): 393–402. Prescott, J. 1999. Flavour as a psychological construct: Implications for perceiving and measuring the sensory qualities of foods. Food Quality and Preference 10: 349–356. Prescott, J. 2004a. Effects of added glutamate on liking for novel food flavors. Appetite 42(2): 143–150. Prescott, J. 2004b. Psychological processes in flavour perception. In Flavour perception, ed. A. J. Taylor and D. Roberts, 256–277. London: Blackwell Publishing. Prescott, J., V. Johnstone, and J. Francis. 2004. Odor/taste interactions: Effects of different attentional strategies during exposure. Chemical Senses 29: 331–340. Prescott, J., and S. Murphy. 2009. Inhibition of evaluative and perceptual odour–taste learning by attention to the stimulus elements. Quarterly Journal of Experimental Psychology 62: 2133–2140. Prescott, J., K.-O. Kim, and S. M. Lee. 2011. Analytic approaches to evaluation modify hedonic responses. Food Quality and Preference 22: 391–393. Prescott, J., and J. Wilkie. 2007. Pain tolerance selectively increased by a sweet-smelling odor. Psychological Science 18(4): 308–311. 
Rozin, P. 1982. “Taste–smell confusions” and the duality of the olfactory sense. Perception & Psychophysics 31(4): 397–401. Sakai, N., T. Kobayakawa, N. Gotow, S. Saito, and S. Imada. 2001. Enhancement of sweetness ratings of aspartame by a vanilla odor presented either by orthonasal or retronasal routes. Perceptual and Motor Skills 92: 1002–1008. Schifferstein, H. N. J., and P. W. J. Verlegh. 1996. The role of congruency and pleasantness in odor-induced taste enhancement. Acta Psychologica 94: 87–105. Small, D. M., J. C. Gerber, Y. E. Mak, and T. Hummel. 2005. Differential neural responses evoked by orthonasal versus retronasal odorant perception in humans. Neuron 47: 593–605. Spence, C., and S. Squire. 2003. Multisensory integration: Maintaining the perception of synchrony. Current Biology 13: R519–R521. Spence, C., and C. Parise. 2010. Prior-entry: A review. Consciousness and Cognition 19: 364–379. Stein, B. E., W. S. Huneycutt, and M. A. Meredith. 1988. Neurons and behavior: The same rules of multisensory integration apply. Brain Research 448: 355–358.
Steiner, J. E., D. Glaser, M. E. Hawilo, and K. C. Berridge. 2001. Comparative expression of hedonic impact: Affective reactions to taste by human infants and other primates. Neuroscience & Biobehavioral Reviews 25: 53–74. Stevenson, R. J., and R. A. Boakes. 2004. Sweet and sour smells: The acquisition of taste-like qualities by odors. In Handbook of multisensory processes, ed. G. Calvert, C. B. Spence, and B. Stein, 69–83. Cambridge, MA: MIT Press. Stevenson, R. J., R. A. Boakes, and J. Prescott. 1998. Changes in odor sweetness resulting from implicit learning of a simultaneous odor–sweetness association: An example of learned synesthesia. Learning and Motivation 29: 113–132. Stevenson, R. J., and T. I. Case. 2003. Preexposure to the stimulus elements, but not training to detect them, retards human odour–taste learning. Behavioural Processes 61: 13–25. Stevenson, R. J., J. Prescott, and R. A. Boakes. 1995. The acquisition of taste properties by odors. Learning & Motivation 26: 1–23. Stevenson, R. J., J. Prescott, and R. A. Boakes. 1999. Confusing tastes and smells: How odors can influence the perception of sweet and sour tastes. Chemical Senses 24: 627–635. Sumby, W. H., and I. Polack. 1954. Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America 26: 212–215. Sundquist, N., R. J. Stevenson, and I. R. J. Bishop. 2006. Can odours acquire fat-like properties? Appetite 47: 91–99. Todrank, J., and L. M. Bartoshuk. 1991. A taste illusion: Taste sensation localised by touch. Physiology & Behavior 50: 1027–1031. Treisman, A. 1998. Feature binding, attention and object perception. Philosophical Transactions of the Royal Society, London B 353: 1295–1306. Treisman, A. 2006. How the deployment of attention determines what we see. Visual Cognition 14: 411–443. van der Klaauw, N. J., and R. A. Frank. 1996. Scaling component intensities of complex stimuli: The influence of response alternatives. Environment International 22(1): 21–31. Voirol, E., and N. Daget. 1986. Comparative study of nasal and retronasal olfactory perception. LebensmittelWissenschaft und-Technologie 19: 316–319. von Bekesy, G. 1964. Olfactory analogue to directional hearing. Journal of Applied Physiology 19: 369–373. White, T. L., and J. Prescott. 2007. Chemosensory cross-modal Stroop effects: Congruent odors facilitate taste identification. Chemical Senses 32: 337–341. Whitehead, M. C., C. S. Beeman, and B. A. Kinsella. 1985. Distribution of taste and general sensory nerve endings in fungiform papillae of the hamster. American Journal of Anatomy 173: 185–201. Yeomans, M. R., N. Gould, S. Mobini, and J. Prescott. 2008a. Acquired flavor acceptance and intake facilitated by monosodium glutamate in humans. Physiology & Behavior 93: 958–66. Yeomans, M. R., M. Leitch, N. J. Gould, and S. Mobini. 2008b. Differential hedonic, sensory and behavioral changes associated with flavor–nutrient and flavor–flavor learning. Physiology & Behavior 93: 798–806. Yeomans, M. R., and S. Mobini. 2006. Hunger alters the expression of acquired hedonic but not sensory qualities of food-paired odors in humans. Journal of Experimental Psychology: Animal Behavior Processes 32(4): 460–466. Yeomans, M. R., S. Mobini, T. D. Elliman, H. C. Walker, and R. J. Stevenson. 2006. Hedonic and sensory characteristics of odors conditioned by pairing with tastants in humans. Journal of Experimental Psychology: Animal Behavior Processes 32(3): 215–228. Yeomans, M. R., J. Prescott, and N. G. Gould. 2009. 
Individual differences in responses to tastes determine hedonic and perceptual changes to odours following odour/taste conditioning. Quarterly Journal of Experimental Psychology 62(8): 1648–1664. Zellner, D. A., P. Rozin, M. Aron, and C. Kulish. 1983. Conditioned enhancement of human’s liking for flavor by pairing with sweetness. Learning and Motivation 14: 338–350.
36 A Proposed Model of a Flavor Modality
Dana M. Small and Barry G. Green
CONTENTS
36.1 Introduction ............................................................................................................. 717
36.2 Flavor is Taste, Touch, and Smell ............................................................................ 717
36.3 Oral Referral ............................................................................................................ 720
  36.3.1 Olfactory Referral ............................................................................................ 720
  36.3.2 Taste Referral: Localization of Taste by Touch ............................................... 724
  36.3.3 Shared Qualities between Olfaction and Taste ................................................ 725
36.4 The Proposed Model ................................................................................................ 725
  36.4.1 Odor Objects .................................................................................................... 725
    36.4.1.1 Synthesis ..................................................................................................... 725
    36.4.1.2 Experience .................................................................................................. 727
  36.4.2 Flavor Objects .................................................................................................. 727
  36.4.3 Encoding of Flavor Objects ............................................................................. 728
36.5 Neural Mechanisms ................................................................................................. 729
  36.5.1 The Binding Mechanism .................................................................................. 729
  36.5.2 Neural Correlates of Flavor Object .................................................................. 731
36.6 Alternative Models .................................................................................................. 733
36.7 Summary ................................................................................................................. 733
References ....................................................................................................................... 733
36.1 INTRODUCTION
The perception of flavor occurs when a food or drink enters the mouth. Although the resulting perception depends on inputs from multiple sensory modalities, it is experienced as a unitary percept of a food or beverage. In this chapter, the psychophysical characteristics and neural substrates of flavor perception are reviewed within the context of a proposed model of a flavor modality in which the diverse sensory inputs from the mouth and nose become integrated. More specifically, it is argued that a binding mechanism in the somatomotor mouth area of the cortex brings taste, touch, and smell together into a common spatial register and facilitates their perception as a coherent “flavor object.” We propose that the neural representation of the flavor object is a distributed pattern of activity across the insula, overlying operculum (including the somatomotor mouth region), orbitofrontal, piriform, and anterior cingulate cortex.
36.2 FLAVOR IS TASTE, TOUCH, AND SMELL
When we “taste,” we also touch the food or drink in our mouths and sense its odor via retronasal olfaction (Figure 36.1). The term flavor describes this multimodal experience. The gustatory sense (i.e., taste) refers specifically to the sensations of sweet, sour, salty, bitter, savory (Bartoshuk 1991;
FIGURE 36.1 Orthonasal vs. retronasal olfaction. Schematic depiction of two routes of olfactory perception: orthonasal and retronasal. Odors sensed orthonasally enter the body through the nose (nares) and travel directly to olfactory epithelium in nasal cavity. Odors sensed retronasally enter the mouth during eating and drinking. Volatiles are released from food or drink and subsequently pass through the nasopharynx at back of oral cavity to enter nasal cavity and reach olfactory epithelium. (From Kringelbach, M.L., Berridge, K.C., eds., Oxford handbook: Pleasures of the brain, 2009. With permission from Oxford University Press, Inc.)
Chandrashekar et al. 2006), and perhaps fat (Chale-Rush et al. 2007; Gilbertson 1998; Gilbertson et al. 1997). Each of the five major taste qualities serves to signal a specific class of nutrients or potential threats: sweet signals energy in the form of calories, salty signals electrolytes, sour signals low pH, savory (umami) signals proteins, and since most poisonous substances are bitter, bitterness signals potential toxins (Scott and Plata-Salaman 1991). Thus, the sense of taste helps identify physiologically beneficial nutrients and potentially harmful stimuli. Because taste receptors lie side by side in the oral cavity with thermoreceptors, mechanoreceptors, and nociceptors, everything that is tasted induces tactile and thermal sensations, and sometimes also chemesthetic sensations (e.g., burning and stinging; Green 2003; Simon et al. 2008). In addition, some taste stimuli can themselves evoke somatosensory sensations. For example, in moderate to high concentrations, salts and acids can provoke chemesthetic sensations of burning, stinging, or pricking (Green and Gelhard 1989; Green and Lawless 1991). Consequently, even presumably “pure taste” stimuli can have an oral somatosensory component. The taste signal itself is carried from taste receptor cells in the oral cavity by cranial nerves VII, IX, and X to the nucleus of the solitary tract in the brainstem, where taste inputs are joined by oral somatosensory projections from the spinal trigeminal nucleus. The precise locations of the trigeminal projections vary across species, but there is evidence (including in humans) of overlap with gustatory areas (Whitehead 1990; Whitehead and Frank 1983), and of tracts that run within the nucleus of the solitary tract that may facilitate cross-modal integration (Travers 1988; Figure 36.2). Somatosensory input also reaches the nucleus of the solitary tract via the glossopharyngeal nerve, which contains taste-sensitive, as well as mechano- and thermosensitive neurons (Bradley et al. 1992). Overlapping representation of gustatory and somatosensory information also occurs in the
FIGURE 36.2 Oral sensory pathways. A glass brain schematic depiction of taste (black circles), somatosensory (white circles), and olfactory (gray circles) pathways. Anatomical locations are only approximate and connectivity is not exhaustive. Information from taste receptors on tongue is conveyed via the chorda tympani (VII), glossopharyngeal nerve (IX), and vagus nerve (X) to rostral nucleus of the solitary tract (NST), which then projects to thalamus. From here, taste information projects to mid insula (MI) and anterior insula and overlying frontal operculum (AI). AI also projects to ventral insula (VI), medial orbitofrontal cortex (MOFC), and lateral orbitofrontal cortex (LOFC). Somatosensory input reaches NST via glossopharyngeal nerve (IX) and trigeminal nerve (V), which then project to thalamus. Oral somatosensory information is then relayed to opercular region of postcentral gyrus (PO). Olfactory information is conveyed via cranial nerve I to olfactory bulb, which projects to primary olfactory cortex, including piriform cortex (piri). Piriform cortex, in turn, projects to VI and orbitofrontal cortex. Anterior cingulate cortex (ACC) and amygdala (Amyg) are also strongly interconnected with insula and orbital regions representing taste, smell, and oral somatosensation. (From Kringelbach, M.L., Berridge, K.C., eds., Oxford handbook: Pleasures of the brain, 2009. With permission from Oxford University Press, Inc.)
thalamus (Pritchard et al. 1989) and at the cortical level (Cerf-Ducastel et al. 2001; Pritchard et al. 1986). For example, the primary gustatory cortex contains nearly as many somatosensory-specific as taste-specific neurons, in addition to bimodal neurons responding to both somatosensory and taste stimulation (Kadohisa et al. 2004; Plata-Salaman et al. 1992, 1996; Smith-Swintosky et al. 1991; Yamamoto et al. 1985). In sum, taste and oral somatosensation have distinct receptor mechanisms, but their signals converge at virtually every level of the neuroaxis, suggestive of extensive interaction. Although taste and oral somesthesis provide critical information about the physicochemical nature of ingested stimuli, it is the olfactory component of food that is required for flavor identification (Mozell et al. 1969). The acts of chewing and swallowing release volatile molecules into the oral cavity, which during exhalation traverse the epipharynx (also referred to as the nasopharynx) and stimulate receptors on the olfactory epithelium. This process is referred to as retronasal olfaction (Figure 36.1), in contrast to orthonasal olfaction, which occurs during inhalation through the nose. Both orthonasal and retronasal olfactory signals are carried via cranial nerve I to the olfactory bulb, which projects to the anterior olfactory nucleus, the olfactory tubercle, the piriform cortex, several amygdaloid subnuclei, and rostral entorhinal cortex and thalamus. These areas, in turn, project to additional amygdala subnuclei, the entorhinal, insula, and orbitofrontal cortex (OFC) (de Olmos et al. 1978; Price 1973; Turner et al. 1978; Figure 36.2). Thus, olfactory information is carried to the
brain by distinct pathways and does not converge with gustation and oral somatosensation until higher-order cortical regions, such as the insula and the OFC. In summary, the perception of flavor depends on multiple distinct inputs that interact at several levels in the central nervous system. How these interactions act to “bind” the signals into coherent perceptions of flavor is currently unknown. Here, we propose a model in which the somatomotor mouth area orchestrates this binding via a process that results in referral of olfactory sensations to the oral cavity. It is worth noting that flavor percepts can also be influenced by visual inputs (Koza et al. 2005) and by beliefs and expectations (de Araujo et al. 2003), which are factors that represent top-down modulation of flavor. However, these types of cognitive effects are outside the scope of the present chapter.
36.3 ORAL REFERRAL
Consistent with the concept of “proximity” proposed by Gestalt psychologists (Kohler 1929), it is well known that stimuli that appear to originate from a common location are interpreted as having a common source (Stein 1998). In the case of flavor, sensory mechanisms exist that cause all of the perceptual components of flavor (taste, smell, and touch) to appear to arise from the oral cavity (Green 2003; Hollingworth and Poffenberger 1917; Lim and Green 2008; Murphy et al. 1977; Prescott 1999; Small and Prescott 2005; Todrank and Bartoshuk 1991). Here we argue that the function of these referral mechanisms is to bring the sensory components of flavor into a common spatial register that facilitates their binding into a “flavor object.” This process may also be aided by the fact that odors and tastes can share common sensory characteristics (e.g., some odors are perceived as sweet) that blur the qualitative boundary between taste and smell and facilitate integration (Auvray and Spence 2008). Although several authors have proposed the existence of an object-based flavor system (Auvray and Spence 2008; Green 2003; Prescott 1999; Small 2008; Small and Prescott 2005), the neurophysiology of the hypothesized system remains relatively unexplored. The model proposed here holds that oral referral is required for the perception of flavor objects, and neural mechanisms that mediate referral and flavor learning are posited. Because oral referral is central to the model, we begin our discussion with the various forms of referral that have been described.
36.3.1 Olfactory Referral
As noted above, there are two ways to smell: during inhalation through the nose (orthonasal olfaction) and during exhalation through the mouth (retronasal olfaction) (Figure 36.1). Orthonasally sensed odors appear to emanate from objects in the external world, whereas retronasally sensed odors often appear to emanate from the oral cavity (Heilmann and Hummel 2004; Hummel et al. 2006; Murphy et al. 1977; Rozin 1982) and may be confused with taste (Ashkenazi and Marks 2004; Hollingworth and Poffenberger 1917; Murphy et al. 1977; Murphy and Cain 1980). Although scientists have been aware of the misattribution of smell as taste for some time (Titchener 1909), the first systematic study was conducted by Murphy et al. (1977). In that study, Murphy and her colleagues investigated the nature of taste–odor interactions by asking subjects to estimate the intensity of tastes, odors, and their mixtures. The results showed that the perceived intensity of a taste–odor mixture roughly equalled the sum of the perceived intensities of the unmixed components, indicating that tastes and odors were perceptually independent. However, subjects attributed approximately 80% of the mixture’s intensity to the gustatory modality (Murphy et al. 1977). Specifically, taste intensity ratings were higher when the nostrils were open compared to when they were pinched (a stylized version of this finding is represented in Figure 36.3). Since the odor they used, ethyl butyrate, smells sweet, they suggested the effect resulted from a combination of shared sensory properties (sweet) and the misattribution of the retronasal olfactory sweet component to the taste system due to the referral of the odor to the oral cavity. This and subsequent studies also ruled out the possibility that
FIGURE 36.3 Taste–odor confusion. This figure is a stylized representation of data reported in Figure 4 of Murphy and colleagues (1977) (rendered with permission from Dr. Claire Murphy) and represents first experimental demonstration of taste–odor confusion. Graph depicts perceived taste magnitude of mixtures of ethyl butyrate and saccharin sipped when nostrils were open versus taste magnitude of mixtures sipped when nostrils were closed (open symbols). The parameter is concentration of odorant ethyl butyrate. Closed circles represent judgments of stimuli that contained no ethyl butyrate, only saccharin. Dashed line is the line of identity.
referral could be attributed to the activation of taste cells by odors, because the chemicals that produce taste-like smells (e.g., strawberry smells sweet) do not taste sweet when sampled in the mouth with the nares occluded (Murphy and Cain 1980; Sakai et al. 2001; Schifferstein and Verlegh 1996; Stevenson et al. 2000b). Thus, the sweet quality of an odor occurs in the absence of the activation of taste receptor cells, but when sensed retronasally may nevertheless be attributed to taste. Indeed, it has been argued that orthonasal and retronasal olfaction represent two distinct modalities. Inspired by a comment made by a friend that “I really love the taste (of Limburger cheese) if only I can get it by my nose,” Rozin (1982) first proposed that olfaction is a dual-sense modality, with one component (orthonasal olfaction) specialized for sensing objects in the world and the other (retronasal olfaction) specialized for sensing objects in the mouth. Building upon Gibson’s proposal that “tasting” and “smelling” are distinct perceptual systems that cut across receptor classes, Rozin suggested that “the same olfactory stimulation may be perceived and evaluated in two qualitatively different ways, depending on whether it was referred to the mouth or the external world.” In support of this view, he found that subjects frequently reported disliking the smell, but liking the taste, of certain foods (e.g., fish, eggs, and cheese). He also demonstrated that subjects had great difficulty correctly identifying flavor stimuli that had first been learned via the orthonasal route. These data are therefore consistent with the notion that olfactory stimuli arising from the mouth have different sensory–perceptual properties than those originating in the external world. Rozin suggested that these perceptual processes might be achieved by differential gating of inputs triggered by the presence of a palpable object in the mouth, or by the direction of movement of the odor across the olfactory mucosa. Alternatively, he posited that it may be that odor information is not gated but rather is combined with available oral inputs into an emergent percept in which the olfactory component loses its separate identity. After the publication of Rozin’s hypothesis, several investigators argued that the differences between orthonasal and retronasal olfaction were primarily quantitative rather than qualitative. This argument was based on evidence that retronasal stimulation by the same physical stimulus
tends to result in lower perceived intensity than orthonasal stimulation (Pierce and Halpern 1996; Voirol and Daget 1986). Although it is clear that quantitative differences are present, there is also more recent evidence supporting the duality hypothesis (Bender et al. 2009; Heilmann and Hummel 2001; Hummel et al. 2006; Koza et al. 2005; Landis et al. 2005; Small et al. 2005; Sun and Halpern 2005; Welge-Lussen et al. 2009). Of particular note, Hummel and his colleagues devised a method for delivering odorants in the vapor phase via either the ortho- or retronasal routes (Figure 36.4). Critically, the method allows assessment of retronasal olfaction without stimulation of the oral cavity (Heilmann and Hummel 2004). Two tubes are inserted into the subject’s nose under endoscopic guidance so that one tube ends at the external nares (to achieve orthonasal delivery) and the other ends at the epipharynx (to achieve retronasal delivery). The tubes are, in turn, connected to a computer-controlled olfactometer that delivers pulses of odorant embedded in an odorless airstream. Using an electronic nose to measure the stimulus in the airspace below the olfactory epithelium, the authors demonstrated that the maximum concentration and duration of the signal was equivalent after delivery by either route (Hummel et al. 2006). Despite similar signals and the absence of oral stimulation, the olfactory localization illusion was, in part, maintained (Figure 36.5). Subjects were more likely to report that the retronasal odors came from the back of the throat, whereas orthonasal odors appeared to come from the nose. The mechanism(s) behind the olfactory referral illusion remain unknown. However, this study ruled out intensity differences as a cue, because the odors were titrated to equate perceived intensity. The finding also suggests that oral stimulation is not required for at least some referral to occur, since the procedure involved neither a gustatory nor somatosensory stimulus. However, in a subsequent investigation in which subjects were asked to indicate if the odor were delivered orthonasally or retronasally (rather than localize it to the nose or mouth), trigeminal (chemesthetic) stimulation was found to be an important factor for making the discrimination (Frasnelli et al. 2008). More work is therefore needed to determine the degree to which odors can be referred to the mouth based on the direction of flow of the olfactory stimulus.
FIGURE 36.4 (See color insert.) An MRI image showing tubing placement using methods described by Heilmann and Hummel (2004). This sagittal brain section reveals placement of nasal cannulae at external nares to achieve orthonasal delivery, and at nasopharynx to achieve retronasal delivery. Tubes appear white and odor delivery is represented by small white dots. (Reproduced from Small, D.M. et al., Neuron, 47, 593– 605, 2005. With permission from Elsevier.)
FIGURE 36.5 Odorant localization. Preliminary data from 20 subjects showing that orthonasal odor is perceived as coming from front of nasal cavity and retronasal odor as coming from back of nasal/oral cavity (throat). This perception occurred despite constant airflow through both routes at all times and no change in air pressure or flow rate during switching between odor and pure air. Odorants were one pure olfactant [hydrogen sulfide (H2S)] and one olfactory/chemesthetic stimulus with a significant trigeminal component [carbon dioxide (CO2)]. Results represent mean ratings from 20 subjects. Error bars represent standard error of the mean. Positive numbers indicate that subjects perceived odor at back of nasal/oral cavity (throat area), whereas negative numbers indicate subjects perceived odor at front of the nose; the higher the numbers, the more certain were subjects about their decision (range of scale from −50 to 0, and from 0 to 50). Data were obtained in two sessions separated by at least 1 day. Stimuli of 200-ms duration were presented using airdilution olfactometry (birhinal olfactometer OM6b; Burghart Instruments, Wedel, Germany). Thus, stimulation was the same as used in fMRI study (t-test: *p < .05; ***p < .001). (Reproduced from Small, D.M. et al., Neuron, 47, 593–605, 2005. With permission from Elsevier.)
A possible mechanism by which such referral might occur is the direction of odorant flow across the olfactory epithelium. Indeed, since the data supplied from the electronic nose indicated that the physical stimulus arriving at the epithelium can be identical (at least for the measured parameters), the primary difference between the routes in Hummel’s experiments was the direction of odorant flow. Hummel and colleagues therefore suggested there may be a distinct organization of olfactory receptor neurons in the back versus the more anterior portions of the nasal cavity. This hypothesis is consistent with Mozell’s chromatographic model of olfaction, which postulates that the pattern of odorant binding to receptors can lead to different odor perceptions (Mozell 1970). Further support for the chromatographic model comes from a study by Sobel et al. (1999), which showed that subtle differences in airflow patterns between the left and right nostrils can lead to different perceptual experiences. Although neither taste nor oral somatosensation appear to be required for at least some degree of referral to occur (Heilmann and Hummel 2004; Hummel et al. 2006; Small et al. 2005), further study is needed to determine if stimulation of these modalities may nevertheless contribute to referral. In summary, the olfactory localization illusion, coupled with the fact that flavor identity is conveyed primarily by olfaction, leads to the perception that flavors come from the mouth. Despite the fact that this illusion has a profound impact on flavor perception, the mechanisms that produce it remain unknown. Possible mechanisms include spatiotemporal differences in odorant binding across the olfactory epithelium during retro- versus orthonasal stimulation, and/or capture by tactile and/or gustatory stimulation.
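To make the flow-direction reasoning concrete, the following is a minimal toy simulation in the spirit of a chromatographic account: an airstream deposits a fixed fraction of the remaining odorant at each receptor site it passes, so reversing the direction of flow mirrors the spatial pattern of stimulation. It is not a model taken from this chapter or from Mozell's work; the number of sites, the uptake fraction, and the deposition rule are assumptions chosen purely for illustration.

```python
import numpy as np

def deposition_pattern(n_sites=10, uptake=0.3, flow="orthonasal"):
    """Toy chromatography-like deposition: a fixed fraction of the remaining
    odorant is absorbed at each receptor site the airstream passes, so
    reversing the flow direction mirrors the spatial gradient of stimulation.
    All parameters are hypothetical."""
    remaining = 1.0
    absorbed = np.zeros(n_sites)
    # Orthonasal flow is assumed to enter at the anterior end (site 0);
    # retronasal flow at the posterior end (site n_sites - 1).
    order = range(n_sites) if flow == "orthonasal" else range(n_sites - 1, -1, -1)
    for site in order:
        absorbed[site] = remaining * uptake
        remaining -= absorbed[site]
    return absorbed

ortho = deposition_pattern(flow="orthonasal")
retro = deposition_pattern(flow="retronasal")
print(np.round(ortho, 3))   # stimulation strongest at anterior sites
print(np.round(retro, 3))   # same total uptake, mirrored spatial profile
print(round(float(np.corrcoef(ortho, retro)[0, 1]), 2))  # far from 1: distinct patterns
```

Under these assumptions the two routes deliver the same total amount of odorant yet produce mirror-image activation profiles across the epithelium, which is the kind of spatial cue a chromatographic account could exploit to distinguish ortho- from retronasal stimulation.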
36.3.2 Taste Referral: Localization of Taste by Touch
Not only are retronasal odor sensations referred to the mouth and attributed to taste; taste sensations can be referred to the location of tactile stimulation on the tongue (Green 2003; Lim and Green 2008; Todrank and Bartoshuk 1991; Figure 36.6). This illusion was first demonstrated by Todrank and Bartoshuk (1991), who were motivated by the observation that during eating, taste sensations seem to originate throughout the tongue even though the taste buds are located at specific loci (tip, side, and back of the tongue and soft palate). The authors postulated that this effect might depend on a capture-illusion similar to the ventriloquist effect, whereby one sensory modality dominates the other (Tastevin 1937). Specifically, it was hypothesized that taste localization is dominated by touch in a manner akin to the phenomenon of thermal referral (Green 1977), in which touch dominates localization of thermal sensation. To test this prediction, Todrank and Bartoshuk asked subjects to report the intensity of taste as a stimulus was painted onto the tongue along a path that traversed
FIGURE 36.6 Taste localization by touch. Stimulus configuration used to measure referral of taste sensations to site of tactile stimulation. On each trial, experimenter touched three saturated cotton swabs simultaneously to anterior edge of tongue, producing identical tactile stimulation at each site. In veridical condition (top), taste stimulus was delivered only on middle swab, with deionized water on two outer swabs. In referral condition, taste stimulus was delivered only on two outer swabs, with deionized water on middle swab. In both conditions, subjects’ task was to ignore any tastes on outer swabs and to rate intensity of taste perceived at middle swab. Significant taste sensations were reported at middle swab in referral condition for all four taste stimuli tested (sucrose, NaCl, citric acid, and quinine). (From Green, B.G., Food Qual. Prefer., 14, 99–109, 2002. With permission.)
regions of high and low taste bud density. When the path began in a region of low taste bud density, taste sensations started out weak. As the path intersected regions of greater taste bud density, taste sensations became stronger. However, when the path returned to low density regions the sensation remained nearly as intense as it was in the high density region. The authors interpreted this result to mean that the taste sensation was “captured” by the tactile stimulation of the swab and dragged into the insensitive area. More recent work has corroborated this interpretation by finding that tastes can be localized to a spatially adjacent tactile stimulus (Green 2003; Lim and Green 2008; Figure 36.6). Although it is also true that tastes can be localized independently from touch (Delwiche et al. 2000; Lim and Green 2008; Shikata et al. 2000), we believe that referral of taste to touch helps to create a coherent “perceptive field” onto which odors are also referred, thus providing the foundation for a unitary flavor percept.
36.3.3 Shared Qualities between Olfaction and Taste
In addition to oral referral mechanisms, shared qualities between olfaction and taste promote the integration of tastes and smells in flavors. Odors often have taste-like characteristics (Dravnieks 1985; Harper et al. 1968), which may be acquired by experience (Stevenson 2001; Stevenson and Boakes 2004; Stevenson et al. 2000a; Stevenson and Prescott 1995; Stevenson et al. 1999). It has been proposed that the existence of these shared qualities, coupled with olfactory referral, blurs the boundary between taste and smell, which in turn facilitates the sensation of a unitary percept (Auvray and Spence 2008; McBurney 1986). In summary, there are at least three mechanisms that promote the integration of discrete sensory inputs that are stimulated during feeding and drinking into a unitary flavor percept or object: olfactory referral, taste referral to touch, and shared taste–odor qualities.
36.4 THE PROPOSED MODEL
The central tenet of the proposed model is that oral referral mechanisms play a critical role in encoding flavor by helping to fuse multisensory inputs into a perceptual gestalt. This idea builds upon, and has direct parallels with, the coding of “odor objects” as described by Haberly (2001) and by Wilson and Stevenson (2003), and “odor–taste learning” described by Stevenson and Boakes (2004). Therefore, a brief discussion of odor objects follows.
36.4.1 Odor Objects
Wilson and Stevenson (2003) suggest that although the peripheral olfactory system may be organized to emphasize analytical processing (Buck and Axel 1991), the primary function of olfactory cortex is the experience-dependent synthesis of odorant components into unique, identifiable odor objects. Critically, the neural representation of the odor object is distinct from the representation of its sensory components, and it is the encoding of the entire pattern of activity that forms a perceptual gestalt. Wilson and Stevenson base this argument on what they view as two cardinal features of olfactory perception: that it is (1) synthetic and (2) experience-bound.
36.4.1.1 Synthesis
With regard to synthesis, Wilson and Stevenson (2003) propose that odor elements combine to produce novel odor qualities within which the odor elements are no longer discernible, and thus that olfaction is a synthetic modality akin to color vision. Recognizing that these perceptual features of olfaction are at odds with the analytical organization of the peripheral olfactory system, Wilson and Stevenson argued that an experience-dependent synthesis of odor information from the periphery occurs (Haberly 2001) that creates an emergent neural code in the cortex. Specifically,
they proposed that neurons in anterior piriform cortex receive signals about odorant features from the olfactory bulb (analytical elements) and initially function as coincident feature detectors (Figure 36.7). The response properties of the cortical neurons then rapidly shift as stimulation continues, resulting in an experience- and odorant-dependent neural signature within an ensemble of neurons, the “odor object.” In support of this view, recent work from Wilson’s laboratory examined neural and perceptual responses to a set of odorant mixture “morphs”—odor mixtures with one or more components of a 10-component (stock) mixture either removed or replaced (Barnes et al. 2008). Electrophysiological recordings from the rodent brain showed that the neural ensemble activity in the piriform cortex, but not in the olfactory bulb, remained correlated when one of the components was missing, resulting in rats being unable to discriminate the nine-element mixture from the stock mixture. However, when a component was replaced, the piriform ensemble activity decorrelated and discrimination was possible. This suggests that neural ensembles in rodent piriform cortex code odor quality and perform pattern completion to support perceptual stability of odor objects. Similarly, in humans, Gottfried and colleagues used functional magnetic resonance imaging (fMRI) to demonstrate a double dissociation of odor coding in the piriform cortex, with the posterior piriform sensitive to the physicochemical features of odors (i.e., alcohol vs. aldehyde) and not the odor quality (e.g., vegetable vs. fruit), and the anterior piriform sensitive to odor quality and not physicochemical features (Gottfried et al. 2006b). This result indicates that it is the odor object, and not the physical stimulus, that is represented past the initial cortical relay. Since it is likely that conscious perception of odors in humans requires the OFC (Li et al. 2008), it is reasonable to conclude that olfactory perceptions are based on odor objects.
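As an illustration of the ensemble-correlation logic behind these findings, the toy sketch below compares a stored "stock" pattern with a degraded version (one component removed) and a modified version (one component replaced) and applies a crude pattern-completion rule. The feature vectors, the cosine similarity measure, and the completion criterion are all invented for this example; none of it reproduces the recordings or analyses of Barnes et al. (2008).

```python
import numpy as np

# Hypothetical feature vectors: a 10-component "stock" mixture, a morph with
# one component removed, and a morph with one component replaced by a novel
# feature (index 10). Toy values only.
stock = np.array([1.0] * 10 + [0.0])
missing = stock.copy()
missing[0] = 0.0                     # one component removed
replaced = missing.copy()
replaced[10] = 1.0                   # ...and a novel component added instead

def similarity(a, b):
    """Cosine similarity between two ensemble/feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def complete(cue, template, criterion=0.92):
    """Crude 'pattern completion': a cue close enough to the stored template
    is replaced by the full template; otherwise it is passed through unchanged
    (and so remains decorrelated from the template)."""
    return template if similarity(cue, template) >= criterion else cue

for name, cue in [("one removed", missing), ("one replaced", replaced)]:
    out = complete(cue, stock)
    print(f"{name}: input similarity {similarity(cue, stock):.2f}, "
          f"'cortical' similarity {similarity(out, stock):.2f}")
```

Under these assumptions the removed-component morph is "completed" back to the stock representation (and so cannot be distinguished from it), whereas the replaced-component morph is not, mirroring the qualitative pattern described above.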
FIGURE 36.7 (See color insert.) Synthetic processing in anterior piriform cortex. This figure depicts model of olfactory processing proposed by Wilson and Stevenson. Recent olfactory sensory physiology is consistent with a view of olfactory bulb mitral cells serving a largely feature-detection role in odor processing and neurons in anterior piriform cortex (aPCX) serving as synthetic processors, capable of learning unique combinations of feature input associated with specific odors. (a) In response to a novel odor, neurons of piriform cortex function largely as coincidence detectors for coactive feature input from mitral and tufted (M/T) cells [color-coded for type of feature input they receive from olfactory receptor neurons (ORN)]. As coincidence detectors, they might not be efficient at discriminating different odors within their receptive fields. (b) After rapid perceptual learning and plasticity of association and/or afferent synapses, single neurons of piriform cortex respond to odors as a whole, which enables enhanced discrimination between odors within their receptive fields and allows maintained responsiveness to partially degraded inputs. Odorants in this example are isoamyl acetate (AA) and ethyl pentanoate (E7), although the model also applies to mixtures of multiple odorants. (Figure and caption are reproduced from Wilson, D.A., and Stevenson, R.J., Trends Neurosci., 26, 243–247, 2003. With permission from Elsevier and from Don Wilson.)
However, the development of unique neural codes representing odors and odor mixtures does not necessarily mean that odor objects are perceptually synthetic. Although studies of odor identification in mixtures by Laing et al. (Laing and Francis 1989; Livermore and Laing 1996) have been cited as evidence of synthesis (Wilson and Stevenson 2003), those results actually show a degree of analytical processing that led Livermore and Laing (1996) to conclude that “. . . olfaction is neither entirely analytic nor synthetic, but . . . contains elements of both” (p. 275). Thus, even though both “expert” and novice subjects have difficulty identifying more than two or three odors in a mixture (Livermore and Laing 1996), the ability to perceive at least some components rules out a purely synthetic process. We therefore favor the view of Jinks and Laing (2001) that olfactory perception is “configurational” in a manner similar to facial perception in vision (Rakover and Teucher 1997). As those authors described it, configurational processing is based on perceptual fusion rather than perceptual synthesis of odor qualities, which creates a gestalt in which “limited analysis” of mixture components is possible. This view is also consistent with Gottfried’s conclusion that emerging data in olfactory neuroscience support the view “that the brain has simultaneous access to the elemental and configural representations” (Gottfried 2009). As will be shown below, this concept has also been applied to flavor perception.
36.4.1.2 Experience
There are many examples of experience dependence in the olfactory system (Dade et al. 1998; Dalton et al. 2002; Li et al. 2006; Wilson et al. 2006). One particularly elegant example of olfactory perceptual learning comes from Li and colleagues, who presented subjects with odor enantiomer pairs (mirror image molecules) that were initially indistinguishable (Li et al. 2008). Subsequently, they associated one member of the enantiomer pair with a shock. This resulted in perceptual learning in which subjects became able to distinguish the members of the pair and, consistent with Wilson and Stevenson’s theory, this was accompanied by a divergence in neural response to the odor pair in the anterior piriform cortex. A second example of the role of experience in shaping olfactory perception, which is particularly relevant to this chapter, is that when an odor is experienced with a taste, the odor later comes to smell more like the taste with which it was experienced (Stevenson and Prescott 1995). This has been termed the acquisition of taste-like properties by odors, and is described in depth in Chapter 35 by Prescott. It is likely that this form of perceptual learning plays an important role in the formation of flavor objects.
36.4.2 Flavor Objects
As noted above, flavor perception has been described as resulting from a process of sensory fusion (Auvray and Spence 2008; McBurney 1986). One can introspect and identify the olfactory component of a flavor (e.g., strawberry) as well as the taste component of a flavor (sweet and sour); however, since some percepts (sweet) are communal, the boundary between what is odor and what is taste is not always discernible. Thus, consistent with our view of odor objects, we propose that the elements of flavor are discernible yet fused. Unlike olfaction, which may promote configural processes, taste appears to be primarily analytic (Breslin 2000); tastes do not mix to produce novel percepts. Flavor percepts therefore arise from the binding of neural processing in a distributed pattern of distinct elements that maintain their individual quality to varying degrees (e.g., tastes more so than odor objects). In addition, there is evidence that the response selectivity of bimodal (odor- and taste-sensitive) neurons is shaped by the coactivation of unimodal inputs (Rolls 2007). It is therefore proposed that, like the creation of odor objects, the creation of flavor objects depends on a distributed pattern of neural activity that is sculpted by experience. What might this pattern of neural activity look like? It is argued that it is a distributed circuit including the neural representation of the odor object, unimodal taste cells, unimodal oral somatosensory cells, multimodal cells, and a “binding mechanism” (Figure 36.9). We propose that it is the
activation of the binding mechanism that mediates oral referral, and that the binding mechanism is required to fuse flavor components into a flavor object. As such, retronasal olfaction has a privileged role in the formation of flavor objects. That is, unless a flavor has been experienced retronasally, it is not incorporated into a flavor object. A prediction that follows from this line of reasoning is that if Stevenson’s basic, taste–odor learning paradigm is repeated, but the conditioning trials are performed with orthonasal rather than retronasal odor stimulation, then the odors should not acquire taste-like properties. This experiment has yet to be carried out.
36.4.3 Encoding of Flavor Objects
Upon binding of the associated distributed responses, a flavor object is created and must be encoded in memory. Although it is clear that the interaction between tastes and odors is experience-dependent, the nature of the learning is currently unknown. There are several possibilities. First, odor objects, consisting of the activity of unimodal olfactory cells, could—via associative learning—come to acquire the ability to activate taste cells (Rescorla 1981). In this case, the connection between a unimodal taste-responsive neuron and a unimodal smell-responsive neuron that fire together is strengthened, so that the experience of the odor alone is able to cause the taste-responsive neuron to fire. Based on perception, it is clear that this process would have to be asymmetrical, because although some odors have taste-like characteristics, no tastes have odor-like characteristics. Such an organization is unlikely because bimodal taste–odor neurons with congruent response profiles have been identified, and clearly play a role in flavor processing (Rolls and Baylis 1994). A more likely mechanism would therefore be Hebbian learning (Cruikshank and Weinberger 1996; Hebb 1949), by which odors would acquire the ability to selectively activate bimodal neurons that are simultaneously stimulated by taste cells. This type of model has been proposed by Rolls, who argues that unimodal inputs shape bimodal and multimodal cells by experience, and that the perception of flavor is encoded by the bimodal cells (Rolls et al. 1996). However, a fundamental problem with this model is that bimodal taste–odor neurons (with congruent responses to odors and tastes) fire in response to presentation of unimodal odors and unimodal tastes (Rolls and Baylis 1994), yet the perception of flavor only occurs in response to odors. The only mechanism that can reconcile flavor perception with the known physiology is one in which the multimodal inputs from the oral cavity are encoded together as a flavor object via configural learning (Stevenson et al. 2000a, 2000b). This is not to say that associative learning does not occur in the flavor modality, as it clearly does (Yeomans et al. 2006). Rather, the argument is that the initial encoding of the flavor object proceeds via configural learning. In contrast to associative and Hebbian learning, which are based on strengthening of connections between elements, configural learning involves the encoding of the entire pattern of stimulation (Pearce 2002). In other words, when a mixture is sampled by mouth, a unitary flavor is perceived rather than independent tastes and odors, and it is this unitary percept that is encoded in memory. The empirical foundation for the assertion that the encoding of flavor objects requires configural processes comes from evidence that the enhancement of taste-like properties by odors is highly resistant to extinction and counterconditioning (Harris et al. 2004; Stevenson et al. 2000a, 2000b). If odor–taste exposures strengthen the ability to activate a sensory representation of the taste (as would be the case if associative mechanisms were at play), then repeated exposure to the odor without the taste should lead to the extinction of this association (Rescorla 1981; Rescorla and Freeberg 1978), which does not occur. Counterconditioning is the process by which the association between A and B is replaced by a new association between A and C.
For example, in the first phase of the experiment a subject learns that a cue “A” is associated with receipt of food “B”. Once this association is established (e.g., seeing A causes salivation), A is then paired with a new consequence that opposes B (e.g., shock). Some stimuli, such as faces, are resistant to extinction but will display counterconditioning (Baeyens et al. 1989). Stevenson and colleagues reasoned that if the acquisition of taste-like properties by odors is based on configural encoding, counterconditioning should
not be possible (Stevenson et al. 2000a). To test this possibility they subjected tastes and odors, and tastes and colors, to a counterconditioning paradigm. In a single conditioning session, subjects were exposed to taste–odor and taste–color pairs. At least 24 h later, one taste–odor and one taste–color pair underwent counterconditioning (e.g., the odor and the color were paired with new tastes). As predicted, the odor maintained its original taste and did not acquire the new taste. In contrast, an expectancy measure indicated that subjects expected the colored solution to taste like the counterconditioned taste rather than the originally conditioned taste. One caveat is that, to date, all of the odors used in studies of odor acquisition of taste-like qualities have been rated as having perceptible amounts of the target taste quality before the conditioning trials. Accordingly, it may be more accurate to view the effect of taste–odor learning as an enhancement rather than an acquisition of taste-like qualities. If so, it would not be surprising if pairing odors with other tastes failed to eliminate a taste quality that the odor possessed before the original odor–taste pairing. An obvious next question concerns the nature of odor–somatosensory learning. There are some data to suggest that odors may acquire fat sensations after pairing with a fat-containing milk (Sundqvist et al. 2006). However, fat may be sensed via taste channels (Gilbertson 1998; Gilbertson et al. 1997), and therefore may be perceived as qualities of odors via the same mechanism as other taste qualities. Certainly, sniffed odors do not appear to invoke sensations of texture and temperature. It is likely, therefore, that although configural and synthetic processes may occur during taste–odor perceptual learning, oral somatosensory contributions to the unitary flavor percept may not be learned, and undergo sensory fusion rather than synthesis. Notably, whereas a pure strawberry odor may result in the perception of sweetness, a pure sweet solution, or the texture of a berry, never evokes the perception of strawberry. Together with referral, these observations further support the view that olfaction has a privileged role in the flavor modality. Specifically, food identity, and thus perception of flavor objects, depends primarily on the olfactory channel (Mozell et al. 1969). Although many different foods can be characterized as predominantly sweet, predominantly salty, smooth, or crunchy, in nature there is only one food that is predominantly “strawberry” and one food that is predominantly “peach.” Such an arrangement has clear advantages because it enables organisms to learn to identify many different potential food sources and to associate them with the presence of nutrients (e.g., sugars) or toxins. Moreover, the duality of the olfactory modality allows key sensory signals about the sources of nutrients or toxins to be incorporated into the odor percept during eating and drinking (retronasal olfaction), which then enables them to be sensed at a distance (orthonasal olfaction). Indeed, although humans do not normally use their noses to sniff out food sources, the ability to use orthonasal olfaction to locate a food source is preserved (Porter et al. 2007).
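To make the contrast drawn in this section between element-wise (associative/Hebbian) learning and configural encoding concrete, the sketch below implements both in a deliberately minimal form and shows why only the element-wise scheme predicts extinction when the odor is later presented without the taste. The feature vectors, learning rates, and retrieval rule are assumptions made up for illustration, not a model proposed in the chapter.

```python
import numpy as np

odor = np.array([1.0, 0.0])    # hypothetical "odor present" feature
sweet = np.array([0.0, 1.0])   # hypothetical "sweet taste" feature

# 1) Element-wise (associative/Hebbian) scheme: a single weight links the
#    odor element to the taste element and is strengthened by co-activation.
w = 0.0
for _ in range(10):             # paired odor + sweet conditioning trials
    w += 0.1                    # Hebbian-style increment (both elements active)
for _ in range(20):             # extinction: odor presented without the taste
    w -= 0.1 * w                # the learned link decays toward zero
print(f"element-wise sweetness evoked by the odor after extinction: {w:.2f}")

# 2) Configural scheme: the whole odor+sweet pattern is stored once as a unit.
stored_configuration = odor + sweet

def recall_sweetness(cue):
    # Cuing any part of the stored configuration retrieves the unit intact.
    match = float(cue @ stored_configuration) / np.linalg.norm(stored_configuration)
    return stored_configuration[1] if match > 0 else 0.0

# Presenting the odor alone, however often, leaves the stored unit unchanged,
# so the sweet quality persists - the extinction resistance described above.
print(f"configural sweetness evoked by the odor alone: {recall_sweetness(odor):.2f}")
```

Under these toy assumptions, extinction trials erode the element-wise association but leave the stored configuration intact, which is the qualitative pattern the resistance-to-extinction findings are taken to support.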
36.5 NEURAL MECHANISMS
36.5.1 The Binding Mechanism
According to the proposed model, a neural substrate that orchestrates perceptual binding should exist. Since we propose that binding depends on referral, the substrate should be selectively responsive to retronasal odors. Also, activation of the binding mechanism should be independent of experience, but necessary for configurational learning to take place. Although there is no direct evidence for a region that causes or controls such processes, there is evidence that such a mechanism might exist in the somatomotor mouth area of the cortex. This evidence comes from an fMRI study investigating the effect of odorant route (ortho- vs. retro-) on evoked neural responses (Small et al. 2005). In brief, four odors were presented to subjects orthonasally and retronasally according to the procedure devised by Heilmann and Hummel described above, while subjects underwent fMRI scanning. Three of these odors were nonfood odors (lavender, farnesol, and butanol), and one was a food odor (chocolate). When the responses associated with orthonasal delivery were compared to responses associated with retronasal delivery (and vice versa), there was very little differential
neural response if responses were collapsed across odorant type. The only significant finding was that the somatomotor mouth area responded preferentially to retronasal compared to orthonasal odors, regardless of odor identity (Figure 36.8). The response in this region was therefore suggested to reflect olfactory referral to the oral cavity, which was documented to occur during retronasal, but not orthonasal, stimulation. It is not possible to know from this study whether the response in the somatomotor mouth area was the result or the cause of referral. However, there are several factors that point to this region as the likely locus of the binding mechanism. First, the somatomotor mouth region was the only area to show a significant differential response to retronasal compared to orthonasal stimulation. Second, responses there were independent of whether the odor represented a food or a nonfood stimulus. Third, the perception of flavor consistently results in greater responses in this region than does the perception of a tasteless solution (Cerf-Ducastel and Murphy 2001; de Araujo and Rolls 2004; Marciani et al. 2006), indicating that it is active when flavor percepts are experienced. Fourth, since it is argued that stimulus integration and configural encoding are dependent on oral referral, it follows that the binding mechanism should be localized in the cortical representation of the mouth. We also note that the location of the binding mechanism in the somatomotor mouth area is consistent with Auvray and Spence’s suggestion that the formation of the flavor perceptual modality is dependent on a higher-order cortical binding mechanism (Auvray and Spence 2008).

In addition to the initial binding, it is further predicted that neural computations in the somatomotor mouth area play a “permissive” role in enabling the sculpting of multimodal neurons. Specifically, it is proposed that unimodal taste and unimodal smell neurons located in the piriform and anterior dorsal insula sculpt the profiles of bimodal taste/smell neurons located in the ventral anterior insula and the caudal OFC only when there is concurrent activation of the binding substrate (and associated oral referral). This model is consistent with the observations of subthreshold taste–odor summation. Whereas subthreshold summation between orthonasally sensed odor and taste appears, like taste enhancement, to be dependent on perceptual congruency (Dalton et al. 2000), subthreshold summation between retronasally sensed odors and tastes occurs for both congruent and incongruent pairs (Delwiche and Heffelfinger 2005). This suggests that experience is not required for summation of subthreshold taste and retronasal olfactory signals. This observation is consistent with the proposed model because all retronasal odors are predicted to give rise to a response in the somatomotor mouth area. In contrast, orthonasal olfactory experiences do not activate the somatomotor mouth
FIGURE 36.8 Preferential activation of somatomotor mouth area by retronasal compared to orthonasal sensation of odors. Functional magnetic resonance imaging data from a study (Small et al. 2005) using the Heilmann and Hummel (2004) method of odorant presentation to study brain response to orthonasal and retronasal odors. Image represents a sagittal section of brain showing response in somatomotor mouth area to retronasal vs. orthonasal sensation of same odors superimposed upon averaged anatomical scans. (Adapted with permission from Small, D.M. et al., Neuron 47, 593–605, 2005.)
area and are therefore not referred to the mouth. As a result, orthonasal olfactory inputs can only integrate with other oral sensations by reactivating odor objects, which have been previously associated with flavor objects. The role of the somatomotor mouth area in oral referral and in the creation of the flavor modality could be tested in a variety of ways. For example, one could record single-unit responses in the somatomotor mouth area and the OFC in a taste–odor learning paradigm. In humans, one could examine taste–odor learning in patients with specific damage to the somatomotor mouth region or in healthy controls by using transcranial magnetic stimulation to induce temporary “lesions.” The prediction in both cases would be that lesions disrupt oral referral and the enhancement of taste-like properties by odors. Another possibility would be to use a combination of fMRI and network connectivity models such as dynamic causal modeling (Friston et al. 2003; Friston and Price 2001) to test whether response in the somatomotor mouth area to flavors influences responses in regions such as the OFC, and to test whether the magnitude of this influence changes as a function of learning.
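One way to make the connectivity prediction concrete, without committing to the full dynamic causal modeling machinery cited above, is a simplified interaction (PPI-style) regression on extracted region time courses. The sketch below uses simulated data; the region names, the learning regressor, and the effect size are assumptions for illustration only, not results.

```python
# Simplified stand-in for the suggested connectivity analysis: does the
# coupling between the somatomotor mouth area (SMM) and OFC time courses
# increase after taste-odor learning? All signals here are simulated; a real
# analysis would use DCM or a comparable generative model on fMRI data.
import numpy as np

rng = np.random.default_rng(0)
n_scans = 200
learning = np.repeat([0.0, 1.0], n_scans // 2)       # 0 = pre-learning, 1 = post

smm = rng.standard_normal(n_scans)                    # hypothetical SMM time course
ofc = (0.3 + 0.3 * learning) * smm + 0.5 * rng.standard_normal(n_scans)

# Design matrix: intercept, SMM, learning, and the SMM x learning interaction.
X = np.column_stack([np.ones(n_scans), smm, learning, smm * learning])
beta, *_ = np.linalg.lstsq(X, ofc, rcond=None)

print(f"SMM -> OFC coupling before learning: {beta[1]:+.2f}")
print(f"change in coupling after learning:   {beta[3]:+.2f}")
```

A positive interaction term would be consistent with the prediction that the influence of the somatomotor mouth area on the OFC grows as flavor objects are learned, although only a model with explicit directionality (such as DCM) could support a causal interpretation.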
36.5.2 Neural Correlates of Flavor Object

The binding mechanism in the somatomotor mouth area is proposed to bind unimodal and multimodal representations of taste, smell, and oral somatosensation that arise when a stimulus is in the mouth. However, the current paucity of data on flavor processing necessitates a hypothetical rather than an empirical description of the proposed network, which is depicted in Figure 36.9. Certainly, odor object representations in the piriform cortex (Gottfried et al. 2006b; Wilson and Stevenson 2004) and OFC (Gottfried et al. 2006a; Schoenbaum and Eichenbaum 1995a, 1995b) are likely to be key components of the flavor network. In addition, regions with overlapping representation of taste, odor, and oral somatosensation are likely to be critical. In humans, there is evidence from functional neuroimaging studies of overlapping responses to taste, smell, and oral somatosensation in the insula and overlying operculum (Cerf-Ducastel and Murphy 2001; de Araujo and Rolls 2004; Marciani et al. 2006; Poellinger et al. 2001; Savic et al. 2000; Small et al. 1999, 2003; Verhagen et al. 2004; Zald et al. 1998), and in the OFC (Francis et al. 1999; Frank et al. 2003; Gottfried et al. 2002a, 2002b, 2006a; Marciani et al. 2006; O’Doherty et al. 2000; Poellinger et al. 2001; Savic et al. 2000; Small et al. 1997, 1999, 2003; Sobel et al. 1998; Zald et al. 1998; Zald and
FIGURE 36.9 Proposed flavor network. A “glass” brain drawing depicting proposed flavor network as gray circles. G, gustation; S, somatosensation; O, olfaction. Arrows indicate point of entry for sensory signal. Dashed line box with GS represents gustatory (G) and somatosensory (S) relays in thalamus. Hatched region indicates insular cortex. Bolded gray circle with S (somatosensory) indicates somatomotor mouth area. Note that gustatory and somatosensory information are colocalized, except in somatomotor mouth area. Unitary flavor percept is formed only when all nodes (gray circles) receive inputs. No single sensory channel (gustatory, olfactory, or somatosensory) can invoke flavor object in isolation.
Pardo 1997; Zatorre et al. 1992). In accordance with these findings in humans, single-cell recording studies in monkeys have identified both taste- and smell-responsive cells in the insula/operculum (Scott and Plata-Salaman 1999) and OFC (Rolls and Baylis 1994; Rolls et al. 1996). Although not considered traditional chemosensory cortex, the anterior cingulate cortex receives direct projections from the insula and the OFC (Carmichael and Price 1996; Vogt and Pandya 1987), responds to taste and smell (de Araujo and Rolls 2004; de Araujo et al. 2003; Marciani et al. 2006; O’Doherty et al. 2000; Royet et al. 2003; Savic et al. 2000; Small et al. 2001, 2003; Zald et al. 1998; Zald and Pardo 1997), and shows supra-additive responses to congruent taste–odor pairs (Small et al. 2004). Therefore, it is possible that this region contributes to flavor processing. Moreover, a meta-analysis of all independent studies of taste and smell confirmed large clusters of overlapping activation in the insula/operculum, OFC, and anterior cingulate cortex (Verhagen and Engelen 2006).

There is also evidence for supra-additive responses to the perception of congruent but not incongruent taste–odor solutions in the anterodorsal insula/frontal operculum, anteroventral insula/caudal OFC, frontal operculum, and anterior cingulate cortex (McCabe and Rolls 2007; Small et al. 2004). Such supra-additive responses are thought to be a hallmark of multisensory integration (Calvert 2001; Stein 1998). The fact that the supra-additive responses in these regions are experience-dependent strongly supports the possibility that these areas are key nodes of the distributed representation of the flavor object. In support of this possibility, unpublished work suggests that there are differential responses to food versus nonfood odors, and that such responses occur in the insula, operculum, anterior cingulate cortex, and OFC (Small et al., in preparation). Finally, neuroimaging studies with whole brain coverage frequently report responses in similar regions of the cerebellum (Cerf-Ducastel and Murphy 2001; Savic et al. 2002; Small et al. 2003; Sobel et al. 1998; Zatorre et al. 2000) and amygdala (Anderson et al. 2003; Gottfried et al. 2002a, 2002b, 2006b; Small et al. 2003, 2005; Verhagen and Engelen 2006; Winston et al. 2005; Zald et al. 1998; Zald and Pardo 1997) to taste and smell stimulation, although neither region shows supra-additive responses to taste and smell (Small et al. 2004). We have elected not to include these regions in the proposed network, but acknowledge that there is at least some empirical basis for further investigation of their role in flavor processing.

One important but still unresolved question regarding the neurophysiology of flavor perception is whether the process by which an odor object becomes part of a flavor percept results in changes to the odor object (Wilson and Stevenson 2004). Preliminary work suggests that the taste-like properties of food odors are encoded in the same region of insula that encodes sweet taste, and not in the piriform cortex or OFC (Veldhuizen et al. 2010). Subjects underwent fMRI scanning while being exposed to a weak sweet taste (sucrose), a strong sweet taste, two sweet food odors (strawberry and chocolate), and to two sweet nonfood odors (rose and lilac). A region of insular cortex was identified that responded to taste and odor sweetness. This finding is consistent with a recent report that insular lesions disrupt taste and odor-induced taste perception (Stevenson et al. 2008).
Moreover, it was found that the magnitude of the insular response to food odors, but not to nonfood odors, correlated with perceived sweetness. The selectivity of the association between response and sweetness perception strongly suggests that experience with an odor in the mouth as a food or flavor modifies neural activity, and that this occurs in the insula, but not in the piriform cortex. This, in turn, suggests that odor objects represented in the piriform cortex are not modified by flavor learning. In summary, it is proposed that bimodal taste–odor neurons in the OFC and anterior insula are changed during simultaneous perception of taste and retronasally sensed odor, whereas piriform neurons are not. Thus, we hypothesize that the flavor object comprises an unmodified odor object and modified bimodal cells that become associated within a distributed pattern of activation during initial binding.

Another critical question for understanding the neural encoding of flavor objects is whether the entire active network is encoded or only a subset of key elements. For example, is activation of the somatomotor mouth area required to reexperience the flavor percept? If not, what are the key elements? The answers to these questions are currently unknown. However, as discussed above, it is possible that the taste signal is critical (Davidson et al. 1999; Synder et al. 2007).
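As a minimal illustration of the supra-additivity criterion invoked above as a hallmark of integration, the sketch below flags a region as integrative when its response to the taste-plus-odor mixture exceeds the sum of its unimodal responses. The region names and beta values are hypothetical placeholders, not data from the cited studies, and in practice the comparison would require a proper statistical test.

```python
# Supra-additivity check on hypothetical response estimates (illustration only).
def supra_additive(beta_taste: float, beta_odor: float, beta_flavor: float) -> bool:
    """True if the multimodal response exceeds the summed unimodal responses."""
    return beta_flavor > beta_taste + beta_odor

# (taste alone, odor alone, taste + odor) -- made-up values for two regions
regions = {
    "anteroventral insula / caudal OFC": (0.4, 0.3, 1.1),  # supra-additive
    "cerebellum": (0.5, 0.4, 0.8),                         # responds, but additively
}
for name, (taste, odor, flavor) in regions.items():
    print(f"{name}: supra-additive = {supra_additive(taste, odor, flavor)}")
```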
36.6 ALTERNATIVE MODELS

Two other neural models for configural encoding of unitary flavor percepts have been proposed. First, Stevenson and Tomiczek (2007) consider the acquisition of taste-like properties in the context of synesthesia. They propose that a multimodal representation of flavors exists in a distributed network that includes the insula, amygdala, and OFC. This idea is similar to the proposed model. However, instead of emphasizing a binding mechanism related to referral, they conceive of taste–odor learning as an implicit synesthesia, with the odor as the inducer and the taste as the concurrent, or illusory, perception. The model hinges on the fact that odors have two pathways to the orbital cortex: a direct projection from the olfactory bulb and one reliant on a relay through the thalamus. It is argued that the thalamocortical pathway, which receives purely olfactory input, allows the multimodal representation activated by the direct pathway to be assigned as olfactory experience, giving rise to the perception that the odor has a taste. The second model was proposed by Verhagen and Engelen (2006), who, like us, highlight the importance of binding. However, they do not focus on oral referral and suggest a role for the hippocampus, or hippocampus-like mechanism, in binding and for the perirhinal cortex in the conscious perception of flavors (Verhagen 2007). Future research will determine which of these models—if any—is correct.
36.7 SUMMARY

We propose that during tasting, retronasal olfactory, gustatory, and somatosensory stimuli form a perceptual gestalt—the “flavor object”—the elements of which maintain their individual qualities to varying degrees. The development and experience of this percept are dependent on oral referral, for which neural processing in the somatomotor mouth area is deemed critical. An as-yet-unidentified neural mechanism within this region is hypothesized to bind the pattern of responses elicited by flavor stimuli. When the binding mechanism is active, unimodal inputs shape the selectivity of bimodal taste–odor neurons. Flavor objects are then encoded via configural learning as a distributed pattern of response across the somatomotor mouth area, multiple regions of insula and overlying operculum, orbitofrontal cortex, piriform cortex, and anterior cingulate cortex. It is these functionally associated regions that constitute the neural basis of the proposed flavor modality.
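The gating claim in the summary, that unimodal inputs sculpt bimodal taste–odor neurons only while the binding mechanism is active, can be made concrete with a toy update rule. The sketch below is an illustration under assumed values, not an implementation of the authors' model; the alternation of retro- and orthonasal trials and the Hebbian form of the update are both assumptions.

```python
# Toy sketch of binding-gated plasticity: a bimodal taste/odor weight grows
# only on trials where the referral-dependent binding signal is active
# (here, retronasal delivery). Values are illustrative, not fitted.
learning_rate = 0.1
w_bimodal = 0.0  # selectivity of a hypothetical bimodal taste-odor neuron

for trial in range(50):
    taste, odor = 1.0, 1.0               # sweet taste and strawberry odor co-occur
    retronasal = (trial % 2 == 0)        # alternate retro- and orthonasal delivery
    binding = 1.0 if retronasal else 0.0
    w_bimodal += learning_rate * binding * taste * odor  # gated Hebbian update

print(f"bimodal weight after 50 pairings: {w_bimodal:.1f}")
print("with orthonasal-only pairings the weight would remain 0.0")
```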
REFERENCES

Anderson, A. K., K. Christoff, I. Stappen et al. 2003. Dissociated neural representations of intensity and valence in human olfaction. Nat Neurosci 6: 196–202. Ashkenazi, A., and L. E. Marks. 2004. Effect of endogenous attention on detection of weak gustatory and olfactory flavors. Percept Psychophys 66: 596–608. Auvray, M., and C. Spence. 2008. The multisensory perception of flavor. Conscious Cogn 17: 1016–1031. Baeyens, F., P. Eelen, O. Van den Bergh et al. 1989. Acquired affective–evaluative value: Conservative but not unchangeable. Behav Res Ther 27: 279–287. Barnes, D. C., R. D. Hofacer, A. R. Zaman et al. 2008. Olfactory perceptual stability and discrimination. Nat Neurosci 11: 1378–1380. Bartoshuk, L. M. 1991. Taste, smell, and pleasure. In The hedonics of taste and smell, ed. R. C. Bolles, 15–28. Hillsdale, NJ: Lawrence Erlbaum Associates. Bender, G., T. Hummel, S. Negoias et al. 2009. Separate signals for orthonasal vs. retronasal perception of food but not nonfood odors. Behav Neurosci 123: 481–489. Bradley, R. M., R. H. Smoke, T. Akin et al. 1992. Functional regeneration of glossopharyngeal nerve through micromachined sieve electrode arrays. Brain Res 594: 84–90. Breslin, P. A. 2000. Human gustation. In The neurobiology of taste and smell, ed. T. E. Finger and W. L. Silver, 423–461. San Diego, CA: Wiley-Liss, Inc. Buck, L., and R. Axel. 1991. A novel multigene family may encode odorant receptors: A molecular basis for odor recognition. Cell 65: 175–187. Bult, J. H., R. A. de Wijk, and T. Hummel. 2007. Investigations on multimodal sensory integration: texture, taste, and ortho- and retronasal olfactory stimuli in concert. Neurosci Lett 411: 6–10.
Calvert, G. A. 2001. Crossmodal processing in the human brain: Insights from functional neuroimaging studies. Cereb Cortex 11: 1110–1123. Carmichael, S. T., and J. L. Price. 1996. Connectional networks within the orbital and medial prefrontal cortex of Macaque monkeys. J Comp Physiol Psychol 371: 179–207. Cerf-Ducastel, B., and C. Murphy. 2001. fMRI activation in response to odorants orally delivered in aqueous solutions. Chem Senses 26: 625–637. Cerf-Ducastel, B., P. F. Van de Moortele, P. MacLeod et al. 2001. Interaction of gustatory and lingual somato sensory perceptions at the cortical level in the human: A functional magnetic resonance imaging study. Chem Senses 26: 371–383. Chale-Rush, A., J. R. Burgess, and R. D. Mattes. 2007. Evidence for human orosensory (taste?) sensitivity to free fatty acids. Chem Senses 32: 423–431. Chandrashekar, J., M. A. Hoon, N. J. Ryba et al. 2006. The receptors and cells for mammalian taste. Nature 444: 288–294. Cruikshank, S. J., and N. M. Weinberger. 1996. Evidence for the Hebbian hypothesis in experience-dependent physiological plasticity of the neocortex: A critical review. Brain Res Rev 22: 191–228. Dade, L. A., M. Jones-Gotman, R. J. Zatorre et al. 1998. Human brain function during odor encoding and recognition. A PET activation study. Ann NY Acad Sci 855: 572–574. Dalton, P., N. Doolittle, and P. A. Breslin. 2002. Gender-specific induction of enhanced sensitivity to odors. Nat Neurosci 5: 199–200. Dalton, P., N. Doolittle, H. Nagata et al. 2000. The merging of the senses: Integration of subthreshold taste and smell. Nat Neurosci 3: 431–432. Davidson, J. M., R. S. T. Linforth, T. A. Hollowood et al. 1999. Effect of sucrose on the perceived flavor intensity of chewing gum. J Agric Food Chem 47: 4336–4340. de Araujo, E., and E. T. Rolls. 2004. Representation in the human brain of food texture and oral fat. J Neurosci 24: 3086–3093. de Araujo, E., E. T. Rolls, M. L. Kringelbach et al. 2003. Taste–olfactory convergence, and the representation of the pleasantness of flavour in the human brain. Eur J Neurosci 18: 2059–2068. de Olmos, J., H. Hardy, and L. Heimer. 1978. The afferent connections of the main and the accessory olfactory bulb formations in the rat: An experimental HRP-study. J Comp Neurol 181: 213–244. Delwiche, J. F., and A. L. Heffelfinger. 2005. Cross-modal additivity of taste and smell. J Sens Stud 20: 512–525. Delwiche, J. F., M. F. Lera, and P. A. S. Breslin. 2000. Selective removal of a target stimulus localized by taste in humans. Chem Senses 25: 181–187. Dravnieks, A. 1985. Atlas of odor character profiles (ASTM Data series DS61). West Conshohocken, PA: American Society for Testing and Materials. Francis, S., E. T. Rolls, R. Bowtell et al. 1999. The representation of pleasant touch in the brain and its relationship with taste and olfactory areas. Neuroreport 10: 435–459. Frank, G. K., W. H. Kaye, C. S. Carter et al. 2003. The evaluation of brain activity in response to taste stimuli—A pilot study and method for central taste activation as assessed by event-related fMRI. J Neurosci Methods 131: 99–105. Frasnelli, J., M. Ungermann, and T. Hummel. 2008. Ortho- and retronasal presentation of olfactory stimuli modulates odor percepts. Chemosens Percept 1: 9–15. Friston, K., L. Harrison, and W. D. Penny. 2003. Dynamic causal modelling. Neuroimage 19: 1273–1302. Friston, K., and C. J. Price. 2001. Dynamic representations and generative models of brain function. Brain Res Bull 54: 275–285. Gilbertson, T. A. 1998. 
Gustatory mechanisms for the detection of fat. Curr Opin Neurobiol 8: 447–452. Gilbertson, T. A., D. T. Fontenot, L. Liu et al. 1997. Fatty acid modulation of K+ channels in taste receptor cells: Gustatory cues for dietary fat. Am J Physiol 272: C1203–C1210. Gottfried, J. A. 2009. Function follows form: Ecological constraints on odor codes and olfactory percepts. Curr Opin Neurobiol, in press. Gottfried, J. A., R. Deichmann, J. S. Winston et al. 2002a. Functional heterogeneity in human olfactory cortex: An event-related functional magnetic resonance imaging study. J Neurosci 22: 10819–10828. Gottfried, J. A., J. O’Doherty, and R. J. Dolan. 2002b. Appetitive and aversive olfactory learning in humans studied using event-related functional magnetic resonance imaging. J Neurosci 22: 10829–10837. Gottfried, J. A., D. M. Small, and D. H. Zald. 2006a. The chemical senses. In The orbitofrontal cortex, ed. D. H. Zald and S. L. Rauch, 125–171. New York: Oxford Univ. Press. Gottfried, J. A., J. S. Winston, and R. J. Dolan. 2006b. Dissociable codes of odor quality and odorant structure in human piriform cortex. Neuron 49: 467–479.
Green, B. G. 1977. Localization of thermal sensation: An illusion and synthetic heat. Percept Psychophys 22: 331–337. Green, B. G. 2002. Studying taste as a cutaneous sense. Food Qual Prefer 14: 99–109. Green, B. G. 2003. Studying taste as a cutaneous sense. Food Qual Prefer 14: 99–109. Green, B. G., and B. Gelhard. 1989. Salt as an oral irritant. Chem Senses 14: 259–271. Green, B. G., and H. T. Lawless. 1991. The psychophysics of somatosensory chemoreception in the nose and mouth. In Smell and taste in health and disease, ed. T.V. Getchell, R. L. Doty, L. M. Bartoshuk, and J. B. Snow, 235–253. New York: Raven Press. Haberly, L. B. 2001. Parallel-distributed processing in olfactory cortex: New insights from morphological and physiological analysis of neuronal circuitry. Chem Senses 26: 551–576. Harper, R., D. G. Land, N. M. Griffiths et al. 1968. Odor qualities: A glossary of usage. Br J Psychol 59: 231–252. Harris, J. A., F. L. Shand, L. Q. Carroll et al. 2004. Persistence of preference for a flavor presented in simulta neous compound with sucrose. J Exp Psychol Anim Behav Processes 30: 177–189. Hebb, D. O. 1949. The organization of behavior. New York: Wiley. Heilmann, S., and T. Hummel. 2004. A new method for comparing orthonasal and retronasal olfaction. Behav Neurosci 118: 412–419. Hollingworth, H. L., and A. T. Poffenberger. 1917. The sense of taste. New York: Moffat, Yard. Hummel, T., S. Heilmann, B. N. Landis et al. 2006. Perceptual differences between chemical stimuli presented through the ortho- or retronasal route. Flavor Fragrance J 21: 42–47. Jinks, A., and D. G. Laing. 2001. The analysis of odor mixtures by humans: Evidence for a configurational process. Physiol Behav 72: 51–63. Kadohisa, M., E. T. Rolls, and J. V. Verhagen. 2004. Orbitofrontal cortex: Neuronal representation of oral temperature and capsaicin in addition to taste and texture. Neuroscience 127: 207–221. Kohler, W. 1929. Gestalt psychology. New York: Horace Liveright. Koza, B. J., A. Cilmi, M. Dolese et al. 2005. Color enhances orthonasal olfactory intensity and reduces retronasal olfactory intensity. Chem Senses 30: 643–649. Kringelbach, M. L., and K. C. Berridge. 2009. Oxford handbook: Pleasures of the brain. Oxford: Oxford Univ. Press. Laing, D. G., and G. W. Francis. 1989. The capacity of humans to identify odors in mixtures. Physiol Behav 46: 809–814. Landis, B. N., J. Frasnelli, J. Reden et al. 2005. Differences between orthonasal and retronasal olfactory functions in patients with loss of the sense of smell. Arch Otolaryngol Head Neck Surg 131: 977–981. Li, W., J. D. Howard, T. B. Parrish et al. 2008. Aversive learning enhances perceptual and cortical discrimination of indiscriminable odor cues. Science 319: 1842–1845. Li, W., E. Luxenberg, T. Parrish et al. 2006. Learning to smell the roses: Experience-dependent neural plasticity in human piriform and orbitofrontal cortices. Neuron 52: 1097–1108. Lim, J., and B. G. Green. 2008. Tactile interaction with taste localization: Influence of gustatory quality and intensity. Chem Senses 33: 137–143. Livermore, A., and D. G. Laing. 1996. Influence of training and experience on the perception of multicomponent odor mixtures. J Exp Psychol Hum Percept Perform 46: 809–814. Marciani, L., J. C. Pfeiffer, J. Hort et al. 2006. Improved methods for fMRI studies of combined taste and aroma stimuli. J Neurosci Methods 158: 186–194. McBurney, D. H. 1986. Taste, smell and flavor terminology: Taking the confusion out of confusion. 
In Clinical measurement of taste and smell, ed. H. L. Meiselman and R. S. Rivkin, 117–124. New York: Macmillan. McCabe, C., and E. T. Rolls. 2007. Umami: A delicious flavor formed by convergence of taste and olfactory pathways in the human brain. Eur J Neurosci 25: 1855–1864. Mozell, M. M. 1970. Evidence for a chromatographic model of olfaction. J Gen Physiol 56: 46–63. Mozell, M. M., B. P. Smith, P. E. Smith et al. 1969. Nasal chemoreception in flavor identification. Arch Otolaryngol 90: 367–373. Murphy, C., W. S. Cain, and L. M. Bartoshuk. 1977. Mutual action of taste and olfaction. Sens Processes 1: 204–211. Murphy, C. A., and W. S. Cain. 1980. Taste and olfaction: Independence vs interaction. Physiol Behav 24: 601–605. O’Doherty, J., E. T. Rolls, S. Francis et al. 2000. Sensory-specific satiety-related olfactory activation of the human orbitofrontal cortex. Neuroreport 11: 399–403. Pearce, J. M. 2002. Evaluation and development of a connectionist theory of configural learning. Anim Learn Behav 30: 73–95.
Pierce, J., and B. P. Halpern. 1996. Orthonasal and retronasal odorant identification based upon vapor phase input from common substances. Chem Senses 21: 529–543. Plata-Salaman, C. R., T. R. Scott, and V. L. Smith-Swintosky. 1992. Gustatory neural coding in the monkey cortex: l-Amino acids. J Neurophysiol 67: 1552–1561. Plata-Salaman, C. R., V. L. Smith-Swintosky, and T. R. Scott. 1996. Gustatory neural coding in the monkey cortex: Mixtures. J Neurophysiol 75: 2369–2379. Poellinger, A., R. Thomas, P. Lio et al. 2001. Activation and habituation in olfaction—An fMRI study. Neuroimage 13: 547–560. Porter, J., B. Craven, R. M. Khan et al. 2007. Mechanisms of scent-tracking in humans. Nat Neurosci 10: 27–29. Prescott, J. 1999. Flavour as a psychological construct: Implications for perceiving and measuring the sensory qualities of foods. Food Qual Prefer 10: 349–356. Price, J. L. 1973. An autoradiographic study of complementary laminar patterns of termination of afferent fibers to the olfactory cortex. J Comp Neurol 150: 87–108. Pritchard, T. C., R. B. Hamilton, J. R. Morse et al. 1986. Projections of thalamic gustatory and lingual areas in the monkey, Macaca fascicularis. J Comp Neurol 244: 213–228. Pritchard, T. C., R. B. Hamilton, and R. Norgren. 1989. Neural coding of gustatory information in the thalamus of Macaca mulatta. J Neurophysiol 61: 1–14. Rakover, S. S., and B. Teucher. 1997. Facial inversion effects: Parts and whole relationship. Percept Psychophys 59: 752–761. Rescorla, R. A. 1981 Simultaneous associations. In Predictability, Correlation, and Contiguity, ed. P. Harzen and M. D. Zeilner, 47–80. Chichester: Wiley. Rescorla, R. A., and L. Freeberg. 1978. The extinction of within-compound flavor associations. Learn Motiv 9: 411–427. Rolls, E. T. 2007. Sensory processing in the brain related to the control of food intake. Proc Nutr Soc 66: 96–112. Rolls, E. T., and L. L. Baylis. 1994. Gustatory, olfactory, and visual convergence within the primate orbitofrontal cortex. J Neurosci 14: 5437–5452. Rolls, E. T., H. D. Critchley, and A. Treves. 1996. Representation of olfactory information in the primate orbitofrontal cortex. J Neurophysiol 75: 1982–1996. Royet, J. P., J. Plailly, C. Delon-Martin et al. 2003. fMRI of emotional responses to odors: Influence of hedonic valence and judgment, handedness, and gender. Neuroimage 20: 713–728. Rozin, P. 1982. “Taste–smell confusions” and the duality of the olfactory sense. Percept Psychophys 31: 397–401. Sakai, N., T. Kobayakawa, N. Gotow et al. 2001. Enhancement of sweetness ratings of aspartame by a vanilla odor presented either by orthonasal or retronasal routes. Percept Mot Skills 92: 1002–1008. Savic, I., B. Gulyas, and H. Berglund. 2002. Odorant differentiated pattern of cerebral activation: comparison of acetone and vanillin. Hum Brain Mapp 17: 17–27. Savic, I., B. Gulyas, M. Larsson et al. 2000. Olfactory functions are mediated by parallel and hierarchical processing. Neuron 26: 735–745. Schifferstein, H. N. J., and P. W. J. Verlegh. 1996. The role of congruency and pleasantness in odor-induced taste enhancement. Acta Psychol 94: 87–105. Schoenbaum, G., and H. Eichenbaum. 1995a. Information coding in the rodent prefrontal cortex: I. Singleneuron activity in orbitofrontal cortex compared with that in piriform cortex. J Neurophysiol 74: 733–750. Schoenbaum, G., and H. Eichenbaum. 1995b. Information coding in the rodent prefrontal cortex: II. Ensemble activity in orbitofrontal cortex. J Neurophysiol 74: 751–762. Scott, T. R., and C. R. 
Plata-Salaman. 1991 Coding of Taste Quality. In Smell and taste in health and disease, ed. T. V. Getchel. New York: Raven Press. Scott, T. R., and C. R. Plata-Salaman. 1999. Taste in the monkey cortex. Physiol Behav 67: 489–511. Shikata, H., D. B. McMahon, and P. A. Breslin. 2000. Psychophysics of taste lateralization on anterior tongue. Percept Psychophys 62: 684–694. Simon, S. A., I. de Araujo, J. R. Stapleton et al. 2008. Multisensory processing of gustatory stimuli. Chemosens Percept, in press. Small, D. M. 2008. Flavor and the formation of category-specific processing in olfaction. Chemosens Percept 1: 136–146. Small, D. M., J. Gerber, Y. E. Mak et al. 2005. Differential neural responses evoked by orthonasal versus retronasal odorant perception in humans. Neuron 47: 593–605.
Small, D. M., M. D. Gregory, Y. E. Mak et al. 2003. Dissociation of neural representation of intensity and affective valuation in human gustation. Neuron 39: 701–711. Small, D. M., M. Jones-Gotman, R. J. Zatorre et al. 1997. Flavor processing: More than the sum of its parts. Neuroreport 8: 3913–3917. Small, D. M., and J. Prescott. 2005. Odor/taste integration and the perception of flavor. Exp Brain Res 166: 345–357. Small, D. M., J. Voss, Y. E. Mak et al. 2004. Experience-dependent neural integration of taste and smell in the human brain. J Neurophysiol 92: 1892–1903. Small, D. M., D. H. Zald, M. Jones-Gotman et al. 1999. Human cortical gustatory areas: A review of functional neuroimaging data. Neuroreport 10: 7–14. Small, D. M., R. J. Zatorre, A. Dagher et al. 2001. Changes in brain activity related to eating chocolate: From pleasure to aversion. Brain 124: 1720–1733. Smith-Swintosky, V. L., C. R. Plata-Salaman, and T. R. Scott. 1991. Gustatory neural coding in the monkey cortex: Stimulus quality. J Neurophysiol 66: 1156–1165. Sobel, N., R. M. Khan, A. Saltman et al. 1999. Olfaction: The world smells different to each nostril. Nature 402: 35. Sobel, N., V. Prabhakaran, J. E. Desmond et al. 1998. Sniffing and smelling: Separate subsystems in the human olfactory cortex. Nature 392: 282–286. Sobel, N., V. Prabhakaran, C. A. Hartley et al. 1998. Odorant-induced and sniff-induced activation in the cerebellum of the human. J Neurosci 18: 8990–9001. Stein, B. E. 1998. Neural mechanisms for synthesizing sensory information and producing adaptive behaviors. Exp Brain Res 123: 124–135. Stevenson, R. J. 2001. Associative learning and odor quality perception: How sniffing an odor mixture can alter the smell of its parts. Learn Motiv 32: 154–177. Stevenson, R. J., and R. A. Boakes. 2004. Sweet and sour smells: Learned synesthesia between the senses of taste and smell. In The handbook of multisensory processes, ed. G. A. Calvert, C. Spence, and B. E. Stein, 69–83. Boston: MIT Press. Stevenson, R. J., R. A. Boakes, and J. P. Wilson. 2000a. Counter-conditioning following human odor-taste and color-taste learning. Learn Motiv 31: 114–127. Stevenson, R. J., R. A. Boakes, and J. P. Wilson. 2000b. Resistance to extinction of conditioned odor perceptions: Evaluative conditioning is not unique. J Exp Psychol Learn Mem Cogn 26: 423–440. Stevenson, R. J., L. A. Miller, and Z. C. Thayer. 2008. Impairments in the perception of odor-induced tastes and their relationship to impairments in taste perception. J Exp Psychol Hum Percept Perform 34: 1183–1197. Stevenson, R. J., and J. Prescott. 1995. The acquisition of taste properties by odors. Learn Motiv 26: 433–455. Stevenson, R. J., J. Prescott, and R. A. Boakes. 1999. Confusing tastes and smells: How odours can influence the perception of sweet and sour tastes. Chem Senses 24: 627–635. Stevenson, R. J., and C. Tomiczek. 2007. Olfactory-induced synesthesias: A review and model. Psychol Bull 133: 294–309. Sun, B. C., and B. P. Halpern. 2005. Identification of air phase retronasal and orthonasal odorant pairs. Chem Senses 30: 693–706. Sundqvist, N. C., R. J. Stevenson, and I. R. J. Bishop. 2006. Can odours acquire fat-like properties? Appetite 47: 91–99. Synder, D. J., C. J. Clark, F. A. Catalanotto et al. 2007.
Oral anesthesia specifically impairs retronasal olfaction. Chem Senses 32: A15. Tastevin, J. 1937. En partant de l’experience d’Aristote. Encephale 1: 57–84, 140–158. Titchener, E. B. 1909. A textbook of psychology. New York: Macmillan. Todrank, J., and L. M. Bartoshuk. 1991. A taste illusion: Taste sensation localized by touch. Physiol Behav 50: 1027–1031. Travers, J. B. 1988. Efferent projections from the anterior nucleus of the solitary tract of the hamster. Brain Res 457: 1–11. Turner, B. H., K. C. Gupta, and M. Mishkin. 1978. The locus and cytoarchitecture of the projection areas of the olfactory bulb in Macaca mulatta. J Comp Neurol 177: 381–396. Veldhuizen, M. G., D. Nachtigal, L. Teulings et al. 2010. The insular taste cortex contributes to odor quality coding. Front Hum Neurosci 4: 58. Verhagen, J. V. 2007. The neurocognitive bases of human multimodal food perception: Consciousness. Brain Res Rev 53: 271–286. Verhagen, J. V., and L. Engelen. 2006. The neurocognitive bases of human multimodal food perception: Sensory integration. Neurosci Biobehav Rev 30: 613–650.
Verhagen, J. V., M. Kadohisa, and E. T. Rolls. 2004. Primate insular/opercular taste cortex: Neuronal representations of the viscosity, fat texture, grittiness, temperature, and taste of foods. J Neurophysiol 92: 1685–1699. Vogt, B. A., and D. Pandya. 1987. Cingulate cortex of the rhesus monkey: II. Cortical afferents. J Comp Neurol 262: 271–289. Voirol, E., and N. Dagnet. 1986. Comparative study of nasal and retronasal olfactory perception. Food Sci Technol 19: 316–319. Welge-Lussen, A., J. Drago, M. Wolfensberger et al. 2005. Gustatory stimulation influences the processing of intranasal stimuli. Brain Res 1038: 69–75. Welge-Lussen, A., A. Husner, M. Wolfensberger et al. 2009. Influence of simultaneous gustatory stimuli on orthonasal and retronasal olfaction. Neurosci Lett 454: 124–128. Whitehead, M. C. 1990. Subdivisions and neuron types of the nucleus of the solitary tract that project to the parabrachial nucleus in the hamster. J Comp Neurol 301: 554–574. Whitehead, M. C., and M. E. Frank. 1983. Anatomy of the gustatory system in the hamster: Central projections of the chorda tympani and the lingual nerve. J Comp Neurol 220: 378–395. Wilson, D. A., M. Kadohisa, and M. L. Fletcher. 2006. Cortical contributions to olfaction: Plasticity and perception. Semin Cell Dev Biol 17: 462–470. Wilson, D. A., and R. J. Stevenson. 2003. The fundamental role of memory in olfactory perception. Trends Neurosci 26: 243–247. Wilson, D. A., and R. J. Stevenson. 2004. The fundamental role of memory in olfactory perception. Trends Neurosci 25: 243–247. Winston, J. S., J. A. Gottfried, J. M. Kilner et al. 2005. Integrated neural representations of odor intensity and affective valence in human amygdala. J Neurosci 25: 8903–8907. Yamamoto, T., N. Yuyama, T. Kato et al. 1985. Gustatory responses of cortical neurons in rats: II. Information processing of taste quality. J Neurophysiol 53: 1370–1386. Yeomans, M. R., S. Mobini, T. D. Elliman et al. 2006. Hedonic and sensory characteristics of odors conditioned by pairing with tastants in humans. J Exp Psychol Anim Behav Processes 32: 215–228. Zald, D. H., J. T. Lee, K. W. Fluegel et al. 1998. Aversive gustatory stimulation activates limbic circuits in humans. Brain 121: 1143–1154. Zald, D. H., and J. V. Pardo. 1997. Emotion, olfaction, and the human amygdala: amygdala activation during aversive olfactory stimulation. Proc Natl Acad Sci U S A 94: 4119–4124. Zatorre, R. J., M. Jones-Gotman, A. C. Evans et al. 1992. Functional localization and lateralization of human olfactory cortex. Nature 360: 339–340. Zatorre, R. J., M. Jones-Gotman, and C. Rouby. 2000. Neural mechanisms involved in odor pleasantness and intensity judgments. Neuroreport 11: 2711–2716.
37
Assessing the Role of Visual and Auditory Cues in Multisensory Perception of Flavor

Massimiliano Zampini and Charles Spence
CONTENTS
37.1 Introduction .................................................................................................... 739
37.2 Multisensory Interactions between Visual and Flavor Perception ................. 740
     37.2.1 Role of Color Cues on Multisensory Flavor Perception ...................... 740
     37.2.2 Color–Flavor Interactions: Possible Role of Taster Status .................. 743
     37.2.3 Color–Flavor Interactions: Possible Role of Learned Associations between Colors and Flavors ... 745
     37.2.4 Color–Flavor Interactions: Neural Correlates ..................................... 747
     37.2.5 Interim Summary ................................................................................. 748
37.3 Role of Auditory Cues in the Multisensory Experience of Foodstuffs ........... 749
     37.3.1 Effect of Sound Manipulation on the Perception of Crisps ................. 749
     37.3.2 Effect of Auditory Cues on the Perception of Sparkling Water ........... 751
37.4 Conclusions ..................................................................................................... 752
References ............................................................................................................... 753
37.1 INTRODUCTION

Our perception of the objects and events that fill the world in which we live depends on the integration of the sensory inputs that simultaneously reach our various sensory systems (e.g., vision, audition, touch, taste, and smell). Perhaps the best-known examples of genuinely multisensory experiences come from our perception and evaluation of food and drink. The average person would say that the flavor of food derives primarily from its taste in the mouth, and is often surprised to discover that there is a strong “nasal” role in the perception of flavor. In fact, it has been argued that the majority of the flavor of food actually comes from its smell (e.g., Cain 1977; Murphy and Cain 1980; Rozin 1982).*

* For example, coffee and tea are indistinguishable (with both having a bitter taste) if drunk while holding one’s nose pinched shut. Whereas the taste of a lemon only actually consists of sour, sweet, and bitter components, most of the flavor we normally associate with the taste of a lemon actually comes from the terpene aroma, one of the constituent chemicals that stimulate the olfactory mucosa via the nasopharynx (i.e., retronasal olfaction). Odor molecules may reach the receptors in the olfactory epithelium (i.e., the area located in the rear of the nasal cavity) traveling inward from the anterior nares or through the posterior nares of the nasopharynx. Most typically, orthonasal olfaction occurs during respiratory inhalation or sniffing, whereas retronasal olfaction occurs during respiratory exhalation or after swallowing. People usually report experiencing odors as originating from the external world when perceived orthonasally, and as coming from the mouth when perceived retronasally (Rozin 1982). Importantly, the latest cognitive neuroscience evidence has highlighted the fact that somewhat different neural structures are used to process these two kinds of olfactory information (Small et al. 2005, 2008; see also Koza et al. 2005).

Our perception of food and drink, however, is not simply a matter of combining gustatory
and olfactory food cues (although this is undoubtedly very important; Dalton et al. 2000). For instance, our evaluation of the pleasantness of a particular foodstuff can be influenced not only by what it looks, smells, and tastes like, but also by what it sounds like in the mouth (think, for example, of the auditory sensations associated with biting into a potato chip or a stick of celery; see Spence and Zampini 2006, for a review). The feel of a foodstuff (i.e., its oral–somatosensory attributes) is also very important; the texture, temperature, viscosity, and even the painful sensations we experience when eating hot foods (e.g., chilli peppers) all contribute to our overall multisensory experience of foodstuffs (e.g., Bourne 1982; Lawless et al. 1985; Tyle 1993). Flavor perception is also influenced by the interactions taking place between oral texture and both olfactory and gustatory cues (see also Bult et al. 2007; Christensen 1980a, 1980b; Hollowood et al. 2002). Given the multisensory nature of our perception of food, it should come as little surprise that many studies have been conducted in order to try to understand the relative contribution of each sense to our overall evaluation of food (e.g., see Delwiche 2004; Spence 2002; Stevenson 2009; Stillman 2002). In this chapter, we review the contribution of visual and auditory cues to the multisensory perception of food. Moreover, any influence of the visual and auditory aspects of foods and drinks is likely to take place at different stages of the food experience: visual cues are mainly perceived before foodstuffs enter the mouth, whereas auditory cues are primarily perceived while we are actually consuming food.
37.2 MULTISENSORY INTERACTIONS BETWEEN VISUAL AND FLAVOR PERCEPTION

37.2.1 Role of Color Cues on Multisensory Flavor Perception

Over the past 80 years or so, many researchers have been interested in the role of visual information in the perception of foodstuffs (Moir 1936). It seems that the visual appearance of food and drink can have a profound impact on our perception and evaluation of flavor. The role of color cues on people’s flavor perception has been investigated in many different studies, although the majority of the research has been published in food science journals rather than psychology or neuroscience (for reviews, see Clydesdale 1993; Delwiche 2004; Spence et al. 2010; Stevenson 2009). The majority of these studies have shown that people’s perception of a variety of different foods and drinks can be dramatically modified by changing the color of food or drink items (e.g., DuBose et al. 1980; Duncker 1939; Garber et al. 2000; Johnson and Clydesdale 1982; Morrot et al. 2001; Philipsen et al. 1995; Roth et al. 1988; Stillman 1993; Wheatley 1973; Zampini et al. 2007; Zellner and Durlach 2003). One of the most dramatic early empirical demonstrations of the strong link between color and the pleasure we derive from food (and/or our appetitive responses to food) was reported by Wheatley (1973). He described a situation in which a group of people ate a meal of steak, French fries, and peas under color-masking lighting conditions. Halfway through the meal, normal lighting was restored, revealing that the steak had been colored blue, the French fries green, and the peas red. According to Wheatley’s description, the mere sight of the food was sufficient to induce nausea in many of his dinner guests. Such results, although anecdotal, do at least hint at the powerful influence that visual cues can have over our appetitive responses.

Color has also been shown to exert a significant effect on our ability to recognize specific foodstuffs. For example, in one oft-cited study, DuBose et al. (1980) presented participants with drinks incorporating a variety of different color–flavor combinations (the flavored solutions were colored either appropriately or inappropriately, or else were presented as colorless solutions). DuBose et al. found that participants’ identification of the flavors of many of the drinks was significantly influenced by their color. In particular, the participants were less accurate in identifying the flavor of fruit-flavored beverages when they were unaware of the appropriate color. For instance, 40% of the participants reported that a cherry-flavored beverage actually tasted of orange when it had been inappropriately colored orange (compared to 0% orange-flavor responses when the drink was appropriately colored red; a similar effect was reported for the lime-flavored beverage). Many other
researchers have reported a similar visual modulation of participants’ odor discrimination/identification responses (e.g., Blackwell 1995; Davis 1981; Koza et al. 2005; Morrot et al. 2001; Stevenson and Oaten 2008; Zellner et al. 1991; Zellner and Kautz 1990; Zellner and Whitten 1999).

Although the potential influence of color cues on people’s flavor identification responses is by now well documented, the evidence regarding the impact of changes in color intensity on perceived flavor intensity is rather less clear. For example, ambiguous results have been reported in studies in which the participants had to rate the intensity of the flavor of solutions that varied in the intensity of the color that had been added to them (e.g., DuBose et al. 1980; Johnson and Clydesdale 1982; Johnson et al. 1983; see Clydesdale 1993, for a review). For example, DuBose et al. found that overall flavor intensity was affected by color intensity, with more intense coloring resulting in stronger flavor evaluation responses by participants for the orange-flavored, but not for the cherry-flavored, beverages tested in their study. However, in other studies, the concentration of coloring in the solutions did not influence participants’ ratings, regardless of whether the solutions were appropriately or inappropriately colored (e.g., Alley and Alley 1998; Frank et al. 1989; Zampini et al. 2007).

Researchers have also investigated the effect of varying the intensity of the color on the perceived intensity of tastes and odors separately. For instance, the addition of a red coloring to cherry- and strawberry-flavored sucrose solutions has been found to increase the perceived sweetness of these solutions in certain studies (Johnson and Clydesdale 1982; Johnson et al. 1983). Maga (1974) hypothesized that the influence of colors on sweetness perception in humans might be particularly strong for colors that are typically associated with the natural ripening of fruits (e.g., yellow, red; see also Lavin and Lawless 1998; Strugnell 1997). By contrast, researchers have reported that the addition of color has no effect on the perceived saltiness of foods such as soups (Gifford and Clydesdale 1986; Gifford et al. 1987; Maga 1974), perhaps because (in contrast to sweet foods) there are no particular colors associated with the salt content of a food (i.e., salt is ubiquitous to many different kinds, and hence colors, of food; see Maga 1974 and Lavin and Lawless 1998, on this point). In one of the earliest studies to have been published in this area, Pangborn (1960) found that people rated green-colored pear nectar as being less sweet than colorless pear nectar. However, Pangborn and Hansen (1963) failed to replicate these results. Although they found that green coloring had no effect on the perceived sweetness of pear nectar, its addition did give rise to an overall increase in sensitivity to sweetness. Similarly, for the pairing of color with odor, Zellner and Kautz (1990) reported that solutions were rated as having a more intense odor when color had been added to the solutions than when it was absent, regardless of the appropriateness of the color–odor match. In fact, Zellner and Kautz noted that the participants in their study simply refused to believe that colored and uncolored solutions of equal odor intensity were actually equally strong.
The explanation for these contradictory results regarding the influence of variations in color intensity on the perception of taste, odor, and flavor (i.e., odor + taste) intensity is far from obvious (see Shankar et al. 2010). For example, Chan and Kane-Martinelli (1997) reported that the perceived flavor intensity for certain foods (such as chicken bouillon) was higher with the commercially available color sample than when the samples were given in a higher-intensity color (see also Clydesdale 1993, on this point). Note also that if the discrepancy between the intensity of the color and the intensity of the flavor is too great, participants may experience a disconfirmation of expectation (or some form of dissonance between the visually and gustatorily determined flavor intensities) and the color and taste cues may no longer be linked (e.g., Clydesdale 1993; cf. Ernst and Banks 2002; Yeomans et al. 2008). Another potentially important issue in terms of assessing interactions between color and flavor is the role of people’s awareness of the congruency of the color–flavor pairings used (Zampini et al. 2007). In fact, in most of the research that has been published to date on the effects of color cues on human flavor perception, the participants were not explicitly informed that the flavors of the solutions they were evaluating might not be paired with the appropriately colored solutions (e.g., see
DuBose et al. 1980; Johnson and Clydesdale 1982; Morrot et al. 2001; Oram et al. 1995; Philipsen et al. 1995; Roth et al. 1988; Stillman 1993; Zellner and Durlach 2003). One might therefore argue that the visual modulation of flavor perception reported in many of these previous studies simply reflects a decisional bias introduced by the obvious variation in the color cues (cf. the literature on the effectiveness of the color of medications on the placebo effect; e.g., de Craen et al. 1996; see also Engen 1972), rather than a genuine perceptual effect (i.e., whereby the color cues actually modulate the perception of flavor itself; although see also Garber et al. 2001, 2008, for an alternative perspective from the field of marketing). For example, if participants found it difficult to correctly identify the flavor of the food or drink on the basis of gustatory and olfactory cues in flavor discrimination tasks, then they may simply have based their responses on the more easily discriminable color cues instead. Therefore, it might be argued that participants’ judgments in these previous studies may simply have been influenced by decisional processes.

In their study, Zampini et al. (2007) tried to reduce any possible influence of response biases that might emerge when studying color–flavor interactions by explicitly informing their participants that the color–flavor link would often be misleading (i.e., that the solutions would frequently be presented in an inappropriate color; cf. Bertelson and Aschersleben 1998). This experimental manipulation was introduced in order to investigate whether the visual cues would still influence human flavor perception when the participants were aware of the lack of any meaningful correspondence between the color and the flavor of the solutions that they were tasting. The participants in Zampini et al.’s study were presented with strawberry-, lime-, or orange-flavored solutions, or with flavorless solutions, and requested to identify the flavor of each solution. Each of the different flavors was associated equiprobably with each of the different colors (red, green, orange, and colorless). This meant that, for example, the strawberry-flavored solutions were just as likely to be colored red, green, or orange as to be presented as a colorless solution. Therefore, each of the solutions might have been colored either “appropriately” or “inappropriately” (the latter consisting of incongruently colored or colorless solutions). The participants were informed that they would often be tricked by the color of the solutions, which would frequently not correspond to the flavor typically associated with that color. The most important finding to emerge from Zampini et al.’s (2007) study was that color information had a strong impact on flavor identification even when participants had been informed that the colors of the drinks that they were testing were often misleading. In particular, flavors associated with appropriate colors (e.g., lime flavor–green color; orange flavor–orange color) or presented in colorless solutions were recognized far more accurately than when they were presented with an inappropriate coloring (i.e., lime-flavored drinks that were colored either red or orange; orange-flavored drinks that were colored either green or red).
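The fully crossed, equiprobable color–flavor design just described can be sketched in a few lines of code. The trial counts, the exact flavor and color labels, and the congruent-pair mapping used for scoring below are assumptions for illustration, not the published protocol.

```python
# Sketch of an equiprobable color x flavor design: every flavor appears equally
# often in every color, so color is uninformative about flavor. Counts are
# illustrative, not those of the published study.
import itertools
import random

flavors = ["strawberry", "lime", "orange", "flavorless"]
colors = ["red", "green", "orange", "colorless"]
trials_per_cell = 2  # assumed number of repetitions per flavor-color pairing

trials = list(itertools.product(flavors, colors)) * trials_per_cell
random.shuffle(trials)

# "Appropriate" pairings under an assumed color-flavor mapping.
congruent = {("strawberry", "red"), ("lime", "green"),
             ("orange", "orange"), ("flavorless", "colorless")}
n_congruent = sum(pair in congruent for pair in trials)
print(f"{len(trials)} trials, {n_congruent} congruent ({n_congruent / len(trials):.0%})")
```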
These results therefore show that inappropriate coloring tends to lead to impaired flavor discrimination responses, whereas appropriate coloring does not necessarily improve the accuracy of participants’ flavor discrimination responses (at least when compared to the flavor discrimination accuracy for the colorless solutions). Interestingly, however, no significant effect of color was shown for the strawberry-flavored solutions. That is, the inappropriate coloring of the strawberry-flavored solutions (i.e., when those solutions were colored green or orange) did not result in a significant reduction in the participants’ ability to recognize the actual strawberry flavor. One possible explanation for this result is that those flavors that are more strongly associated with a particular color are more difficult to identify when presented in inappropriately colored solutions (see Shankar et al. 2009). In fact, Zampini et al. (2007, Experiment 1; see Table 37.1) showed that the link between color and a specific flavor was stronger for the orange- and green-colored solutions than for the red-colored solutions. That is, the participants in their study more often matched the orange color with the flavor of orange and the green color with the flavor of lime. By contrast, the red color was associated with strawberry, raspberry, and cherry flavors. Whatever the reason for the difference in the effect of the various colors on participants’ flavor discrimination responses, it is important to note that Zampini et al.’s (2007) results nevertheless show that people can still be misled by the inappropriate coloring of a solution even if they know that the color does not provide a reliable guide to the flavor of the solution. By contrast, the participants in
TABLE 37.1
Flavors Most Frequently Associated with Each Colored Solution in Zampini et al.'s (2007, Experiment 1) Study

Color        Most Associated Flavors
Green        Lime (69%)a
Orange       Orange (91%)a
Yellow       Lemon (89%)a
Blue         Spearmint (86%)a
Gray         Black currant (53%), licorice (40%)a
Red          Strawberry (46%), raspberry (27%), cherry (27%)
Colorless    Flavorless (51%)a

Source: Zampini, M. et al., Food Qual. Prefer., 18, 975–984, 2007. With permission.
a Significant color–flavor association tested using χ2 analysis.
the majority of previous studies in this area (e.g., DuBose et al. 1980; Johnson and Clydesdale 1982; Morrot et al. 2001; Oram et al. 1995; Philipsen et al. 1995; Roth et al. 1988; Stillman 1993; Zellner and Durlach 2003) were not explicitly informed that the flavors of the solutions might not be paired with the appropriately colored solutions. Zampini et al.’s results therefore suggest that the modulatory role of visual information in multisensory flavor perception is robust enough to override any awareness that participants might have (e.g., as informed by the experimenter) concerning the lack of congruency between the color and the flavor of the solutions that they taste. However, it would be interesting in future research to investigate whether knowing that there is no meaningful relationship between the color of the solutions and their flavor would modulate (i.e., reduce vs. enhance) the influence of colors on flavor perception, as compared to the situation in which the participants are not given any prior information about whether the colors are meaningfully related to the flavors.
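The χ2 test of color–flavor association reported in Table 37.1 can be illustrated with a small contingency-table analysis; the response counts below are hypothetical placeholders rather than the published data.

```python
# Chi-squared test of association between solution color and flavor response,
# applied to made-up counts (illustration of the analysis noted in Table 37.1).
from scipy.stats import chi2_contingency

#            lime  orange  strawberry  other
counts = [
    [35,    5,      5,          5],    # green-colored solutions
    [3,     45,     1,          1],    # orange-colored solutions
    [10,    8,      18,         14],   # red-colored solutions
]
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2({dof}) = {chi2:.1f}, p = {p:.3g}")
```

With counts like these, the green and orange rows drive a large chi-squared value, whereas a more even red row contributes comparatively little, which is the pattern of weaker red–flavor association described above.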
37.2.2 Color–Flavor Interactions: Possible Role of Taster Status
Given recent interest in the consequences of individual differences in taster status for flavor perception (see Drewnowski 2003, for a review), Zampini and his colleagues (2008) wanted to investigate whether any multisensory effects of visual (i.e., the colors of the solutions) and/or gustatory (i.e., the presence vs. absence of fruit acids) cues on flavor perception might be affected by the taster status of their participants. Previous research has demonstrated the existence of three subgroups of tasters (nontasters, medium tasters, and supertasters), varying in their sensitivity to 6-n-propyl thiouracil (PROP; e.g., Bartoshuk et al. 1992) as well as to a variety of other tastants (e.g., Prescott et al. 2001; Reed 2008).* Surprisingly, however, none of the previous studies that have investigated individual differences in taste perception had looked at the possible influence of taster status on the visual modulation of (or dominance over) flavor perception. In Zampini et al.'s (2008) study, the taster status of each participant was initially assessed using suprathreshold PROP filter paper strips (see Bartoshuk et al. 1994).
* The individual differences in taste sensitivity most extensively studied are those for the bitterness intensity of PROP [and phenylthiocarbamide (PTC) in earlier work]. Supertasters, medium tasters, and nontasters rate the bitterness of PROP as very to intensely strong, moderate to strong, and weak, respectively. Research using taste solutions has identified other differences among the three taster groups (see Prescott et al. 2004). Different PROP taster groups report different taste intensities and liking of other bitter, salty, sweet, and fat-containing substances. The three PROP taster groups are also known to possess corresponding genetic differences. In particular, studies of taste genetics have revealed the existence of multiple bitterness receptor genes (Kim et al. 2004; see also Bufe et al. 2005; Duffy 2007).
The participants had to place the PROP filter paper strips on their tongue and then rate the intensity of the bitterness that they experienced on a Labelled Magnitude Scale (e.g., Green et al. 1993). The participants were then classified into one of three taster groups (nontasters, medium tasters, and supertasters) based on the cutoff values (nontasters <10.90; 10.91 < medium tasters < 61.48; supertasters > 61.49; see also Essick et al. 2003, for a similar criterion). Zampini et al.'s findings revealed that the modulatory cross-modal effect of visual cues on people's flavor identification responses was significantly more pronounced in the nontasters than in the medium tasters, who, in turn, were influenced to a greater extent by the visual cues than were the supertasters (see Figure 37.1). In particular, the nontasters (and, to a lesser extent, the medium tasters) identified the flavors of the solutions significantly more accurately when they were colored appropriately than when they were colored inappropriately (or else were presented as colorless solutions). By contrast, the supertasters identified the flavors of the solutions more accurately overall, and their performance was not affected by the colors of the solutions.
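The classification rule just described can be expressed in a few lines of code. The following is a minimal sketch, not the authors' analysis script; the function name and the assumption that ratings lie on a 0–100 Labelled Magnitude Scale are illustrative.

```python
# Minimal sketch of the PROP taster-status classification described above.
# The cutoff values (10.90 and 61.48/61.49) are those given in the text; the
# function name and the assumed 0-100 rating range are illustrative only.
def classify_taster(prop_bitterness_rating: float) -> str:
    """Classify a participant from their PROP bitterness rating (LMS units)."""
    if prop_bitterness_rating < 10.90:
        return "nontaster"
    elif prop_bitterness_rating <= 61.48:
        return "medium taster"
    else:
        return "supertaster"

print(classify_taster(5.2))   # -> nontaster
print(classify_taster(35.0))  # -> medium taster
print(classify_taster(72.0))  # -> supertaster
```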
[Figure 37.1: nine panels (black currant, orange, and flavorless solutions × nontasters, medium tasters, and supertasters); y-axes: correct responses (%); x-axes: color of the solutions (yellow, gray, orange, red, colorless).]
FIGURE 37.1 Mean percentage of correct flavor identification responses for the three groups of participants (nontasters, medium tasters, and supertasters) for black currant, orange, and flavorless solutions presented in Zampini et al.'s (2008) study of the effects of color cues on multisensory flavor perception in humans. Black columns represent solutions to which fruit acids had been added and white columns represent solutions without fruit acids. Error bars represent between-participants standard errors of the means. (Reprinted from Zampini, M. et al., Food Qual. Prefer., 18, 975–984, 2007. With permission.)
Zampini et al.'s (2008) results are consistent with recent accounts of sensory dominance derived from studies of cross-modal interactions between tactile, visual, and auditory stimuli (see, e.g., Alais and Burr 2004; Ernst and Banks 2002). Ernst and Banks used the maximum likelihood estimation (MLE) approach to argue that the contribution of a given sensory input to multisensory perception is determined by weighting the sensory estimate in each modality by the noise (or variance) present in that modality. It could be argued that, in Zampini et al.'s study, the estimates of the flavors of the fruit-flavored solutions made by the nontasters were simply more variable (i.e., their judgments were less sensitive) than those of either the medium tasters or the supertasters. As a consequence, given presumably uniform levels of visual discriminability across the three groups of participants, the MLE account would predict that the nontasters should weight the visual cues more highly when making their responses than the medium tasters, who in turn should weight the gustatory cues less highly (and hence the visual cues more highly) than the supertasters, just as was observed. It will be an interesting question for future research to determine whether flavor discrimination responses can be modeled using the MLE approach. It is important to note here that such an analysis may also be able to reveal whether there are any underlying attentional biases (to weight information from one sensory modality more highly than information from another modality) present in the different taster groups (cf. Battaglia et al. 2003). Moreover, it is interesting to consider at this point that although more than 100 studies examining visual contributions to flavor perception have been published over the past 80 years, Zampini et al.'s study represents the first attempt to take the taster status of participants into consideration when analyzing the results. The results of Zampini et al.'s study clearly demonstrate that taster status plays an important role in modulating the cross-modal contribution of visual cues to flavor perception in fruit-flavored beverages.*
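To make the MLE logic concrete, the sketch below computes reliability-weighted estimates under the standard assumption that each cue's weight is proportional to the inverse of its variance (Ernst and Banks 2002). The numerical estimates and variances are purely illustrative, not values reported by Zampini et al.

```python
# Illustrative sketch of MLE (reliability-weighted) cue combination.
# Each cue's weight is proportional to the inverse of its variance, so the
# noisier a modality's estimate, the less it contributes to the combined percept.
def mle_combine(estimates, variances):
    """Return the MLE combined estimate and the per-cue weights."""
    inv_vars = [1.0 / v for v in variances]
    total = sum(inv_vars)
    weights = [iv / total for iv in inv_vars]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights

# Hypothetical "flavor strength" estimates from gustatory/olfactory and visual cues.
# A nontaster's gustatory estimate is assumed to be much noisier than a supertaster's,
# so the visual cue receives a correspondingly larger weight.
for label, gust_var in [("nontaster", 4.0), ("medium taster", 2.0), ("supertaster", 0.5)]:
    combined, (w_gust, w_vis) = mle_combine(estimates=[0.2, 0.8],   # gustatory, visual
                                            variances=[gust_var, 1.0])
    print(f"{label:13s}  visual weight = {w_vis:.2f}  combined estimate = {combined:.2f}")
```

Run on these assumed numbers, the visual weight falls from 0.80 for the "nontaster" to 0.33 for the "supertaster," mirroring the pattern of visual dominance described above.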
37.2.3 Color–Flavor Interactions: Possible Role of Learned Associations between Colors and Flavors
The influence of color on flavor perception may (and, some might say, must) be due to learned associations between specific colors and particular flavors. Some of these associations are fairly universal, for example, the association between the flavor and the color of ripe fruits (Maga 1974; see also Morrot et al. 2001). By contrast, other color–flavor associations might be context dependent and so might differ in different parts of the world (see Duncker 1939; Lucchelli et al. 1978; Shankar et al. 2010; Spence 2002; Wheatley 1973). For instance, lemons are typically yellow in Europe, whereas in Colombia they are mostly dark green. Therefore, a particular color–flavor pairing that seems congruent to people in a certain part of the world may seem incongruent to those who live elsewhere (cf. Demattè et al. 2006, 2009). Seventy years ago, Duncker (1939) considered the role of individual differences in learning such associations. The participants in his early study were presented with milk chocolate that had been colored either brown or white (a novel color for chocolate at the time the study was conducted), both versions having the same flavor. Participants who had never seen white chocolate before reported that the white chocolate had a different flavor to the brown-colored chocolate. The only participant who had come across white chocolate before taking part in the study reported that the differently colored chocolates all tasted the same. Although it should be noted that this early study had a number of methodological limitations (i.e., only a small number of participants were tested, not to mention the fact that no statistical analysis of the data was reported), the results nevertheless highlight the possible importance of prior experience and knowledge in modulating color–flavor interactions. A follow-up of Duncker's (1939) seminal study has been conducted recently by Levitan et al. (2008; see also Shankar et al. 2009). The researchers in this study investigated whether people's prior beliefs concerning specific color–flavor associations might affect their ability to discriminate the flavor of colored sugar-coated chocolate sweets, Smarties (Nestlé), which served as the test stimuli.
* However, it should also be noted that only a relatively small number of participants was tested in each category (i.e., four nontasters, five medium tasters, and five supertasters), thus placing a caveat on any generalization from Zampini et al.'s (2008) findings. In future studies, taster status should therefore be assessed with much larger sample sizes.
Smarties are readily available in eight different colors but in only two different flavors: the orange Smarties that are produced for the UK market contain orange-flavored chocolate, whereas all of the other colors contain unadulterated milk chocolate. By contrast, Smarties that have been produced for other markets all contain unadulterated milk chocolate, regardless of their color. Crucially, the participants were sometimes presented with pairs of stimuli that differed in their color but not in their flavor, with pairs of Smarties that differed in both their color and flavor, or else with pairs of Smarties that differed in their flavor but not their color. In a preliminary questionnaire, a number of the participants in Levitan et al.'s (2008) study stated their belief that a certain non-orange (i.e., red or green) Smartie had a distinctive flavor (which is incorrect), whereas other participants believed (correctly) that all the non-orange Smarties tasted the same. In the first experiment, the participants were presented with all possible pairings of orange, red, and green Smarties and were asked to judge whether a given pair of Smarties differed in flavor by tasting them while either sighted or blindfolded. The results showed that people's beliefs concerning specific color–flavor associations for Smarties exerted a significant modulatory effect on their flavor responses. In the sighted condition, those participants who believed that the non-orange Smarties all taste the same were more likely to judge correctly that a red–green pairing of Smarties tasted identical than were those who believed otherwise; the latter group performed at a level that was significantly below chance (i.e., they reported that the red and green Smarties tasted different on the majority of trials). In other words, those participants who thought that there was a difference between the flavors of the red and green Smarties did in fact judge the two Smarties as tasting different far more frequently in the sighted condition than did participants who did not hold such a belief. The results of Levitan et al.'s study are consistent with the results of the other studies presented in this section in showing that food color can have a powerful cross-modal influence on people's perception of the flavor of food. However, Levitan et al.'s findings show that people's beliefs about the cross-modal color–flavor associations of specific foods can modulate this influence, and that such cognitive influences can be robust and long-lasting despite extensive experience with the particular food item concerned.* In another recent study, Shankar et al. (2009) found that another kind of sugar-coated chocolate candy (multicolored M&Ms, which are all physically identical in taste) was rated as having a stronger chocolate flavor when labeled as "dark chocolate" than when labeled as "milk chocolate." Many other studies have found a similar effect on flavor perception of the expectations produced by labeling a stimulus before it is sampled (see Cardello 1994; Deliza and MacFie 1996; Lee et al. 2006; Yeomans et al. 2008; Zellner et al. 2004, for reviews). Shankar et al. also investigated whether the influence of expectations on flavor perception might be driven by color information (see Levitan et al. 2008). In their study, participants were asked to evaluate how "chocolatey" they found green- or brown-colored M&Ms. Participants rated the brown M&Ms as being more "chocolatey" than the green ones. This result suggests that the color brown generates stronger expectations of "chocolate" than the color green (cf. Duncker 1939). Finally, Shankar et al.
studied whether there was an interaction between the expectations generated by color and by label on multisensory flavor perception. The participants were again presented with brown- or green-colored M&Ms and informed about the "chocolate category" (i.e., either "milk chocolate" or "dark chocolate"), with each color–label combination (green–milk, brown–milk, green–dark, brown–dark) presented in a randomized order. Brown-colored M&Ms were given a higher chocolatey rating than green-colored M&Ms. Similarly, those labeled as "dark chocolate" were given higher ratings than those labeled "milk chocolate." However, no interaction between these colors and labels was found, thus suggesting that these two factors exerted independent effects and implying that two distinct associations were being retrieved from memory and then utilized (e.g., the color–flavor association and the label–flavor association).
* It is interesting to note that the participants in Levitan et al.'s (2008) study were able to maintain such inappropriate beliefs about differently colored Smarties tasting different, despite the objective evidence that people perceive no difference in their flavor, and despite presumably having had extensive previous exposure to the fact that these colors provide no useful information about flavor in this foodstuff.
Shankar et al.'s findings therefore provide the first evidence that color can influence the perceived flavor of a product whose flavor identity cannot be predicted by its color: the colors of the coatings of the M&Ms are independent of their taste (which is always chocolate). One final issue that remains unresolved here concerns the extent to which the influence of color on flavor discrimination reflects a perceptual versus a more decisional effect, or whether instead both perceptual and decisional factors may contribute to participants' performance (see Spence et al., submitted; and Zampini et al. 2007, on this point). If it is a purely perceptual effect, the participants' gustatory experience should be changed by viewing the color; that is, knowledge of the color might improve the sensitivity of participants' flavor discrimination responses by reducing the variability of the multisensory flavor signal. According to the decisional account, by contrast, people's gustatory experience of a given color–flavor pairing should have been the same regardless of whether they were sighted or blindfolded; what may have changed instead is their decisional criterion. In Levitan et al.'s (2008) study, the participants who were uncertain of their responses for a given pair of Smarties might have biased their choice toward making "different" responses because they could see that the Smarties had different colors. By contrast, those participants who already knew that red and green Smarties were normally identical in taste might have been biased toward making a "same" response. In the case of olfaction, Engen (1972) has already reported results consistent with the claim that color can influence odor perception as a result of its effect on decisional mechanisms, but this does not, of course, necessarily rule out a role for perceptual interactions as well, at least when tested under the appropriate experimental conditions (see Zellner and Kautz 1990). Indeed, it is possible to hypothesize that a person's belief that particular foods taste different if they have a different color may paradoxically result in them actually tasting different. Analogously, de Craen et al. (1996) discussed a number of findings showing that color cues modulate the effectiveness of medicines as well as of placebo pills. Although the mechanism behind placebo effects such as these is not as yet well understood, the effects themselves are nevertheless robust (e.g., for a recent review, see Koshi and Short 2007). What is more, just as in Levitan et al.'s (2008) Smarties experiment, there is at least some evidence that different people may hold different beliefs about differently colored pills, and that these beliefs can carry over into the actual effects that the differently colored placebo pills are shown to have (Lucchelli et al. 1978). Therefore, if people's beliefs about color and medication can affect their physical state (e.g., resulting in a genuine change in their tolerance for pain, say, or in their ability to sleep), it would seem conceivable that a person's belief that a certain colored Smartie tastes distinctive (from a Smartie of a different color) might, paradoxically, result in it actually tasting different to that person, despite there being no physical difference in flavor.
37.2.4 Color–Flavor Interactions: Neural Correlates
The results discussed so far on the potential influences of visual cues on flavor perception are consistent with the growing body of neurophysiological and electrophysiological data demonstrating the intimate link between visual, olfactory, and gustatory flavor information at a neuronal level (Osterbauer et al. 2005; Rolls 2004; Rolls and Baylis 1994; Small 2004; Small and Prescott 2005; Verhagen and Engelen 2006). For instance, Osterbauer and his colleagues have used functional neuroimaging to investigate how activity in the human orbitofrontal cortex (OFC) can be modulated by the presentation of particular combinations of odors and colors. The participants in this study had to smell different odors, including lemon, strawberry, spearmint, and caramel, that were presented by means of a computer-controlled olfactometer. The odors were presented in isolation or else together with a color. The participants wore prism glasses to see full-screen colors presented onscreen outside the magnet bore. On some occasions the odor matched the color, such as when the smell of lemon was presented with the color yellow, whereas at other times the odor and color did not match, such as when spearmint odor was presented with the color brown. Osterbauer et al.'s
findings revealed that the presentation of appropriate odor–color combinations (e.g., odor of strawberry matched with red color) increased the brain activity seen in the OFC when compared with the brain activation seen in the odor-alone conditions. By contrast, there was a suppression of neural activity in the same area when inappropriate color–odor combinations were presented (e.g., when the odor of strawberry was presented with a turquoise patch of color on the monitor; see also De Araujo et al. 2003). Taken together, these results would appear to suggest that presenting an appropriate color–odor association may actually lead to increased neural activity in brain areas responsible for processing olfactory stimuli, whereas presenting inappropriate color–odor associations can suppress brain activity below that observed to the odors alone. The positive correlation between the perceived congruency of color–odor pairs and the changes in the pattern of brain activation found in Osterbauer et al.’s study (see also Skrandies and Reuther 2008), therefore, provides a neurophysiological basis for the perceptual changes elicited by changing the color of food.
37.2.5 Interim Summary
Taken together, the results reviewed thus far demonstrate that visual information can have a dramatic impact on flavor perception and evaluation in humans. In particular, most of the studies have shown that it is possible to impair flavor discrimination performance by coloring fruit-flavored solutions inappropriately. The effect of color cues on human flavor perception can be explained by the fact that visual information sets up an expectation regarding the flavor that is about to be experienced. This expectation may originate from previous experiences with similar food stimuli that have helped to build up associations between the visual appearance and the experienced flavor (see Shankar et al. 2010; Yeomans et al. 2008). Stevenson and his colleagues (e.g., Stevenson and Boakes 2004; Stevenson et al. 1998) have suggested that any interaction taking place between gustation and olfaction might be explained in terms of associative learning processes. Their findings show that we are able to create strong links between odors and tastes that are repeatedly presented together. It is possible to hypothesize that the strong correspondences between colors and flavors rely on a similar mechanism. The same foodstuffs are usually experienced first through their visual appearance and only then through their flavor. It is therefore possible that, over the course of our lives, we learn to build up strong associations between the visual and flavor properties of foods that are systematically combined. As a result, people who are presented first with the visual appearance of foods and drinks generate a series of expectations about the flavor that those foods and drinks should have. White and Prescott (2007) have put forward a similar explanation for their findings regarding the influence of odors on taste identification when the odors were presented in advance of the tastes. In the previous section, a study was discussed in which participants' beliefs about color–flavor associations, based on their previous experiences, significantly modulated their responses (see Levitan et al. 2008). In particular, participants who expected a difference between food products that were colored differently were more likely to report a difference than those without any such prior belief. Therefore, flavor perception might be considered as constituting a multisensory experience with somewhat different rules than those regulating other multisensory interactions. Research suggests that spatial coincidence and temporal synchrony are two of the key factors determining whether multisensory integration will take place (at the single-cell level) to give rise to the rich multisensory perceptual objects that fill our everyday lives (for reviews, see Calvert et al. 2004). Given that the cross-modal influence of visual cues on flavor perception occurs long before we taste foods, and that the relevant cues occur in different regions of space (i.e., food is only ever seen outside the oral cavity but tasted within it; see Hutchings 1977), it would seem reasonable to suggest that expectancy plays a greater role than the spatial and temporal rules (see Shankar et al. 2010). It is, for example, less likely that visual–flavor interactions would be governed by the spatial and temporal rules of multisensory integration (which might better help to explain the integration of auditory, visual, and tactile cues, that is, the spatial senses; they might also explain the integration of olfactory/gustatory and oral–somatosensory cues in the basic flavor percept).
Therefore, we believe that the multisensory study of flavor
perception is particularly interesting for multisensory researchers precisely because the rules of integration, and of cross-modal influence, are likely to be somewhat different. In the previous sections, we also discussed how individual differences can affect the nature of the cross-modal visual–flavor interactions that are observed. In particular, visual influences on multisensory flavor perception can be significantly modulated as a function of the taster status of the participant. Visual dominance effects in multisensory flavor perception are more pronounced in those participants who are less sensitive to gustatory cues (i.e., nontasters) than in supertasters, who appear to enjoy the benefit of enhanced gustatory resolution. Therefore, taster status, although often neglected in studies investigating color–flavor interactions, should certainly be considered more carefully in any future research in this area. Finally, we have reviewed the role that the expectancies generated by visual information play in overall food perception.
37.3 ROLE OF AUDITORY CUES IN THE MULTISENSORY EXPERIENCE OF FOODSTUFFS
Most visual cues are typically available before our consumption of food and drink, whereas auditory cues typically only become available at the moment of consumption (or mastication). Therefore, one might expect the role of expectancy to be reduced when looking at the effect of sounds on the perception of food. Certainly, visual and auditory cues provide information at distinct stages of eating. In the second part of this chapter, we therefore briefly discuss the possible role that auditory cues may play in the multisensory perception of foodstuffs. Several studies have demonstrated the influential role that auditory information plays in our perception of food (for a review, see Spence and Zampini 2006). For example, it has been shown that people's ratings of the pleasantness of many foods can be strongly influenced by the sounds produced when people bite into them (e.g., Drake 1970; Vickers 1981, 1983; Vickers and Bourne 1976). Food sounds have a particularly noticeable influence on people's perception of crispness, which is closely associated with pleasantness, especially in crunchy foods (e.g., crisps; Vickers 1983). Taken together, these results therefore suggest that the perception of the crispness of (especially) crunchy foods (e.g., crisps, biscuits, cereals, vegetables) is largely characterized by tactile, mechanical, kinesthetic, and auditory properties (e.g., Vickers 1987). Many foodstuffs produce particular sounds when we eat them. For instance, Drake (1963) reported that the sounds produced by chewing or crushing a variety of different foodstuffs varied in their amplitude, frequency, and temporal characteristics. Analysis of the auditory characteristics of different foods has shown that crispy foods are typically higher in pitch than crunchy foods (Vickers 1979). However, the role of auditory cues in the evaluation of food qualities (e.g., crispness) has typically been investigated using different kinds of foodstuffs, which might also have differed in their level of freshness (e.g., Christensen and Vickers 1981; Drake 1963; Seymour and Hamann 1988; Vickers 1984; Vickers and Bourne 1976; Vickers and Wasserman 1979). Those studies also clearly show that, despite the informational richness contained in the auditory feedback provided by biting into and/or chewing food, people are typically unaware of the effect that such sounds have on their overall multisensory perception or evaluation of particular stimuli. In particular, Zampini and Spence (2004, 2005) have shown that people's perception and evaluation of different foodstuffs (e.g., potato chips and sparkling water) can be modulated by changing the overall sound level or just the high-frequency components of the eating sounds (see also Chen et al. 2005; Masuda et al. 2008; Varela et al. 2006).
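To give a concrete sense of the kind of acoustic analysis alluded to above (Drake 1963; Vickers 1979), the sketch below computes the spectral centroid of a sound as a crude proxy for how "high-pitched" it is. The measure and the synthetic signals are our own illustrative choices, not the analyses actually reported in those studies.

```python
# Illustrative sketch: a crude acoustic summary of a bite/chew sound.
# The spectral centroid (amplitude-weighted mean frequency) is used here as a
# rough proxy for how "high-pitched" a food sound is; this is an illustrative
# choice, not the specific measure reported by Drake (1963) or Vickers (1979).
import numpy as np

def spectral_centroid(signal: np.ndarray, sample_rate: float) -> float:
    """Return the amplitude-weighted mean frequency (Hz) of a mono signal."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

# Synthetic stand-ins for recorded bite sounds (pure tones keep the example short;
# real recordings would, of course, be broadband).
fs = 44100
t = np.arange(0, 0.25, 1.0 / fs)
crispy_like = np.sin(2 * np.pi * 5000 * t)   # energy concentrated at 5 kHz
crunchy_like = np.sin(2 * np.pi * 1000 * t)  # energy concentrated at 1 kHz

print(spectral_centroid(crispy_like, fs))   # ~5000 Hz
print(spectral_centroid(crunchy_like, fs))  # ~1000 Hz
```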
37.3.1 Effect of Sound Manipulation on the Perception of Crisps
Zampini and Spence (2004) studied the multisensory interactions between auditory, oral, tactile, mechanical, kinesthetic, and visual information in ratings of the perceived "crispness" and "freshness" of potato chips (or crisps), in order to investigate whether the evaluation of the crispness and
freshness of potato chips would be affected simply by modifying the sounds produced during the biting action. In fact, the Pringles potato chips used in their experiment all have the same visual (i.e., shape) and oral–tactile (i.e., texture) characteristics. The participants in this study had to make a single bite with their front teeth into each of a large number (180) of potato chips (otherwise known as crisps in the United Kingdom), with their mouth placed directly above a microphone, and then to spit the crisp out (without swallowing) into a bowl placed on their lap. They then rated the crispness and freshness of each potato chip using a computer-based visual analog scale. The participants heard either the veridical sounds that they made when biting into a crisp, without any frequency adjustment, or else biting sounds in which the frequencies in the range 2–20 kHz had been amplified or attenuated by 12 dB. Furthermore, for each frequency manipulation, there was an attenuation of the overall volume of 0 (i.e., no attenuation), 20, or 40 dB. The results showed that the perception of both crispness and freshness was affected by the modulation of the auditory cues produced during the biting action. In particular, the potato chips were perceived as being both crisper and fresher either when the overall sound level was increased or when just the high-frequency sounds (in the range of 2–20 kHz) were selectively amplified (see Figure 37.2).
FIGURE 37.2 (a) Schematic view of apparatus and participant in Zampini and Spence's (2004) study. The door of the experimental booth was closed during the experiment and the response scale was viewed through the window in the left-hand side wall of the booth. Mean responses for the soft–crisp (b) and fresh–stale (c) response scales for three overall attenuation levels (0, −20, or −40 dB) against three frequency manipulations (high frequencies attenuated, veridical auditory feedback, or high frequencies amplified) are reported. Error bars represent between-participants standard errors of means. (Reprinted from Zampini, M., and Spence, C., J. of Sens. Stud., 19, 347–363, 2004. With permission.)
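The sound manipulation described above can be sketched in a few lines of signal-processing code. The following is a minimal offline illustration, assuming a mono recording sampled at 44.1 kHz; the original study manipulated the sounds in real time via a microphone and headphones, and the filter design shown here (a fourth-order Butterworth split at 2 kHz) is our own assumption rather than the equipment actually used.

```python
# Minimal offline sketch of the kind of manipulation described above:
# amplify/attenuate the 2-20 kHz band by 12 dB and then attenuate the overall
# level by 0, 20, or 40 dB. The filter choice (4th-order Butterworth split at
# 2 kHz) is an illustrative assumption, not the original real-time setup.
import numpy as np
from scipy import signal

def manipulate_bite_sound(x, fs=44100, high_band_gain_db=0.0, overall_atten_db=0.0):
    """Return x with its >2 kHz content scaled by high_band_gain_db,
    then the whole signal attenuated by overall_atten_db."""
    sos_high = signal.butter(4, 2000, btype="highpass", fs=fs, output="sos")
    sos_low = signal.butter(4, 2000, btype="lowpass", fs=fs, output="sos")
    high = signal.sosfilt(sos_high, x)
    low = signal.sosfilt(sos_low, x)
    y = low + high * 10 ** (high_band_gain_db / 20.0)   # boost/cut the 2-20 kHz band
    return y * 10 ** (-overall_atten_db / 20.0)          # overall level attenuation

# Example: white noise standing in for a recorded bite sound.
rng = np.random.default_rng(0)
bite = rng.standard_normal(44100)
crisper_sounding = manipulate_bite_sound(bite, high_band_gain_db=+12.0, overall_atten_db=0.0)
softer_sounding = manipulate_bite_sound(bite, high_band_gain_db=-12.0, overall_atten_db=40.0)
```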
Given that the crisps in Zampini and Spence's (2004) study were very similar to each other in terms of their visual, tactile, and flavor attributes, the only perceptual aspect that varied during the task was the sound (which, of course, also contributes to flavor). Therefore, participants may have "felt" that the crisps had a different texture guided only by the sound, since the other senses always received the same information. Additional evidence highlighting the powerful effect of auditory cues on the overall perception of the crisps was that the majority of the participants (15 out of 20) stated anecdotally on debriefing after the experiment that they believed the crisps to have been selected from different packages. Additionally, the majority of the participants also reported that the auditory information had been more salient than the oral–tactile information, and this may also help to account for the effects reported by Zampini and Spence. In fact, one of the fundamental laws of multisensory integration that has emerged over the past few decades states that the sense that provides the more reliable (or salient) information is the one that dominates, or modulates, perception in another sensory modality (e.g., Ernst and Banks 2002; Shimojo and Shams 2001; Welch and Warren 1980). Alternatively, such sensory dominance effects might be explained by the suggestion that the human brain relies on the most attended sense (Spence and Shankar 2010). The role of attention in the multisensory influence of auditory information on food perception is consistent with the results of a study in which the participants had to try to detect weak solutions of sucrose or citric acid in a mixture (Marks and Wheeler 1998). Participants were more accurate at detecting the tastant they were attending to than the tastant they were not attending to (see also Ashkenazi and Marks 2004). Marks and Wheeler suggested that our ability to detect a particular sensory quality (e.g., a tastant or flavor) may be modulated by selective attention toward (or away from) that quality. In a similar vein, one might therefore suggest that increasing the overall loudness of the sounds produced when biting into crisps changes participants' crispness perception by making the sounds more pronounced than they would have been had this information been derived solely from the texture in the mouth or from normal-level auditory cues. That is, participants' attention would be directed toward this feature of the food by externally changing the relative weighting of the sensory cues that signal it. Louder sounds are also presumably more likely to capture a person's attention than quieter sounds. However, at present, it is unclear how many of the findings taken to support an attentional account of any sensory dominance effect can, in fact, be better accounted for in terms of sensory estimates of stimulus attributes simply being more accurate (i.e., less variable) in the dominant modality than those in the other modalities (e.g., Alais and Burr 2004; Battaglia et al. 2003; Ernst and Banks 2002). Finally, it is important to note that these explanations are not mutually exclusive. For example, Zampini and Spence's (2004) results can be accounted for either in terms of attentional capture or in terms of multisensory integration.
37.3.2 Effect of Auditory Cues on the Perception of Sparkling Water
In a follow-up study, Zampini and Spence (2005) studied the possible influence of auditory cues on the perception and evaluation of the carbonation of water. Our perception of the carbonation of a beverage often relies on the integration of a variety of visual, oral–somatosensory, nociceptive, auditory, and even tactile cues provided by the bubbles (e.g., Chandrashekar et al. 2009; Vickers 1991; Yau and McDaniel 1992). Zampini and Spence (2005) examined the relationship between the auditory cues produced by sparkling water and its perceived level of carbonation, both when the carbonated water samples were assessed in a cup held in the hand and when they were assessed in the mouth. The carbonation sounds were modified using the same experimental paradigm developed in their previous research on the perception of potato chips (Zampini and Spence 2004). The sparkling water samples held in participants' hands were judged to be more carbonated when the overall sound level was increased and/or when the high-frequency components (2–20 kHz) of the water sound were amplified. Interestingly, however, a subsequent experiment failed to demonstrate any effect of these auditory manipulations on the perception of carbonation and oral irritation from water samples that were held in the mouth. Taken together, these results
therefore show that auditory cues can modulate the perceived carbonation of a water sample held in the hand, but did not modulate people's perception of a water sample held in the mouth. This might be because the perception of carbonation in the mouth is more dependent on oral–somatosensory and/or nociceptive inputs than on auditory cues, or alternatively, because it is more important that we correctly perceive stimuli once they have entered the oral cavity (see Koza et al. 2005). Once again, these findings are consistent with the hypothesis that the modality dominating multisensory perception (when the senses are put into conflict) is the most accurate and/or informative sense (e.g., see Ernst and Banks 2002).
37.4 CONCLUSIONS
The past few years have seen a rapid growth of interest in the multisensory aspects of food perception (see Auvray and Spence 2008; Delwiche 2004; Prescott 1999, 2004; Stevenson 2009; Stevenson and Tomiczek 2007; Stillman 2002; Verhagen and Engelen 2006, for reviews). The research reviewed here highlights the profound effect that visual cues (i.e., the color of food) and auditory cues (i.e., variations in the overall sound level and in the spectral distribution of energy) can have on people's perception of foodstuffs (such as potato chips and beverages). When people are asked to identify the flavors of foods and beverages, their responses can be influenced by the colors of those foods and beverages. In particular, the identification of specific flavors has often been shown to be less accurate when they are paired with an inappropriate color (e.g., DuBose et al. 1980; Zampini et al. 2007, 2008). Our perception of the flavor and physical characteristics of foods and beverages can also be modulated by auditory cues. For instance, it is possible to change the perceived crispness of crisps or the perceived fizziness of a carbonated beverage (such as sparkling water) simply by modifying the sounds produced when eating the crisps or by the bubbles of the sparkling water (Zampini and Spence 2004, 2005). It is important to note that visual and auditory information are available at different stages of eating. Typically, visual (not to mention orthonasal olfactory and, on occasion, auditory) cues are available long before our ingestion of food (and before any other sensory cues associated with the food are available). Therefore, visual cues (e.g., food colors) might be expected to create an expectancy concerning the likely flavor of the food to be eaten (Hutchings 1977; Shankar et al. 2010). By contrast, any role of expectancy might be reduced when considering the potential influence of auditory cues on the perception of food, given that the sounds produced when biting into or chewing food only become available at the moment of consumption. It is therefore possible to hypothesize that multisensory integration operates somewhat differently for visual as compared to auditory cues in overall food perception. Given that visual cues are typically available long before a food is consumed, and outside the mouth, it is quite unlikely that visual–flavor interactions are modulated by the spatial and temporal rules (i.e., greater multisensory interaction with spatial and temporal coincidence between the stimuli; see Calvert et al. 2004, for a review). Therefore, visual influences on multisensory flavor perception are better explained by appealing to the role of expectancy than to the spatial and temporal rules, which might instead help us to understand the role of auditory cues in food perception. However, some sounds might produce an expectancy effect as well. For example, the sound of a food package being opened will normally precede the consumption of a particular packaged food item (think only of the rattling of the crisps packet). Several researchers have demonstrated that people's expectations regarding what they are about to consume can also have a significant effect on their perception of the pleasantness of the food or drink itself (see Spence et al., in press, for a recent review).
It is also important to note that the visual and auditory contributions to multisensory flavor perception typically take place without people necessarily being consciously aware that what they are seeing or hearing is influencing their overall flavor experience (e.g., Zampini and Spence 2004, 2005). In Zampini et al.'s more recent research (e.g., Zampini et al. 2007, 2008), the participants were influenced by the inappropriate colors of the beverages that they were evaluating even though they had been informed beforehand that there might be a lack of congruency between the colors that they saw and the flavors that they were tasting. This shows, therefore, that
the effect was powerful enough to override participants' awareness that color information might mislead their identification of the flavors. The potential influence of the sounds made when eating on food perception is also often overlooked. For example, most of the participants in Zampini and Spence's (2004) study thought that the crisps were actually different (i.e., selected from different packages or differing in their level of freshness and, therefore, crispness). They seemed unaware that the experimenters had changed only the sounds produced when biting into the crisps, and that the crisps themselves did not differ. Nevertheless, the studies reported here are consistent with a growing number of neurophysiological and electrophysiological studies demonstrating close visual–flavor (Osterbauer et al. 2005; Small 2004; Small and Prescott 2005; Verhagen and Engelen 2006) and audiotactile (Gobbelé et al. 2003; Kitagawa and Spence 2006; Levänen et al. 1998; Schroeder et al. 2001; von Békésy 1957)* interactions at the neuronal level. Results such as these therefore help to emphasize the limitations that may be associated with relying solely on introspection and verbal report (as is often the case in commercial consumer testing settings) when trying to measure people's perception and evaluation of foodstuffs.
* However, it is important to note that, to the best of our knowledge, no neuroimaging studies have as yet been conducted to investigate the role of auditory cues on multisensory food perception (cf. Spence and Zampini 2006; Verhagen and Engelen 2006).
REFERENCES Alais, D., and D. Burr. 2004. The ventriloquist effect results from near-optimal bimodal integration. Current Biology 14: 257–262. Alley, R. L., and T. R. Alley. 1998. The influence of physical state and color on perceived sweetness. Journal of Psychology: Interdisciplinary and Applied 132: 561–568. Ashkenazi, A., and L. E. Marks. 2004. Effect of endogenous attention on detection of weak gustatory and olfactory flavors. Perception & Psychophysics 66: 596–608. Auvray, M., and C. Spence. 2008. The multisensory perception of flavor. Consciousness & Cognition 17: 1016–1031. Bartoshuk, L. M., V. B. Duffy, and I. J. Miller. 1994. PTC/PROP tasting: Anatomy, psychophysics, and sex effects. Physiology & Behavior 56: 1165–1171. Bartoshuk, L. M., K. Fast, T. A. Karrer, S. Marino, R. A. Price, and D. A. Reed. 1992. PROP supertasters and the perception of sweetness and bitterness. Chemical Senses 17: 594. Battaglia, P. W., R. A. Jacobs, and R. N. Aslin. 2003. Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America A 20: 1391–1397. Bertelson, P., and G. Aschersleben. 1998. Automatic visual bias of perceived auditory location. Psychonomic Bulletin & Review 5: 482–489. Blackwell, L. 1995. Visual clues and their effects on odour assessment. Nutrition and Food Science 5: 24–28. Bourne, M. C. 1982. Food texture and viscosity. New York: Academic Press. Bufe, B., P. A. Breslin, C. Kuhn et al. 2005. The molecular basis of individual differences in phenylthiocarbamide and propylthiouracil bitterness perception. Current Biology 15: 322–327. Bult, J. H. F., R. A. de Wijk, and T. Hummel. 2007. Investigations on multimodal sensory integration: Texture, taste, and ortho- and retronasal olfactory stimuli in concert. Neuroscience Letters 411: 6–10. Cain, W. S. 1977. History of research on smell. In Handbook of perception: Vol. 6a: Tasting and smelling, ed. E. C. Carterette and M. P. Friedman, 197–229. New York: Academic Press. Calvert, G., C. Spence, and B. E. Stein. 2004. The handbook of multisensory processing. Cambridge, MA: MIT Press. Cardello, A. V. 1994. Consumer expectations and their role in food acceptance. In Measurement of food preferences, ed. H. J. H. MacFie, and D. M. H. Thomson, 253–297. London: Blackie Academic & Professional. Chan, M. M., and C. Kane-Martinelli. 1997. The effect of color on perceived flavour intensity and acceptance of foods by young adults and elderly adults. Journal of the American Dietetic Association 97: 657–659. Chandrashekar, J., D. Yarmolinsky, L. von Buchholtz et al. 2009. The taste of carbonation. Science 326: 443–445.
Chen, H., C. Karlsson, and M. Povey. 2005. Acoustic envelope detector for crispness assessment of biscuits. Journal of Texture Studies 36: 139–156. Christensen, C. M. 1980a. Effects of taste quality and intensity on oral perception of viscosity. Perception & Psychophysics 28: 315–320. Christensen, C. M. 1980b. Effects of solution viscosity on perceived saltiness and sweetness. Perception & Psychophysics 28: 347–353. Christensen, C. M., and Z. M. Vickers. 1981. Relationship of chewing sounds to judgments of food crispness. Journal of Food Science 46: 574–578. Clydesdale, F. M. 1993. Color as a factor in food choice. Critical Reviews in Food Science and Nutrition 33: 83–101. Dalton, P., N. Doolittle, H. Nagata, and P. A. S. Breslin. 2000. The merging of the senses: Integration of subthreshold taste and smell. Nature Neuroscience 3: 431–432. Davis, R. 1981. The role of nonolfactory context cues in odor identification. Perception & Psychophysics 30: 83–89. De Araujo, I. E. T., E. T. Rolls, M. L. Kringelbach, F. McGlone, and N. Phillips. 2003. Taste–olfactory convergence, and the representation of the pleasantness of flavour, in the human brain. European Journal of Neuroscience 18: 2059–2068. de Craen, A. J. M., P. J. Roos, A. L. de Vries, and J. Kleijnen. 1996. Effect of colour of drugs: Systematic review of perceived effect of drugs and their effectiveness. British Medical Journal 313: 1624–1626. Deliza, R., and H. MacFie. 1996. The generation of sensory expectation by external cues and its effect on sensory perception and hedonic ratings: A review. Journal of Sensory Studies 11: 103–128. Delwiche, J. 2004. The impact of perceptual interactions on perceived flavour. Food Quality and Preference 15: 137–146. Demattè, M. L., D. Sanabria, and C. Spence. 2006. Crossmodal associations and interactions between olfaction and vision. Chemical Senses 31: E50–E51. Demattè, M. L., D. Sanabria, and C. Spence. 2009. Olfactory identification: When vision matters? Chemical Senses 34: 103–109. Drake, B. K. 1963. Food crunching sounds. An introductory study. Journal of Food Science 28: 233–241. Drake, B. K. 1970. Relationships of sounds and other vibrations to food acceptability. Proceedings of the 3rd International Congress of Food Science and Technology, pp. 437–445. August 9–14, Washington, DC. Drewnowski, A. 2003. Genetics of human taste perception. In Human olfaction and gustation, 2nd ed., ed. R. L. Doty, 847–860. New York: Marcel Dekker, Inc. DuBose, C. N., A. V. Cardello, and O. Maller. 1980. Effects of colourants and flavourants on identification, perceived flavour intensity, and hedonic quality of fruit-flavoured beverages and cake. Journal of Food Science 45: 1393–1399, 1415. Duffy, V. B. 2007. Variation in oral sensation: Implications for diet and health. Current Opinion in Gastroenterology 23: 171–177. Duncker, K. 1939. The influence of past experience upon perceptual properties. American Journal of Psychology 52: 255–265. Engen, T. 1972. The effect of expectation on judgments of odour. Acta Psychologica 36: 450–458. Ernst, M. O., and M. S. Banks. 2002.
Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415: 429–433. Essick, G. K., A. Chopra, S. Guest, and F. McGlone. 2003. Lingual tactile acuity, taste perception, and the density and diameter of fungiform papillae in female subjects. Physiology & Behavior 80: 289–302. Frank, R. A., K. Ducheny, and S. J. S. Mize. 1989. Strawberry odor, but not red color, enhances the sweetness of sucrose solutions. Chemical Senses 14: 371–377. Garber Jr., L. L., E. M. Hyatt, and Ü. Ö. Boya. 2008. The mediating effects of the appearance of nondurable consumer goods and their packaging on consumer behavior. In Product experience, ed. H. N. J. Schifferstein and P. Hekkert, 581–602. London: Elsevier. Garber Jr., L. L., E. M. Hyatt, and R. G. Starr Jr. 2000. The effects of food colour on perceived flavour. Journal of Marketing Theory and Practice 8: 59–72. Garber Jr., L. L., E. M. Hyatt, and R. G. Starr Jr. 2001. Placing food color experimentation into a valid consumer context. Journal of Food Products Marketing 7: 3–24. Gifford, S. R., and F. M. Clydesdale. 1986. The psychophysical relationship between colour and sodium chloride concentrations in model systems. Journal of Food Protection 49: 977–982. Gifford, S. R., F. M. Clydesdale, and R. A. Damon Jr. 1987. The psychophysical relationship between colour and salt concentration in chicken flavoured broths. Journal of Sensory Studies 2: 137–147.
Gobbelé, R., M. Schürrmann, N. Forss, K. Juottonen, H. Buchner, and R. Hari. 2003. Activation of the human posterior and tempoparietal cortices during audiotactile interaction. Neuroimage 20: 503–511. Green, B. G., G. S. Shaffer, and M. M. Gilmore. 1993. Derivation and evaluation of a semantic scale of oral sensation with apparent ratio properties. Chemical Senses 18: 683–702. Hollowood, T. A., R. S. T. Linforth, and A. J. Taylor. 2002. The effect of viscosity on the perception of flavour. Chemical Senses 27: 583–591. Hutchings, J. B. 1977. The importance of visual appearance of foods to the food processor and the consumer. In Sensory properties of foods, ed. G. G. Birch, J. G. Brennan, and K. J. Parker, 45–57. London: Applied Science Publishers. Johnson, J. L., and F. M. Clydesdale. 1982. Perceived sweetness and redness in coloured sucrose solutions. Journal of Food Science 47: 747–752. Johnson, J. L., E. Dzendolet, and F. M. Clydesdale. 1983. Psychophysical relationships between sweetness and redness in strawberry-drinks. Journal of Food Protection 46: 21–25. Kim, U. K., P. A. Breslin, D. Reed, and D. Drayna. 2004. Genetics of human taste perception. Journal of Dental Research 83: 448–453. Kitagawa, N., and C. Spence. 2006. Audiotactile multisensory interactions in information processing. Japanese Psychological Research 48: 158–173. Koshi, E. B., and C. A. Short. 2007. Placebo theory and its implications for research and clinical practice: A review of the recent literature. Pain Practice 7: 4–20. Koza, B., A. Cilmi, M. Dolese, and D. Zellner. 2005. Color enhances orthonasal olfactory intensity and reduces retronasal olfactory intensity. Chemical Senses 30: 643–649. Lavin, J., and H. T. Lawless. 1998. Effects of colour and odor on judgments of sweetness among children and adults. Food Quality and Preference 9: 283–289. Lawless, H., P. Rozin, and J. Shenker. 1985. Effects of oral capsaicin on gustatory, olfactory and irritant sensations and flavor identification in humans who regularly or rarely consume chili pepper. Chemical Senses 10: 579–89. Lee, L., S. Frederick, and D. Ariely. 2006. Try it, you’ll like it. Psychological Science 17: 1054–1058. Levänen, S., V. Jousmäki, and R. Hari. 1998. Vibration-induced auditory-cortex activation in a congenitally deaf adult. Current Biology 8: 869–872. Levitan, C. A., M. Zampini, R. Li, and C. Spence. 2008. Assessing the role of colour cues and people’s beliefs about colour–flavour associations on the discrimination of the flavour of sugar-coated chocolates. Chemical Senses 33: 415–423. Lucchelli, P. E., A. D. Cattaneo, and J. Zattoni. 1978. Effect of capsule colour and order of administration of hypnotic treatments. European Journal of Clinical Pharmacology 13: 153–155. Maga, J. A. 1974. Influence of colour on taste thresholds. Chemical Senses and Flavour 1: 115–119. Marks, L. E., and M. E. Wheeler. 1998. Attention and the detectability of weak-taste stimuli. Chemical Senses 23: 19–29. Masuda, M., Y. Yamaguchi, K. Arai, and K. Okajima. 2008. Effect of auditory information on food recognition. IEICE Technical Report 108(356): 123–126. Moir, H. C. 1936. Some observations on the appreciation of flavour in food stuffs. Chemistry and Industry 55: 145–148. Morrot, G., F. Brochet, and D. Dubourdieu. 2001. The colour of odors. Brain and Language 79: 309–320. Murphy, C., and W. S. Cain. 1980. Taste and olfaction: Independence vs. interaction. Physiology and Behavior 24: 601–605. Oram, N., D. G. Laing, I. Hutchinson et al. 1995. 
The influence of flavour and colour on drink identification by children and adults. Developmental Psychobiology 28: 239–246. Osterbauer, R. A., P. M. Matthews, M. Jenkinson, C. F. Beckmann, P. C. Hansen, and G. A. Calvert. 2005. Color of scents: Chromatic stimuli modulate odor responses in the human brain. Journal of Neurophysiology 93: 3434–3441. Pangborn, R. M. 1960. Influence of colour on the discrimination of sweetness. American Journal of Psychology 73: 229–238. Pangborn, R. M., and B. Hansen. 1963. The influence of colour on discrimination of sweetness and sourness in pear-nectar. American Journal of Psychology 76: 315–317. Philipsen, D. H., F. M. Clydesdale, R. W. Griffin, and P. Stern. 1995. Consumer age affects response sensory characteristics of a cherry flavoured beverage. Journal of Food Science 60: 364–368. Prescott, J. 1999. Flavour as a psychological construct: Implications for perceiving and measuring the sensory qualities of foods. Food Quality and Preference 10: 349–356.
MEDICINE
It has become accepted in the neuroscience community that perception and performance are quintessentially multisensory in nature. Using the full palette of modern brain imaging and neuroscience methods, The Neural Bases of Multisensory Processes details current understanding of the neural bases of these phenomena as studied across species, developmental stages, and clinical statuses. Organized thematically into nine subsections, the book is a collection of contributions by leading scientists in the field. Chapters build generally from basic to applied, allowing readers to see how fundamental science informs the clinical and applied sciences. Topics discussed include
• Anatomy, essential for understanding the neural substrates of multisensory processing
• Neurophysiological bases and how multisensory stimuli can dramatically change the encoding processes for sensory information
• Combinatorial principles and modeling, focusing on efforts to gain a better mechanistic handle on multisensory operations and their network dynamics
• Development and plasticity
• Clinical manifestations and how perception and action are affected by altered sensory experience
• Attention and spatial representations
The last sections of the book focus on naturalistic multisensory processes in three separate contexts: motion signals, multisensory contributions to the perception and generation of communication signals, and how the perception of flavor is generated. The text provides a solid introduction for newcomers and a strong overview of the current state of the field for experts.
K10614
an informa business
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487
711 Third Avenue, New York, NY 10017
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN, UK
www.crcpress.com