Common Mechanisms in Perception and Action: Attention and Performance XIX
Wolfgang Prinz and Bernhard Hommel, Editors
OXFORD UNIVERSITY PRESS
Common Mechanisms in Perception and Action
Attention and Performance

Attention and Performance XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience. Edited by David E. Meyer and Sylvan Kornblum, 1992
Attention and Performance XV: Conscious and Nonconscious Information Processing. Edited by Carlo Umiltà and Morris Moscovitch, 1994
Attention and Performance XVI: Information Integration in Perception and Action. Edited by Toshio Inui and James L. McClelland, 1996
Attention and Performance XVII: Cognitive Regulation of Performance: Interaction of Theory and Application. Edited by Daniel Gopher and Asher Koriat, 1998
Attention and Performance XVIII: Control of Cognitive Processes. Edited by Stephen Monsell and Jon Driver, 2000
Attention and Performance XIX: Common Mechanisms in Perception and Action. Edited by Wolfgang Prinz and Bernhard Hommel, 2002
Common Mechanisms in Perception and Action Attention and Performance XIX
edited by Wolfgang Prinz and Bernhard Hommel
This book is based on the papers presented at the Nineteenth International Symposium on Attention and Performance held at Kloster Irsee, Germany, July 16–22, 2000
Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in Oxford New York

Auckland Bangkok Buenos Aires Cape Town Chennai Dar es Salaam Delhi Hong Kong Istanbul Karachi Kolkata Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi São Paulo Shanghai Singapore Taipei Tokyo Toronto

and an associated company in Berlin

Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

Published in the United States by Oxford University Press Inc., New York

© The International Association for the Study of Attention and Performance, 2002

The moral rights of the author have been asserted

Database right Oxford University Press (maker)

First published 2002

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this book in any other binding or cover and you must impose this same condition on any acquirer

A catalogue record for this title is available from the British Library

Library of Congress Cataloging in Publication Data (Data available)

ISBN 0 19 851069 1

10 9 8 7 6 5 4 3 2 1
Typeset by Integra Software Services Pvt. Ltd., Pondicherry, India 605 005 www.integra-india.com Printed in Great Britain on acid-free paper by T.J. International Ltd, Padstow, Cornwall
Contents

Acknowledgements
The Attention and Performance Symposia
Authors and Participants
Group Photo

Editors' introduction

1 Common mechanisms in perception and action: Introductory remarks
  Wolfgang Prinz and Bernhard Hommel

Association lecture

2 Sequential effects of dimensional overlap: findings and issues
  Sylvan Kornblum and Gregory Stevens

I Space perception and spatially oriented action

3 Perception and action: what, how, when, and why. Introduction to Section I
  Glyn W. Humphreys

4 Several 'vision for action' systems: a guide to dissociating and integrating dorsal and ventral functions (tutorial)
  Yves Rossetti and Laure Pisella

5 Attention and visually guided behavior in distinct systems
  Bruce Bridgeman

6 How the brain represents the body: insights from neurophysiology and psychology
  Michael S.A. Graziano and Matthew M. Botvinick

7 Action planning affects spatial localization
  Jerome Scott Jordan, Sonja Stork, Lothar Knuf, Dirk Kerzel, and Jochen Müsseler

8 The perception and representation of human locomotion
  John J. Rieser and Herbert L. Pick, Jr.

II Timing in perception and action

9 Perspectives on the timing of events and actions. Introduction to Section II
  Jeff Summers

10 Movement timing: a tutorial
  Alan M. Wing and Peter J. Beek

11 Timing mechanisms in sensorimotor synchronization
  Gisa Aschersleben, Prisca Stenneken, Jonathan Cole, and Wolfgang Prinz

12 The embodiment of musical structure: effects of musical context on sensorimotor synchronization with complex timing patterns
  Bruno H. Repp

13 Action, binding, and awareness
  Patrick Haggard, Gisa Aschersleben, Jörg Gehrke, and Wolfgang Prinz

III Action perception and imitation

14 Processing mechanisms and neural structures involved in the recognition and production of actions. Introduction to Section III
  Raffaella Ida Rumiati

15 Action perception and imitation: a tutorial
  Harold Bekkering and Andreas Wohlschläger

16 Observing a human or a robotic hand grasping an object: differential motor priming effects
  Umberto Castiello, Dean Lusher, Morena Mari, Martin Edwards, and Glyn W. Humphreys

17 Action representation and the inferior parietal lobule
  Vittorio Gallese, Luciano Fadiga, Leonardo Fogassi, and Giacomo Rizzolatti

18 Coding of visible and hidden actions
  Tjeerd Jellema and David I. Perrett

19 The visual analysis of bodily motion
  Maggie Shiffrar and Jeannine Pinto

IV Content-specific interactions between perception and action

20 Content-specific interactions between perception and action. Introduction to Section IV
  Martin Eimer

21 Motor competence in the perception of dynamic events: a tutorial
  Paolo Viviani

22 Eliminating, magnifying, and reversing spatial compatibility effects with mixed location-relevant and irrelevant trials
  Robert W. Proctor and Kim-Phuong L. Vu

23 Does stimulus-driven response activation underlie the Simon effect?
  Fernando Valle-Inclán, Steven A. Hackley, and Carmen de Labra

24 Activation and suppression in conflict tasks: empirical clarification through distributional analyses
  K. Richard Ridderinkhof

25 Response-evoked interference in visual encoding
  Jochen Müsseler and Peter Wühr

26 Interaction between feature binding in perception and action
  Gijsbert Stoet and Bernhard Hommel

V Coordination and integration in perception and action

27 Coordination and integration in perception and action. Introduction to Section V
  Robert Ward

28 From perception to action: making the connection—a tutorial
  Pierre Jolicœur, Michael Tombu, Chris Oriet, and Biljana Stevanovski

29 The dimensional-action system: a distinct visual system
  Asher Cohen and Uri Feintuch

30 Selection-for-perception and selection-for-spatial-motor-action are coupled by visual attention: a review of recent findings and new evidence from stimulus-driven saccade control
  Werner X. Schneider and Heiner Deubel

31 Response features in the coordination of perception and action
  Gordon D. Logan and N. Jane Zbrodoff

32 Effect anticipation in action planning
  Michael Ziessler and Dieter Nattkemper

33 The representational nature of sequence learning: evidence for goal-based codes
  Eliot Hazeltine

Author index
Subject index
Acknowledgements

We gratefully acknowledge the financial support for the Symposium received from the Max Planck Society and the Deutsche Forschungsgemeinschaft. The Symposium took place at Kloster Irsee, located in the scenic Alpine foothills of the Swabian part of Bavaria, Germany. We thank the director of Kloster Irsee, Dr Rainer Jehl, and his staff for their friendly reception, the efficient handling of organizational matters, and their helpfulness. Further, we owe thanks to Dr Horst Gundlach and his assistant Christian Paulitsch of the Institute for the History of Psychology of the University of Passau, Germany, for the small but outstanding instrument exhibition Short Time Measurement in Early Experimental Psychology. The participants thoroughly enjoyed the unusual exhibition.

Above all, we owe our very special thanks to Marina von Bernhardi, a staff member of the Max Planck Institute for Psychological Research in Munich, who was in charge of the entire, extensive organization, mainly prior to and during the symposium. She did a marvelous job in making everything work very smoothly and pleasantly. There was warm praise for her efforts from the participants, to which we add our special appreciation. Competent support in technical and administrative matters was given by the Munich institute's staff, especially Max Schreder and Klaus Hereth, as well as Peter Schönfelder. Dirk Kerzel diligently created and constantly updated the A&P site containing manuscript versions. Last but not least, we are particularly grateful to Heide John of the Max Planck Institute, whose secretarial office was in charge of handling a great deal of the manuscripts, the communication between authors, reviewers, and editors, and the final compilation of the entire manuscript—tasks that even in, or because of, the electronic age need careful attention. We express to her our gratitude for doing that job with utmost diligence and patience.

All chapters based on papers at the symposium were anonymously reviewed by two other participants and went through an extensive revision process. We are indebted to the reviewers. Most of all we would like to express our gratitude to Stephen Monsell, secretary of the Association. He proved to be the Keeper of the Seal of the Association, providing us with excerpts from the bylaws or information about its rules and traditions where applicable, and with advice whenever issues beyond bylaws, rules, and traditions came up.

WP and BH
Munich and Leiden, March 2001
The Attention and Performance Symposia

Since the first was held in The Netherlands in 1966, the Attention and Performance Symposia have become an established and highly successful institution. They are now held every two years, in a different country. The original purpose remains: to promote communication among researchers in experimental cognitive psychology and cognate areas working at the frontiers of research on 'attention, performance, and information processing'. The format is an invited workshop-style meeting, with plenty of time for papers and discussion, leading to the publication of an edited volume of the proceedings.

The International Association for the Study of Attention and Performance exists solely to run the meetings and publish the volume. Its Executive Committee selects the organizers of the next meeting, and develops the program in collaboration with them, with advice on potential participants from an Advisory Council of up to 100 members. Participation is by invitation only, and the rules of the Association¹ are constructed to ensure participation from a wide range of countries, with a high proportion of young researchers, and a substantial injection of new participants from meeting to meeting.

Held usually in a relatively isolated location, each meeting has four and a half days of papers presented by a maximum of 26 speakers, plus an invited Association Lecture from a leading figure in the field. There is a maximum of 65 participants (incl. the current members of the executive committee and the organizers). There are no parallel sessions, and all participants commit themselves to attending all the sessions. There is thus time for substantial papers followed by extended discussion, both organized and informal, and opportunities for issues and ideas introduced at one point in the meeting to be returned to and developed later. Speakers are encouraged to be provocative and speculative, and participants who do not present formal papers are encouraged to contribute actively to discussion in various ways; for example, as formal discussants, by presenting a poster, or as contributors to scheduled discussion sessions. This intensive workshop atmosphere has been one of the major strengths and attractions of these meetings.

Manuscript versions of the papers are refereed anonymously by other participants and external referees and published in a high-quality volume edited by the organizers, with a publication lag similar to many journals. Unlike many edited volumes, the Attention and Performance series reaches a wide audience and has considerable prestige. Although not a journal, it is listed in journal citation indices with the top dozen journals in experimental psychology. According to the Constitution, 'Papers presented at meetings are expected to describe work not previously published, and to represent a substantial contribution . . .' Over the years, contributors have been willing to publish original experimental and theoretical research of high quality in the volume, and this tradition continues. A&P review papers have also been much cited. The series has attracted widespread praise in terms such as 'unfailingly presented the best work in the field' (S. Kosslyn, Harvard), 'most distinguished series in the field of cognitive psychology' (C. Bundesen, Copenhagen), 'held in high esteem throughout the field because of its attention to rigor, quality and scope . . . indispensable to anyone
who is serious about understanding the current state of the science' (M. Jordan, MIT), 'the books are an up-to-the-minute tutorial on topics fundamental to understanding mental processes' (M. Posner, Oregon).

In the early days of the Symposium, when the scientific analysis of attention and performance was in its infancy, thematic coherence could be generated merely by gathering together the most active researchers in the field. More recently, experimental psychology has ramified, 'cognitive science' has been born, and converging approaches to the issues we study have developed in neuroscience. Participation has therefore become interdisciplinary, with neuroscientists, neuropsychologists and computational modelers joining the experimental psychologists. Each meeting now focuses on a restricted theme under the general heading of 'attention and performance'. Recent themes include Synergies in Experimental Psychology, Artificial Intelligence and Cognitive Neuroscience (USA, 1990), Conscious and Unconscious Processes (Italy, 1992), Integration of Information (Japan, 1994), Cognitive Regulation of Performance: Interaction of Theory and Application (Israel, 1996), and Control of Cognitive Processes (UK, 1998).
¹ For more information about the Association and previous symposia, visit the webpage http://go.to/A&P
Authors and Participants

Alan Allport Dept. of Experimental Psychology University of Oxford South Parks Road Oxford OX1 3UD, UK
[email protected]
Bruce Bridgeman Psychology, Social Sciences 2 University of California Santa Cruz, CA 95064 USA
[email protected]
Gisa Aschersleben Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 München, Germany
[email protected]
Umberto Castiello Department of Psychology Royal Holloway The University of London Egham, Surrey TW20 0EX, UK
[email protected]
Peter Beek Faculty of Human Movement Sciences Vrije Universiteit Van der Boechorststraat 9 NL-1081 BT Amsterdam
[email protected]
Asher Cohen Department of Psychology The Hebrew University Mount Scopus Jerusalem 91905 Israel
[email protected]
Harold Bekkering Experimental and Work Psychology University of Groningen Grote Kruisstraat 2/1 NL-9712 TS Groningen
[email protected] Matthew M. Botvinick Center for the Neural Basis of Cognition 115 Mellon Institute 4400 Fifth Avenue Pittsburgh, PA 15232 USA
[email protected]
Jonathan Cole Dept. of Clinical Neurophysiology Poole Hospital Longfleet Road Poole, BH15 2JB UK
[email protected] Laila Craighero Istituto di Fisiologia Umana Università di Parma Via Volturno, 39 I-43100 Parma, Italy
[email protected]
Shai Danziger School of Psychology University of Wales Bangor, Gwynedd LL57 2DG, UK
[email protected] Jan De Houwer Department of Psychology University of Southampton Highfield Southampton SO17 1BJ, UK
[email protected] Roberto Dell’Acqua Department of Psychology University of Padova 8, Via Venezia I-35131 Padova, Italy
[email protected] Heiner Deubel Experimental Psychology Ludwig-Maximilians-University Leopoldstr. 13 D-80802 München, Germany
[email protected] John Duncan MRC Cognition and Brain Sciences Unit 15 Chaucer Road Cambridge CB2 2EF, UK
[email protected]
Luciano Fadiga Istituto di Fisiologia Umana Università di Ferrara Via Fossato di Mortara 17/19, I-44100 Ferrara Italy
[email protected] Uri Feintuch Department of Psychology The Hebrew University Mount Scopus Jerusalem 91905 Israel
[email protected] Leonardo Fogassi Istituto di Fisiologia Umana Università di Parma Via Volturno 39 I-43100 Parma Italy
[email protected] Liz Franz Department of Psychology University of Otago Box 56 Dunedin New Zealand
[email protected]
Martin Edwards Behavioral Brain Sciences Centre School of Psychology University of Birmingham Birmingham B15 2TT, UK
[email protected]
Luis Fuentes Dept. of Psychology Universidad de Almería E-04120 Almería Spain
[email protected]
Martin Eimer Department of Psychology Birkbeck College University of London Malet Street London WC1E 7HX, UK
[email protected]
Vittorio Gallese Istituto di Fisiologia Umana Università di Parma Via Volturno, 39 I-43100 Parma Italy
[email protected]
Jörg Gehrke Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 München, Germany Daniel Gopher Industrial Engineering and Management Technion Haifa, 32000 Israel
[email protected] Michael Graziano Psychology Department Princeton University 1-E-14 Green Hall Princeton, NJ 08544 USA
[email protected] Steven Hackley University of Missouri Department of Psychological Sciences 210 McAlester Hall Columbia, MO 65211 USA
[email protected] Patrick Haggard Institute of Cognitive Neuroscience Department of Psychology University College London 17 Queen Square London WC1N 3AR, UK
[email protected] Eliot Hazeltine NASA Ames Research Center MS 262-4 Moffett Field, CA 94035 USA
[email protected]
Bernhard Hommel Section of Experimental and Theoretical Psychology University of Leiden P.O. Box 9555 NL-2300 RB Leiden
[email protected] Glyn Humphreys Behavioural Brain Sciences Centre School of Psychology University of Birmingham Birmingham B15 2TT, UK
[email protected] Tjeerd Jellema Helmholtz Research Institute Utrecht University Heidelberglaan 2 NL-3584 CS Utrecht
[email protected] Pierre Jolicœur Department of Psychology University of Waterloo Waterloo, Ontario N2L 3G1 Canada
[email protected] Jerome Scott Jordan Dept. of Psychology Illinois State University Campus Box 4620 Normal, IL 61790-4620 USA
[email protected] Nancy Kanwisher NE20-454 MIT Dept. of Brain and Cognitive Sciences 77 Mass Ave. Cambridge, MA 02138 USA
[email protected]
Mitsuo Kawato Dept. 3, ATR HIP Labs Hikaridai 2-2 Seikacyo, Sorakugun Kyoto 619-0288 Japan
[email protected]
Carmen de Labra Departamento de Psicología Universidad de La Coruña Elviña E-15071 La Coruña Spain
[email protected]
Dirk Kerzel Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 München, Germany
[email protected]
Susan Lederman Dept. of Psychology Queen’s University Kingston, Ontario Canada K7L 3N6
[email protected]
Lothar Knuf Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 München, Germany
[email protected]
Hartmut Leuthold Department of Psychology 58 Hillhead Street Glasgow G12 8QB Scotland UK
[email protected]
Sylvan Kornblum Mental Health Research Institute The University of Michigan 205 Zina Pitcher Place Ann Arbor, MI 48109-0720 USA
[email protected]
Gordon Logan Department of Psychology Vanderbilt University Nashville, TN 37240 USA
[email protected]
Ralf Krampe Max Planck Institute for Human Development Center for Lifespan Psychology Lentze-Allee 94 D-14195 Berlin, Germany
[email protected]
Dean Lusher Department of Psychology The University of Melbourne Parkville 3052 Victoria, Australia
[email protected]
Wilfried Kunde Lehrstuhl für Psychologie III Universität Würzburg Röntgenring 11 D-97070 Würzburg, Germany
[email protected]
Morena Mari Autism Unit Maggiore Hospital Bologna Italy
[email protected]
Nachshon Meiran Department of Behavioural Sciences Ben-Gurion University of the Negev Beer-Sheva, 84105 Israel
[email protected]
David Perrett School of Psychology St Andrews University St Andrews, Fife, KY 16 9JU Scotland, UK
[email protected]
Stephen Monsell University of Exeter School of Psychology Washington Singer Laboratories Exeter EX4 4QG, UK
[email protected]
Herbert L. Pick jr. Institute of Child Development 55 East River Road University of Minnesota Minneapolis, MN 55455 USA
Jochen Müsseler Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 München, Germany
[email protected] Dieter Nattkemper Institut für Psychologie Humboldt Universität zu Berlin Oranienburger Str. 18 D-10178 Berlin, Germany Dieter.nattkemper@psychologie.hu-berlin.de Kevin O'Regan Laboratoire de Psychologie Expérimentale Centre Universitaire de Boulogne 71, ave. E. Vaillant F-92774 Boulogne-Billancourt Cdx.
[email protected] Chris Oriet Department of Psychology University of Waterloo Waterloo, Ontario N2L 3G1 Canada Giuseppe di Pellegrino Centre for Cognitive Neuroscience University of Wales Bangor LL57 2AS UK
[email protected]
Jeannine Pinto Department of Psychology Pardee Hall Lafayette College Easton, PA 18042 USA
[email protected] Laure Pisella Neuropsychologie Cognitive Unité 534—Espace et Action INSERM—Institut National de la Santé et de la Recherche Médicale 16 avenue Lépine F-69676 Bron France
[email protected] Mary C. Potter Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02139 USA
[email protected] Wolfgang Prinz Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 München Germany
[email protected]
Robert Proctor Psychological Sciences Purdue University 1364 Psychology Building West Lafayette, IN 47907-1364 USA
[email protected]
David A. Rosenbaum Department of Psychology 642 Moore Building Pennsylvania State University University Park, PA 16802-3104 USA
[email protected]
Bruno H. Repp Haskins Laboratories 270 Crown Street New Haven, CT 06511-6695 USA
[email protected]
Yves Rossetti Neuropsychologie Cognitive Unité 534—Espace et Action INSERM—Institut National de la Santé et de la Recherche Médicale 16 avenue Lépine F-69676 Bron, France
[email protected]
Richard Ridderinkhof Department of Psychology University of Amsterdam Roeterstraat 15 1018 WB Amsterdam The Netherlands
[email protected] M. Jane Riddoch Behavioural Brain Sciences Centre School of Psychology University of Birmingham Birmingham, B15 2TT, UK
[email protected] John Rieser Psychology and Human Development Box 512, Peabody Vanderbilt University Nashville, TN 37203 USA
[email protected] Giacomo Rizzolatti Istituto di Fisiologia Umana Università di Parma Via Volturno 39 I-43100 Parma Italy
[email protected]
Raffaella Rumiati Cognitive Neuroscience Sector Scuola Internazionale Superiore di Studi Avanzati Via Beirut, n. 2-4 I-34014 Trieste, Italy
[email protected] Werner X. Schneider Experimental Psychology Ludwig-Maximilians-University Leopoldstr. 13 D-80802 München, Germany
[email protected] Maggie Shiffrar Department of Psychology Rutgers University 101 Warren Street Newark, NJ 07102, USA
[email protected] Jeroen Smeets Vakgroep Fysiologie Erasmus University of Rotterdam P.O. Box 1738 NL-3000 DR Rotterdam
[email protected]
Michael Spivey Department of Psychology Cornell University Ithaca, NY 14853 USA
[email protected] Prisca Stenneken Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 Munich Germany
[email protected]
Jeff Summers School of Psychology University of Tasmania GPO Box 252-30 Hobart, Tasmania 7001 Australia
[email protected] Michael Tombu Department of Psychology University of Waterloo Waterloo, Ontario N2L 3G1 Canada
Biljana Stevanovski Department of Psychology University of Waterloo Waterloo, Ontario N2L 3G1 Canada
Carlo Umiltà Dept. of General Psychology University of Padova Via Venezia, 8 I-35131 Padova, Italy
[email protected]
Gregory Stevens Department of Psychology University of California – Los Angeles 1285 Franz Mall Los Angeles, CA 90095-1563 USA
[email protected]
Fernando Valle-Inclán Department of Psychology University of La Coruña Campus Elviña E-15071 La Coruña Spain
[email protected]
Gijsbert Stoet Washington University School of Medicine 660 South Euclid Avenue Saint Louis, MO 63110 USA
[email protected]
Paolo Viviani Faculty of Psychol. and Educational Sciences University of Geneva 40, Boulevard du Pont d’Arve CH-1205 Geneva Switzerland
[email protected] and Faculty of Psychology UHSR University 58, via Olgettina I-20132 Milan Italy
[email protected]
Sonja Stork Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 München Germany
[email protected]
Kim-Phuong Vu Psychological Sciences Purdue University 1364 Psychology Building West Lafayette, IN 47907-1364, USA
[email protected] Robert Ward Centre for Cognitive Neuroscience University of Wales Bangor LL57 2AS, UK
[email protected] Alan Wing Behavioural Brain Sciences Centre School of Psychology University of Birmingham Birmingham, B15 2TT, UK
[email protected] Andreas Wohlschläger Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 München, Germany
[email protected]
Peter Wühr Max Planck Institute for Psychological Research Amalienstr. 33 D-80799 München Germany
[email protected]
N. Jane Zbrodoff Department of Psychology Vanderbilt University Nashville, TN 37240 USA
[email protected]
Michael Zießler University of Sunderland Business School Dept. of Psychology St. Peter’s Campus Sunderland SR6 0DD UK
[email protected]
Group Photo

[Group photograph of the symposium participants. The numbers below key positions in the photograph to the following names.]

1 Hartmut Leuthold, 2 Wolfgang Prinz, 3 Patrick Haggard, 4 Alan Allport, 5 Gijsbert Stoet, 6 Mitsuo Kawato, 7 Michael Zießler, 8 John Duncan, 9 Ralf Krampe, 10 Elizabeth Franz, 11 Raffaella Rumiati, 12 Laila Craighero, 13 David Perrett, 14 Asher Cohen, 15 Shai Danziger, 16 Mary C. Potter, 17 Kevin O'Regan, 18 Stephen Monsell, 19 Robert Proctor, 20 Umberto Castiello, 21 Jochen Müsseler, 22 Gordon Logan, 23 Nachshon Meiran, 24 Luis Fuentes, 25 Paolo Viviani, 26 Carlo Umiltà, 27 Robert Ward, 28 Bruno Repp, 29 Jeff Summers, 30 Peter Beek, 31 J. Scott Jordan, 32 Martin Eimer, 33 Fernando Valle-Inclán, 34 Vittorio Gallese, 35 Daniel Gopher, 36 Sylvan Kornblum, 37 Bruce Bridgeman, 38 Eliot Hazeltine, 39 Michael Graziano, 40 John Rieser, 41 Roberto Dell'Acqua, 42 Werner X. Schneider, 43 Susan Lederman, 44 Andreas Wohlschläger, 45 Jeroen Smeets, 46 Glyn Humphreys, 47 Pierre Jolicœur, 48 Nancy Kanwisher, 49 Yves Rossetti, 50 Gregory Stevens, 51 M. Jane Riddoch, 52 Harold Bekkering, 53 Bernhard Hommel, 54 Maggie Shiffrar, 55 Michael Spivey, 56 David Rosenbaum, 57 Gisa Aschersleben, 58 Jan De Houwer, 59 Wilfried Kunde, 60 Alan Wing, 61 Richard Ridderinkhof
Editors’ introduction
1 Common mechanisms in perception and action: Introductory remarks
Wolfgang Prinz and Bernhard Hommel
The contributions to this volume discuss a classical theme in human-performance research, and they do so under some new perspectives that have emerged in recent years. The classical theme refers to the interplay between perception and action—a theme that is, and always has been, one of the core issues in the field of attention and performance. For instance, in the classical work inspired by linear stage theory, notions like stimulus–response translation and/or response selection have been introduced to account for putative operations underlying the transition from stimulus- to response-related processing, and a number of factors affecting these operations have been identified. Yet, despite its supposedly central function, the study of translation mechanisms has always played a somewhat marginal role. Instead, research has tended to emphasize stimulus- over response-related processing, and there has been rather little interest in response-related processing mechanisms and the way they are linked to those dealing with stimulus information.

In recent years, some new perspectives have emerged, suggesting both structural diversity and functional coherence in the interplay between perception and action. On the one hand, there is now substantial evidence from a large variety of neuroscience studies supporting diversity, in the sense that interaction between perception and action may be going on in parallel in a number of pathways, and a variety of maps or modules for special computational purposes may be involved. On the other hand, there is also much evidence supporting a substantial degree of functional coherence within these modules—in the sense of questioning the classical separation between sensory, or stimulus-related, processing and motor, or response-related, processing and calling for more overlap and integration between the two. Surprising interactions between perception and action have been observed in a number of both behavioral and neuroscience studies, indicating that input and output may draw on tightly coupled, or perhaps even identical, representations. At the same time, new theoretical frameworks and models have been proposed to meet the challenges inherent in these observations and account for these interactions, for example, in terms of shared mechanisms that draw on common representational resources. The aim of this volume is to gather these various approaches in an attempt to focus on structural and functional aspects of the architecture mediating between stimulus- and response-related processing, with an emphasis on both diversity, due to its modular organization, and coherence, due to common mechanisms within modules.
The chapters in this volume are based on oral contributions to 'Attention and Performance XIX: Common Mechanisms in Perception and Action', a symposium held on behalf of the International Association for the Study of Attention and Performance (IASAP) at Kloster Irsee, Bavaria, Germany, from July 16 to 22, 2000. At every Attention and Performance symposium it is customary to honor an eminent researcher's distinguished contribution to the field by an invitation to give the Association Lecture. For this symposium, IASAP's Executive Committee invited Sylvan Kornblum to deliver the Association Lecture. In his Lecture, Kornblum presents evidence on interactions between stimulus–response compatibility and sequential order in choice-reaction tasks and discusses their implications for his Dimensional Overlap model.

The Association Lecture is followed by five sections devoted to different domains and forms of interactions between perception and action. The first two sections are concerned with the classical domains of space and time, where issues related to the interplay between perception and action have long been topical. The third section deals with action perception and imitation, which has recently attracted converging attention in developmental and cognitive psychology, neurophysiology, as well as clinical neuropsychology. The last two sections then address various forms of interaction between perception and action, partly going from input to output, partly taking the reverse perspective, and partly dealing with their integration. Since each section comes with an introductory overview of its own, we can be very brief here in sketching the varieties of the themes discussed in the sections and the underlying theoretical issues that link them together.

Section I considers Space perception and spatially oriented action. Spatially adapted behavior requires reliable, high-precision alignment of perceptual space and action space. Traditional theories in this domain have therefore tended to invoke underlying representational structures that act as a common representational basis for both space perception and spatially oriented action. Over the past two decades this view has become increasingly challenged by clinical and experimental evidence suggesting parallel pathways and multiple maps in the brain, as well as related dissociations between perception and action in behavioral performance. The contributions to this section present new evidence on the dialectic relationship between the diversity of pathways and maps and the functional unity of perception and action in space perception and spatially adapted behavior.

Section II considers Timing in perception and action. Time is a dimension underlying both actor-independent events in the environment and actions, that is, actor-generated body movements. As in the spatial domain, adaptive behavior requires high-precision alignment of the timing of actions to the timing of events, suggesting a common representational basis for events and actions and shared mechanisms for their timing. Moreover, since any representational operation, and any neural activity carrying such an operation, is itself extended in time, the dimension of time has often been considered special in the sense that the representation of time is isomorphically grounded in the time of representation.
Accordingly, the contributions to this section address mechanisms of timing and sequencing in perception and action as well as relationships between the representation of time and the timing of representational operations.

Section III considers Action perception and imitation. Issues of action perception and imitation have recently become topical in neurophysiology, brain imaging, human development, and human performance. Studies from these fields differ considerably with respect to scope and aims, ranging from single-cell-based mechanisms involved in afferent and efferent processing to high-level mechanisms subserving the construction of mental selves. However, they do converge in suggesting close couplings, and even a considerable degree of equivalence, between perceiving and producing actions. Correspondingly, processing theories in this field tend to invoke shared representational
structures for information from different modalities (e.g. vision and proprioception) or for representations of more abstract features shared by perceived and produced actions. The contributions to this section discuss new findings from various approaches to action perception and imitation and assess their implications for understanding the underlying representational structures and processing mechanisms.

Section IV considers Content-specific interactions between perception and action. Research on stimulus–response compatibility plays an increasingly important role in providing insights into both the processes and cognitive structures underlying the relationship between perception and action planning; into how stimulus and response codes are formed; how they speak to each other; and how their interactions change in the course of practice. Progress has also been made in tapping into the temporal dynamics of stimulus and response coding, and of the interactions between those codes, especially by applying increasingly sophisticated data-analysis techniques and by including psychophysiological measurements. Interestingly, recent investigations have revealed that stimulus–response relations affect not only action planning but perception as well. In fact, planning an action sometimes facilitates, sometimes interferes with, and sometimes even changes the perception of stimulus events, depending on the specific relation between the planned action and the perceived stimulus. The contributions to this section sketch the emerging picture of perception and action as the outcome of a dynamic interplay between content-specific codes, rather than a unidirectional flow of information from stimulus processing to motor execution.

Section V considers Coordination and integration in perception and action. Perceptual and action-related structures and processes are tightly coupled and coordinated, and in several cases they share cognitive resources. However, resource sharing creates all sorts of capacity bottlenecks and binding problems—problems that are revealing with respect to how stimulus and response information is organized, integrated, and coordinated. Aspects of stimulus–response (or response–effect) coordination and integration have gained attention only recently, but both empirical evidence and theoretical insights are growing steadily. Accordingly, the contributions to this section provide a colorful but nicely converging overview of basic principles governing the integration of perception and action, and of actions and action goals, in and across stimulus and response processing, manual action and eye movements, and perception–action sequences.
Association lecture
2 Sequential effects of dimensional overlap: findings and issues
Sylvan Kornblum and Gregory Stevens
Abstract. We begin this chapter by outlining some of the basic principles of the dimensional overlap (DO) model (Kornblum, Hasbroucq, and Osman 1990; Kornblum, Stevens, Whipple, and Requin 1999), spelling out how these principles generate a taxonomy of tasks, and showing how, based on these principles, the structure of four of these tasks can be represented by a common processing architecture, and performance with them accounted for. We then consider the effects of stimulus and response repetitions in choice reaction time (RT) tasks and the influence that DO has on this repetition effect. We report data from four experiments that demonstrate this influence with a prime–probe, trial-pair procedure in which the relevant or irrelevant stimuli in either or both trials of the pair have DO and, in the case of relevant DO, repeat either physically or conceptually. The DO model is able to account for the results by postulating that the information requirements on repeated trials are less than on non-repeated trials. We call this the Information Reduction Hypothesis. When the relevant stimuli overlap, the repetition effects are accounted for by a reduction in either the stimulus and/or the response thresholds. When the irrelevant stimuli overlap, the repetition effects are accounted for by a reduction in the time needed to distinguish between relevant and irrelevant stimuli. Thus, depending on whether the relevant or irrelevant stimulus dimension has DO, one or the other of two parameters in the DO model is modified, contingent on the occurrence of a repetition. Simulations based on this implementation of the hypothesis in the DO model fit the experimental results well.
2.1 Introduction

Thirty years ago, at the fourth International Symposium on Attention and Performance, one of us presented a tutorial on sequential effects in choice reaction time (RT) (Kornblum 1973). Ten years ago we published the initial version of the dimensional overlap (DO) model, in which we addressed what we viewed as some of the basic issues in stimulus–stimulus (S–S) and stimulus–response (S–R) compatibility (Kornblum, Hasbroucq, and Osman 1990). In this chapter we would like to bring these two problem areas together theoretically and empirically. As will be evident, even though this effort has resulted in modest successes, it has also uncovered some interesting problems that remain to be solved. This is roughly how the chapter is organized:
• We start with a brief description of the computational version of the DO model (Kornblum, Stevens, Whipple, and Requin 1999);
• this is followed by a set of experiments in which we look at basic sequential effects in tasks with and without DO between relevant stimuli and responses;
• we then present the DO model’s account of those results;
• this is followed by a second set of experiments in which we take a further look at sequential effects in tasks with and without DO between the relevant and irrelevant stimuli, and between the irrelevant stimuli and the responses;
• we then present the DO model's account of those results;
• we end with a summary and conclusions.
2.2 The dimensional overlap model

2.2.1 Representational component

From the very outset, we have always made a sharp distinction between the representational and the processing parts of the model (see Kornblum et al. 1990). The representational component of a theory spells out how the phenomena to be explained are to be described and abstracted; the processing component specifies a set of possible mechanisms that might account for these observations.¹

At the heart of the representational component of the DO model is the notion of dimensional overlap (DO). This is defined as the degree to which stimulus and/or response sets are perceptually, conceptually, or structurally similar. Dimensional overlap is, therefore, an attribute of the mental representations of sets, and patterns of DO define certain task properties. We have used these dimensional relationships as the basis of a taxonomy which, up to now, has identified eight unique types of compatibility tasks (see Kornblum et al. 1999 for the most recent version of this taxonomy). In a poster shown at this meeting, Stevens (Stevens and Kornblum 2000) has extended this representational aspect of the model to include response effects, and ends up with a taxonomy of over a dozen tasks. He also presents the results of simulations that demonstrate the critical role that DO and the patterns of dimensional relationships play in the functional interpretation of response effects. In this chapter, we shall focus on just four of these tasks.

A task in which the set of relevant stimuli, or features, has DO neither with the set of responses nor with the set of irrelevant stimuli, or features, we call a Type 1 task. This is the basic choice RT task in which the relevant stimuli could, for example, be color patches presented in different shapes that are irrelevant, and the responses are key presses. In the context of S–R compatibility, this is a neutral task for which, in principle, any stimulus–response pairing is as good as any other pairing (see Fig. 2.1).

Fig. 2.1 Dimensional relationships between relevant stimuli, irrelevant stimuli, and responses that characterize five of the eight tasks in the current DO taxonomy. Whenever any two aspects of a task have dimensional overlap, they are joined by a line indicating the nature (S–S or S–R) and value (+/–) of the consistency or congruence relationship between them. Horizontal and vertical striations in the stimulus rectangles depict blue and green color patches, respectively.

When the DO is between the set of relevant stimuli and the set of responses, we call it a Type 2 task (e.g. Fitts and Seeger 1953). In the literature this is often referred to as a straightforward 'stimulus–response compatibility' (SRC) task. Depending on the S–R mapping rule, the individual stimuli in such tasks either do or do not match the responses; we call this S–R relation 'stimulus–response (S–R) congruence' (see Fig. 2.1).

When the overlap is between the set of irrelevant stimuli and the set of responses, we call it a Type 3 task (see Kornblum and Lee 1995). When the overlapping dimension is spatial, the literature refers to it as a 'Simon task' (see Simon 1990). We often refer to Type 3 tasks as 'Simon-like' when the irrelevant dimension is non-spatial. Because of the pattern of overlap, individual irrelevant stimuli are either consistent or inconsistent with the responses; we call this property 'stimulus–response (S–R) consistency' (see Fig. 2.1).

When the overlap is between the set of relevant and irrelevant stimuli, it is a Type 4 task (see Keele 1967; Kornblum 1994). When the overlapping dimension is color, the literature often refers to it as a 'Stroop' task. This, we believe, is an error that leads to confusion. 'Stroop-like' task, which is also
often used, seems more accurate. The important criterion is that the irrelevant stimulus dimension overlap with the relevant stimulus dimension, and that this be the only overlap in the task. Because of the pattern of overlap, irrelevant stimuli on particular trials are either consistent or inconsistent with the relevant stimuli; we call this property 'stimulus–stimulus (S–S) consistency' (see Fig. 2.1).

When DO is between the sets of irrelevant and relevant stimuli as well as the set of responses, and the dimension is the same, we call it a Type 8 task. In the literature, when that dimension is color, it is usually referred to as a 'Stroop' task—correctly this time (see McLeod 1991; Stroop 1935). Because of the pattern of overlap, the mapping instructions can be either congruent or incongruent; moreover, when the mapping is congruent, the individual irrelevant stimuli are consistent or inconsistent with both the relevant stimuli and the responses, which leads to a serious confounding. We have shown that these factors can be unconfounded by using incongruent S–R mapping with Stroop tasks (Zhang and Kornblum 1998; but see also Stevens and Kornblum 2001). Because we will be using the DO terminology throughout this article, we have summarized it in Table 2.1.

Table 2.1 Five of the eight task types in the DO taxonomy, with the locus of overlap indicated in columns 2, 3, and 4²

Task type        Overlapping relevant      Overlapping irrelevant    Overlapping irrelevant
                 stimulus and response     stimulus and response     and relevant stimulus
#1 Neutral       No                        No                        No
#2 SRC           Yes                       No                        No
#3 Simon         No                        Yes                       No
#4 Stroop-like   No                        No                        Yes
#8 Stroop        Yes                       Yes                       Yes

If these taxonomic classes have any functional significance at all, then one would expect all tasks in the same taxonomic category to show the same pattern of effects regardless of the particular stimuli or responses used—and this, for the most part, has been verified by the results of many studies in the literature (for a review, see Kornblum 1992). Based purely on this representational scheme, the DO model asserts that RT is generally faster for consistent than for inconsistent conditions, and that RT for congruent mapping is faster than for incongruent mapping. Differences in the magnitude of these effects occur between tasks, of course; most of these may be attributed to differences in the degree of DO between sets.
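For readers who find it helpful to see the classification spelled out mechanically, the overlap patterns of Table 2.1 can be rendered as a small lookup. The following sketch is our own illustration, not code from the published model; note that Type 8 additionally requires that the same dimension be shared throughout, which a set of overlap flags alone does not capture.

```python
# Illustrative sketch of the DO taxonomy in Table 2.1 (our own rendering,
# not part of the published model). Each key encodes which of the three
# overlap relations hold: (relevant S-R, irrelevant S-R, irrelevant-relevant S-S).
TASK_TYPES = {
    (False, False, False): "Type 1 (neutral)",
    (True,  False, False): "Type 2 (SRC)",
    (False, True,  False): "Type 3 (Simon)",
    (False, False, True):  "Type 4 (Stroop-like)",
    (True,  True,  True):  "Type 8 (Stroop)",  # also requires one shared dimension
}

def classify(rel_sr: bool, irr_sr: bool, irr_rel_ss: bool) -> str:
    """Return the DO task type for a given pattern of dimensional overlap."""
    return TASK_TYPES.get((rel_sr, irr_sr, irr_rel_ss),
                          "another type in the full eight-task taxonomy")

# Example: naming the ink color of a color word, where the irrelevant word
# overlaps with both the relevant dimension (color) and the responses.
print(classify(True, True, True))  # -> Type 8 (Stroop)
```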
2.2.2 The processing component

The processing part of the model is where we have been proposing what seemed to us plausible sets of mechanisms that might underlie the compatibility effects observed in the family of tasks encompassed by the representational part of the model. Ten years ago, the model started out as a boxology. However, this was recently replaced by a connectionist architecture (Kornblum et al. 1999) in which processing takes place in a system of interconnected modules, arranged in two layers: a stimulus layer and a response layer (see Fig. 2.2). Each stimulus and response module represents a dimension, or class of features. Within each module are individual units that represent the individual features of the stimulus or response. The activation of a unit within a module, therefore, represents activation of a feature along that dimension.
Fig. 2.2 The three generic processing modules of a task (relevant stimuli, irrelevant stimuli, and responses) and the possible positive connections between them, according to the DO model. Negative connections are not shown (but see text).
2.2.2.1 Architecture and connectivity

The connections between modules are of two types: automatic and controlled. Automatic lines, which have also been called Long Term Memory (LTM) connections (Barber and O'Leary 1997), connect modules that represent overlapping dimensions. These could both be stimulus dimensions, or one could be a response and the other a stimulus dimension—relevant or irrelevant. Controlled lines, which have also been called Short Term Memory (STM) connections (Barber and O'Leary 1997), are specified by the task instructions instead of by the DO. They connect each unit in the relevant stimulus module with the correct unit in the response module.

The strength of the signal sent over the automatic lines is a function of the level of stimulus activation, weighted by the degree of DO between the pair of connected modules. Because the activation level of the stimulus unit changes over time, the signal sent over the automatic lines changes over time as well, and is thus continuous. In contrast, the signal that is sent over the controlled lines is all or none, and may be said to represent a binary decision (for details see Kornblum et al. 1999).

These simple architectural principles can be used to represent each of the tasks that we have described thus far (these are all illustrated in the 'architecture' column of Fig. 2.3). The first task is a Type 1, neutral, task in which the relevant stimuli are color patches mapped onto left and right keypress responses. Because there is no DO in this task, there are no automatic connections. Controlled lines connect the relevant stimulus units to their assigned response units (see Fig. 2.3).

Next is a Type 2 task, in which color stimuli are mapped onto color-name responses. As is true in Type 1 tasks, the controlled lines connect the relevant stimulus and response units in accordance with the task instructions. However, because of the dimensional overlap between the set of relevant stimuli and the set of responses, automatic lines also connect the relevant stimulus units to the response units. Whenever two modules represent overlapping dimensions, positive automatic lines connect corresponding units, and negative automatic lines connect non-corresponding units. Only the positive connections are shown in the figure. When the mapping instructions are congruent, both the automatic and the controlled lines connect each stimulus unit to its matching response unit (see
Fig. 2.3—Type 2, congruent mapping). In effect, then, each correct response unit receives two positive inputs: one from the controlled line, the other from the automatic line. When the mapping instructions are incongruent (see Fig. 2.3—Type 2, incongruent mapping), each correct response unit receives one positive input from the controlled line and one negative input from the automatic line. As a result, the total net input to the correct response unit is less than in the congruent case.

The same general rules apply to Type 3, Simon-like, tasks. Here, controlled lines connect the relevant stimulus units to their assigned response units; and, because the DO is between the irrelevant stimuli and the responses, positive automatic lines connect the irrelevant stimulus units to their corresponding response units, with negative connections between non-corresponding units (not shown here; see Fig. 2.3). Similarly for Type 4, Stroop-like, tasks: controlled lines connect the relevant stimulus and response units, with positive automatic connections between the irrelevant stimulus units and their corresponding relevant stimulus units, and negative connections between non-corresponding irrelevant and relevant stimulus units (see Fig. 2.3).

To get a clearer picture of how this architecture and pattern of connectivity works in processing information, we need to spend a brief moment on the details of activation in individual units.
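The contrast between the two kinds of lines can be made concrete with a minimal sketch in code. It is our own simplification: the function names, the linear weighting by DO, and the treatment of correspondence as a simple sign flip are illustrative assumptions, not the published implementation.

```python
def automatic_input(stim_activation: float, do_weight: float,
                    corresponding: bool) -> float:
    """Continuous signal on an automatic (LTM) line: the current stimulus
    activation, weighted by the degree of dimensional overlap; positive to
    the corresponding unit, negative to non-corresponding units.
    (Illustrative sketch.)"""
    sign = 1.0 if corresponding else -1.0
    return sign * do_weight * stim_activation

def controlled_input(stimulus_identified: bool) -> float:
    """All-or-none signal on a controlled (STM) line: a discrete 'on'
    signal (1) is sent to the instructed response unit, but only once the
    relevant stimulus unit has reached its identification threshold."""
    return 1.0 if stimulus_identified else 0.0
```

The essential point the sketch captures is that the automatic signal tracks a continuously changing activation level, whereas the controlled signal is a binary, instruction-driven event.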
2.2.2.2 Activation and information flow

According to the model, inputs to both the relevant and irrelevant stimulus units start at the same value—say 1. The input to the relevant unit remains at 1. The input to the irrelevant unit starts decaying at a fixed rate shortly after onset. The time (identified by the parameter τ in Kornblum et al. 1999) between when these two inputs begin and when their values start to diverge is the time the system takes to distinguish between the relevant and the irrelevant input.³ Whether one believes that attention remains focused on the relevant input and is withdrawn from the irrelevant input after this distinction is made, or that the irrelevant input just gradually decays away, is not a question that we deal with in this paper. Suffice it to say that this decrease in the irrelevant stimulus input is a critical property of the model that enables it to account for the time-course and distributional properties of reaction times in S–S and S–R consistency tasks (see Kornblum et al. 1999).

Given these two sources of input, the activation levels in the relevant and irrelevant stimulus units change over time according to a gradual, time-averaging activation function. Given a constant input, as in the case of the relevant stimulus unit, activation gradually increases and asymptotically approaches the input level. With a decreasing input, as in the case of the irrelevant stimulus unit, activation is an inverted U-shaped function of time (see Kornblum et al. 1999 for the details).
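In discrete time, these input and activation dynamics can be sketched roughly as follows. This is again a minimal rendering of our own; the update rule and the constants follow Kornblum et al. (1999) only in spirit.

```python
def update_activation(a: float, net_input: float, rate: float = 0.05) -> float:
    """Time-averaging update: activation moves a fixed fraction of the way
    toward the current net input on each time step, so a constant input
    produces a gradual, asymptotic rise toward the input level."""
    return a + rate * (net_input - a)

def irrelevant_input(t: int, tau: int = 20, decay: float = 0.02) -> float:
    """Input to the irrelevant stimulus unit: identical to the relevant
    input (1.0) until time tau -- the point at which the system has
    distinguished relevant from irrelevant -- and decaying toward zero
    thereafter. (The tau and decay values are illustrative.)"""
    return 1.0 if t < tau else max(0.0, 1.0 - decay * (t - tau))
```

Iterating update_activation with a constant input of 1 yields the asymptotic rise described for the relevant unit; iterating it with irrelevant_input yields the inverted U-shaped curve described for the irrelevant unit.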
• the stimulus is presented; • the input is turned on; and • activation accumulates. Consider a Type 1, neutral, task 1rst. The information 2ow in this task can be thought of as a baseline, or generic activation 2ow, as speci1ed by the model (see Fig. 2.3). It represents the simplest instance of the three basic steps: the stimulus is presented causing the input to the relevant stimulus unit to turn on which, in turn, causes activation in the relevant stimulus unit to start accumulating
aapc02.fm Page 15 Wednesday, December 5, 2001 9:18 AM
Sequential effects of dimensional overlap: findings and issues
Fig. 2.3 Processing architecture and activation patterns for 2-choice tasks (used for illustrative purposes only) of Types 1, 2, 3, and 4 showing the congruent/incongruent cases for Type 2, and the consistent/inconsistent cases for Types 3 and 4 tasks. Whenever two modules represent overlapping dimensions, positive automatic lines connect corresponding units, and negative automatic lines connect non-corresponding units. Only the positive connections are shown in this 1gure. The rectangles in the architecture column represent modules, the circles represent features. Shaded circles indicate activated feature units. Horizontal and vertical striations in the stimulus rectangles depict blue and green color patches, respectively. The vertical dotted line in the activation column marks the combined duration of the stimulus and response units for the neutral, Type 1 task; this is included for purposes of comparison. (*) This curve depicts the decaying, relevant stimulus activation value after it has reached threshold. (**) This curve depicts the level of activation for the irrelevant stimulus (see also Fig. 2.4).
When this threshold has been reached, it indicates that the stimulus has been fully identified. At this point, and not before, the controlled line sends a discrete ‘on’ signal (equal to 1) from the relevant stimulus to the correct response unit. Because this input is 1, activation in the response unit accumulates in exactly the same fashion as it did in the relevant stimulus unit. Once it reaches its threshold, the response is considered fully selected, and the overt response is initiated. Because of the existence of a threshold in both the relevant stimulus and the response units, and because the controlled line sends a discrete ‘on’ signal from the stimulus to the correct response unit, reaction times can be partitioned into two distinct, stage-like intervals: a stimulus identification time and a response selection time. This discrete characteristic was present in the boxology (Kornblum et al. 1990), and is retained in the PDP version of the model. The total time taken from the moment the relevant stimulus is presented to when the response unit reaches threshold is defined in the model as the RT, and is what the model simulates. As will be apparent, the activation patterns in all the other tasks in the taxonomy are modifications of this basic pattern.

Consider a Type 2 task next. Processing in the stimulus module is, of course, identical to what it was for the Type 1 task. The stimulus is presented, the input is turned on, activation accumulates, and threshold is reached in the relevant stimulus unit (see Fig. 2.3). Once the stimulus activation reaches this threshold, it is no longer needed and starts decaying back to zero. This decay was utterly inconsequential in Type 1 tasks, because the only signal being sent from the stimulus to the response unit was the ‘on’ signal on the controlled line. That same ‘on’ signal is now being sent along the controlled line in the Type 2 task as well. However, because of the DO between the relevant stimulus set and the response set, an automatic signal is also being sent from the stimulus unit to the response unit. The strength of that signal, you will recall, is proportional to the amount of activation in the stimulus unit, so that as the activation level changes, so does the strength of that signal. When the mapping instructions are congruent, the positive automatic signal goes to the same response unit that is getting the ‘on’ signal via the controlled line. Thus, even though activation in the stimulus unit is decaying, the total positive input to the response unit is high, and activation in that unit accumulates quickly. When mapping is incongruent, stimulus processing remains the same. However, the response unit that is getting the ‘on’ signal via the controlled line is now connected to the stimulus unit by a negative automatic line. As a result, the activation in the stimulus unit is subtracted from the total input to the response unit, making its activation accumulate more slowly. This, of course, increases the total reaction time which, when compared to the RT with congruent mapping, is the mapping effect or, as it is known in the literature, ‘the S–R compatibility effect’.

We now come to tasks in which there is dimensional overlap between the irrelevant stimulus dimension and some other dimension of the task, either the response or the relevant stimulus (see also Kornblum et al. 1999, Fig. 7). Type 3 tasks are those in which the irrelevant stimulus dimension overlaps with the response.
The stimulus identification stage in these tasks is no different from what it is in the two tasks that we just finished discussing (Types 1 and 2). However, simultaneously with the presentation of the relevant stimulus, we are now also presenting an irrelevant stimulus. The basic three-step activation process is still in place, but is now modified to take this new fact into account: the stimulus is presented; inputs to both relevant and irrelevant stimulus units are turned on; and activation accumulates in both the relevant and irrelevant stimulus units. Because there is no DO between the relevant and irrelevant stimuli, the irrelevant stimulus has no influence on processing in the relevant stimulus unit. So, when relevant stimulus activation reaches threshold, the controlled line sends its discrete ‘on’ signal to the correct response unit, just as it did in the Type 1 neutral task.
Fig. 2.4 A: Input to the relevant and irrelevant stimulus units shown as a function of time. The irrelevant input starts decreasing after a duration (τ). B: Activation functions in the relevant and irrelevant stimulus units shown as a function of time.
However, there is DO between the irrelevant stimuli and the responses, which means that the irrelevant stimulus units have automatic connections to the response units. On S–R consistent trials, these automatic connections with the correct response units are positive, which means that activation in the correct response unit gets a boost, and accumulates more rapidly, thus reaching threshold sooner (see Fig. 2.3). On S–R inconsistent trials, these automatic connections between the irrelevant stimuli and the correct responses are negative. The net effect of this is to slow the rate of accumulation of activation for the correct response, thus causing it to reach threshold more slowly. It is this processing difference between S–R consistent and S–R inconsistent trials that, according to our model, generates the S–R consistency, or Simon, effect.

Exactly the same argument holds for Type 4 tasks, where the overlap is between the relevant and irrelevant stimuli: the stimulus is presented; inputs to both relevant and irrelevant stimulus units are turned on; and activation accumulates in both the relevant and irrelevant stimulus units. But now, because the DO is between the irrelevant and relevant stimuli, these units have automatic connections between them, so that the irrelevant stimulus does influence processing in the relevant stimulus unit. Evidence in support of this assumption has recently been reported by Stevens (2000). On S–S consistent trials the input of the irrelevant stimulus to the corresponding relevant stimulus unit is positive, which means that activation in that unit accumulates faster than it would without this added input, thus reaching threshold sooner. On S–S inconsistent trials, instead of providing positive input to the relevant stimulus, the input of the irrelevant stimulus is negative, thus slowing the rate of accumulation of activation for the relevant stimulus. Once activation reaches threshold in the relevant stimulus unit, an ‘on’ signal is sent to the correct response along the controlled line, just as in the Type 1 neutral task.

To summarize:

1. Activation and information flow in the DO model consists of three basic steps: (a) a stimulus is presented; (b) input is turned on; (c) activation accumulates in the relevant, and possibly irrelevant, stimulus units.
2. When activation reaches the stimulus identification threshold: (a) the controlled line sends a discrete signal to the correct response unit; and (b) the relevant stimulus activation level starts decaying back to zero.

3. This process is repeated in the response unit until the response threshold is reached.

4. The strength of the automatic signal being sent from a stimulus unit to either a response unit or to another stimulus unit is a function of the level of stimulus activation, weighted by the level of DO between the sets of relevant stimuli and responses, or the two sets of stimuli.

5. This means that when there is either S–R or S–S overlap, activation in the irrelevant stimulus unit produces either facilitation (in the consistent case) or interference (in the inconsistent case).

There are many technical details of the model that we have not presented here; an interested reader may find them elsewhere (Kornblum et al. 1999).
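To make these dynamics concrete, the following minimal Python sketch simulates one trial for Types 1, 2, and 3 (Type 4, in which the irrelevant unit would feed the relevant stimulus unit instead, is omitted for brevity). It is our illustration only, not the published implementation: the parameter values, the linear decay of the irrelevant input, and the exponential time-averaging step are all assumptions, and the actual functions and parameters are specified in Kornblum et al. (1999).

# Schematic, per-millisecond simulation of one trial under the DO model.
DT, RATE, THRESH = 1.0, 0.01, 0.6   # step (ms), averaging rate, both thresholds
TAU, DECAY = 120.0, 0.003           # irrelevant input: delay, then fixed decay
W_SR, W_IRR = 0.5, 0.5              # automatic-line weights (scale with DO)

def simulate(sr_sign=0, irr_sign=0, max_t=3000.0):
    """sr_sign: +1/-1/0 for congruent/incongruent/no S-R overlap (Type 2);
    irr_sign: +1/-1/0 for consistent/inconsistent/no irrelevant-response
    overlap (Type 3). Returns (identification time, RT) in ms."""
    rel = irr = resp = 0.0
    t, ident_t = 0.0, None
    while t < max_t:
        t += DT
        rel_in = 0.0 if ident_t else 1.0    # input turned off once identified
        irr_in = max(0.0, 1.0 - DECAY * max(0.0, t - TAU))
        rel += RATE * (rel_in - rel) * DT   # approaches its input asymptotically
        irr += RATE * (irr_in - irr) * DT   # traces an inverted U over time
        if ident_t is None and rel >= THRESH:
            ident_t = t                     # controlled line fires its 'on' = 1
        controlled = 1.0 if ident_t else 0.0
        automatic = W_SR * sr_sign * rel + W_IRR * irr_sign * irr
        resp += RATE * (controlled + automatic - resp) * DT
        if ident_t and resp >= THRESH:
            return ident_t, t               # RT = onset to response threshold
    return ident_t, None

for label, kwargs in [('Type 1 neutral', {}), ('Type 2 congruent', {'sr_sign': 1}),
                      ('Type 2 incongruent', {'sr_sign': -1}),
                      ('Type 3 consistent', {'irr_sign': 1}),
                      ('Type 3 inconsistent', {'irr_sign': -1})]:
    print(label, simulate(**kwargs))

Run as written, the sketch reproduces the qualitative ordering described above: congruent faster than neutral, neutral faster than incongruent, consistent faster than inconsistent, with identical stimulus identification times throughout.

We now turn to the second theme of our tale: sequential effects.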
2.3 Sequential effects

2.3.1 Introduction

In his classic book on reaction times, Luce observed that: ‘ . . . sequential effects . . . have a major impact on . . . response times . . . any model or experiment that ignores this or fails to predict it surely is incomplete and likely wrong, as well.’ (Luce 1986, p. 405). Because of our current work on the DO model, together with our past interest in sequential effects, it should come as no surprise that we have been doubly attentive to Luce’s admonition. In this second portion of our chapter, therefore, we shall be looking into the DO model to see whether it is sufficiently complete, at least in principle, to account for sequential effects. This goal is doubly appealing, for being able to demonstrate this would: (1) extend and provide further validation of the DO model, and (2) account for some of the sequential effects that up to now have proven difficult to explain.

The DO model as it stands, like most (non-learning) models of human performance, assumes that successive trials are independent. That is, every trial starts fresh, unaffected by the history of earlier trials in the block. Empirically, we have known for a long time that this is patently false and that sequential effects of all sorts permeate the data. By ‘sequential effects’ we mean that: ‘If a subset of trials can be selected from a series of consecutive trials on the basis of a particular relationship that each of these selected trials bear to their predecessor(s) in the series, and the data for that subset differ significantly from the rest of the trials, then these data may be said to exhibit sequential effects’ (Kornblum 1973, p. 260). Defined in this way, the term ‘sequential effects’ covers many different phenomena including stimulus and/or response repetitions and non-repetitions (of first and higher orders), task switching, set or Einstellung effects, etc. The sequential effects that we consider in this chapter are first-order stimulus and/or response repetitions and non-repetitions, in which relevant and irrelevant stimuli do or do not have DO. As will be evident, task switching effects are also present in our data; however, because of space limitations these are not discussed. Readers interested in pursuing them should consult recent reviews (e.g. Allport, Styles, and Hsieh 1994; Monsell and Driver 2000).
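Kornblum’s definition lends itself to a direct operational reading: select the trials that stand in a given relation to their predecessor and compare their RTs with those of the remaining trials. The fragment below, with invented stimuli and RTs, illustrates this for first-order stimulus repetitions.

# Classify each trial by its relation to the immediately preceding trial.
trials = [  # (stimulus, response, RT in ms) -- invented numbers
    ('red', 'red', 450), ('red', 'red', 410), ('blue', 'blue', 470),
    ('blue', 'blue', 420), ('green', 'green', 480), ('red', 'red', 455),
]

reps, nonreps = [], []
for prev, cur in zip(trials, trials[1:]):
    (reps if cur[0] == prev[0] else nonreps).append(cur[2])

def mean(xs):
    return sum(xs) / len(xs)

# 'repetition effect' = mean RT(non-repetitions) - mean RT(repetitions)
print(mean(nonreps) - mean(reps))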
2.3.2 Summary of earlier empirical findings

Sequential effects in RT tasks were first reported by Hyman (1953). Shortly thereafter, Bertelson and his colleagues published a series of influential studies in which they described important properties of this new phenomenon (e.g. Bertelson 1961, 1963, 1965; Bertelson and Renkin 1966; Bertelson and Tysseyre 1966). Studies by other investigators followed that verified and extended many of Bertelson’s original findings, and added new observations as well. Some of these are briefly summarized below:4

1. Given equiprobable stimuli and responses, the RT for repetitions is faster than for non-repetitions (e.g. Bertelson 1961).

2. Given equiprobable stimuli and responses, the size of the repetition effect (where ‘repetition effect’ is defined as the difference in RT between non-repetitions and repetitions) is greater for incompatible than for compatible tasks (Bertelson 1963). This is principally due to the increase in RT with incompatible tasks being greater for non-repetitions than for repetitions (see Kirby 1980 and Kornblum 1973 for reviews).

3. Given equiprobable stimuli and responses, the RT for repetitions and non-repetitions is inversely related to the probability of these transitions. In the case of two-choice tasks this often results in the RT for non-repetitions (often called ‘alternations’ in two-choice tasks) being faster than for repetitions (e.g. Hyman 1953; Kornblum 1969; Williams 1966).

4. The magnitude of the repetition effect increases as the number (k) of equiprobable stimuli and responses increases. This is due primarily to the fact that increasing k increases the RT for non-repetitions much more than for repetitions (even though the probability of non-repetitions increases with k—see Kornblum 1969, 1973 for more detail).

5. Repetition effects extend beyond immediate, first-order repetitions and non-repetitions up to about the fourth order (e.g. see Remington 1969, 1971).

6. The probability of error is usually higher for non-repetitions than for repetitions (e.g. Falmagne 1965; for a review see Luce 1986).

7. The response-to-stimulus interval (RSI) has extensive, albeit difficult to systematize, effects on the magnitude of the repetition effect (see Kirby 1980; Kornblum 1973; Luce 1986; Soetens 1998).

This list is not intended to be exhaustive. However, it includes the principal findings that investigators in the area regard as having been reasonably well established.
2.3.3 Summary of earlier accounts

As the empirical findings accumulated, various proposals were made to account for different aspects of the data. Most of these explanations fall into one of two major lines of argument first formulated by Bertelson (1961). He suggested that sequential effects might need to be accounted for by two different types of mechanisms: the first, based on the subjects’ ‘expectation of’, hence ‘preparation for’, certain events; the second, an otherwise unspecified ‘residual effect’ generated
by one trial that facilitated repetitions on the next trial. Both mechanisms, according to Bertelson (1961), were sensitive to changes in RSI: the effects of expectation increased with RSI, whereas the residual effects decreased with RSI. As these conjectures were elaborated, expectation came to be viewed as a controlled, or strategic, component, while residual effects were viewed as an automatic part of the process. This dichotomy has held up fairly well, supported in part by the data of Soetens and his colleagues (Soetens 1998; Soetens, Boer, and Hueting 1985), whose work has focused on substantiating and spelling out the conditions under which one or the other component would be evident. Further support has also come from ERP (event-related potentials) data (e.g. Leuthold and Sommer 1993; Squires, Wickens, Squires, and Donchin 1976). Some have characterized the automatic component in terms of activation, or sensory stimulation, produced by one stimulus that leaves a trace, so that if the next stimulus is the same it gets a boost by being superimposed on the traces of the first (e.g. Vervaeck and Boer 1980). Others have characterized it in terms of repeated stimuli being able to bypass some of the processing stages (e.g. Bertelson 1965). Still others speak of the stimulus (or response) on one trial priming the occurrence of the same stimulus (or response) on the next trial. None of these conjectures is spelled out in sufficient detail to be tested, however. The most detailed model of sequential effects was constructed by Falmagne and his colleagues (Falmagne 1965; Falmagne and Theios 1969; Falmagne, Cohen, and Dwivedi 1975). Falmagne bases his model on the notion of preparation and treats preparation in the conceptual framework of a memory search. According to the model, the relative position of an item in a memory stack determines the probability with which a subject is prepared, or not prepared, for that item: the higher in the stack, the more prepared the subject, and the shorter the RT. Quantitative predictions of the model are well supported by their data; a toy rendering of the stack idea is sketched below.
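The fragment below renders only the stack intuition and is not Falmagne’s model. The move-to-front updating rule, the position-to-preparation mapping, and the two RT values are all our assumptions, chosen purely for illustration.

import random

random.seed(1)
stack = ['A', 'B', 'C', 'D']           # memory stack; index 0 is the top
RT_PREPARED, RT_UNPREPARED = 350, 500  # hypothetical ms values

def run_trial(item):
    pos = stack.index(item)
    p_prepared = 1.0 - pos / len(stack)   # higher in the stack -> more prepared
    rt = RT_PREPARED if random.random() < p_prepared else RT_UNPREPARED
    stack.remove(item)                    # assumed move-to-front updating,
    stack.insert(0, item)                 # which makes repetitions fast
    return rt

reps, nonreps, prev = [], [], None
for _ in range(10000):
    item = random.choice('ABCD')
    rt = run_trial(item)
    (reps if item == prev else nonreps).append(rt)
    prev = item
print(sum(reps) / len(reps), sum(nonreps) / len(nonreps))

Because a just-used item sits on top of the stack, repetitions are always ‘prepared’ in this toy version and come out faster than non-repetitions, which is the qualitative signature of a preparation account.

The DO model, and the extensions made to it to accommodate the sequential data, do not fit easily into either camp, as we shall see.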
2.4 Overview of the experiments

2.4.1 Objectives

Our first objective, and the issue of greatest concern and interest to us in this chapter, was to examine the interaction of the repetition effect with SRC (Bertelson 1963; Kirby 1976; Schvaneveldt and Chase 1969), wherein the increase of RT with incompatible tasks is greater for non-repetitions than for repetitions. Bertelson (1963) originally accounted for this result in terms of a processing short cut that favors repetitions. He suggested that the first thing a subject does when presented with a stimulus is check to see if it matches the stimulus on the previous trial. If the match is confirmed, stimulus processing is bypassed and the response made on the previous trial is retrieved from memory and made again on this trial. If there is no match, processing proceeds until the correct response is identified and executed. Because, by assumption, this processing is more complex, hence more time consuming, for incompatible than for compatible tasks, incompatibility will increase the RT for non-repetitions more than for repetitions. On the surface this reasoning seems straightforward. However, nowhere is the underlying processing structure made explicit. The particular subprocesses that are being short-circuited on the one hand, and those that increase in complexity on the other, therefore, remain vague and difficult to identify. Our first experiment is explicitly designed to address this issue.
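For concreteness, Bertelson’s conjectured short cut can be written out as control flow. Only the branching structure comes from his account; the three time costs below are invented placeholders.

# Schematic rendering of the bypass conjecture; all ms values are invented.
MATCH_CHECK = 50                     # compare current with previous stimulus
RETRIEVE = 100                       # retrieve and re-issue the last response
TRANSLATE = {'compatible': 150, 'incompatible': 400}   # full S-R translation

def rt(stimulus, prev_stimulus, mapping):
    if stimulus == prev_stimulus:
        return MATCH_CHECK + RETRIEVE        # repetition: translation bypassed
    return MATCH_CHECK + TRANSLATE[mapping]  # non-repetition: full processing

# Repetition effect = non-repetition RT minus repetition RT, per mapping:
for mapping in ('compatible', 'incompatible'):
    print(mapping, rt('X', 'Y', mapping) - rt('X', 'X', mapping))

On these placeholder numbers the repetition effect is 50 ms for the compatible mapping and 300 ms for the incompatible one: precisely the interaction pattern the conjecture was meant to capture, with the repetition RT itself independent of compatibility.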
Our second objective was to examine and compare DO and sequential effects in terms of their underlying stimulus coding processes. In particular, DO, whether based on physical or conceptual similarity, produces robust compatibility effects. Some of the best-known examples of the effects of conceptual similarity are the Stroop (Type 8) and Stroop-like tasks, in which the presence of color words interferes with the processing of physical colors even though these two aspects of the stimulus are physically very different. Sequential effects have been reported for the repetition of physically identical stimuli (Bertelson 1961), the repetition of categories (Marcel and Forrin 1974), and the repetition of S–R mapping rules (Shaffer 1965). However, the effects of repeating conceptually similar but physically different stimuli are not known. Because of the functional significance of conceptual similarity in the DO model, it was important to learn what role, if any, this type of similarity plays in the production of sequential effects. This question was addressed in Experiment 1. In the ‘same carrier’ condition, the stimuli on the prime and probe trials were physically identical; i.e. they were either both color patches, color words, digits, etc. In the ‘different carrier’ condition the stimuli on the prime and probe trials were conceptually similar but physically different; e.g. if the stimulus on the prime was a color patch, then the stimulus on the probe was a color word, etc.

Our third objective, and concern, not unrelated to the first, is the question of the locus of the repetition effect. As is true of many issues in this area, this question was first broached by Bertelson (1963) using a task in which he mapped different pairs of stimuli onto each of two responses. This generated three types of transitions: ‘identical’, in which both the stimulus and the response of one trial were repeated on the next trial; ‘equivalent’, in which only the response of the preceding trial was repeated on the next trial; and ‘different’, in which neither the stimulus nor the response of one trial was repeated on the next. The logic was simple: the effect of stimulus repetition was obtained by subtracting identical from equivalent RTs, and the effect of response repetition was obtained by subtracting equivalent from different RTs. Based on this procedure Bertelson concluded that the principal component of sequential effects was the repetition of the response. Pashler and Bayliss (1991), using a three-response task with the same basic paradigm, reached the same conclusion. According to Soetens (1998), however, whether one attributes sequential effects to stimulus or to response repetitions depends on RSI: response processes appear to be responsible at long RSIs, and stimulus processes at short RSIs. The basic logic of this many-to-one procedure is brought into question by Smith (1968), who reported the results of an experiment in which the equivalent RT, instead of lying between the identical and different RTs, was actually longer than the different RT. Rabbitt (1968) also reported that the relative position of the equivalent RT, between identical and different, changed with training. Overall, therefore, the locus of the repetition effect appears to be an important open question. Furthermore, based on the results of our first experiment we shall conclude that both stimulus and response repetitions play a critical role in the repetition effect.
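The subtraction logic of the many-to-one procedure is easy to state numerically. The three mean RTs below are invented, chosen only so that the pattern matches Bertelson’s conclusion that the response component dominates.

# Invented mean RTs (ms) for the three transition types described above.
rt_identical, rt_equivalent, rt_different = 420, 430, 520

stimulus_repetition_effect = rt_equivalent - rt_identical   # 10 ms
response_repetition_effect = rt_different - rt_equivalent   # 90 ms
print(stimulus_repetition_effect, response_repetition_effect)

# Smith's (1968) anomaly: were rt_equivalent to exceed rt_different, the
# 'response repetition effect' computed this way would come out negative,
# which is what calls the subtraction logic into question.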
Our second experiment will follow up these results and address the locus question using a one-to-one rather than a many-to-one procedure. Our fourth objective was to examine the effects of repeating and not repeating irrelevant stimuli in task Types 1, 3, and 4. We know of no other work that explicitly addresses this question in tasks with more than two choices. Experiments 3 and 4 do so.
2.4.2 General procedures

1. All experiments use four-choice tasks. The basic experimental unit is the trial pair. The first trial in such pairs is called the prime, the second trial the probe. The stimulus transition probabilities
between primes and probes were randomized and balanced. The time interval between primes and probes (700 ms and 1500 ms) was blocked. The data that are reported for the two shortest RSIs are from the probe trials; the data for RSI = 3000 ms are from prime trials.

2. It is clear from the literature that the effects of RSI are capricious and problematic (see e.g. Kirby 1980; and Kornblum 1973, Table 3). Yet they cannot be ignored. In order to present a more complete empirical picture of the phenomena that we are investigating, we included RSI in the design of our experiments (700 ms and 1500 ms within pairs for all four experiments, and 3000 ms between balanced pairs for Experiments 1 and 2). However, because of space limitations, we analyze the details and discuss the data for the shortest RSI only. One general observation that can be made is that the longer the RSI, the slower the overall RT. The RSI results for the individual experiments are presented in Appendices B–D.

3. Errors are reported in Appendix A for all experiments.
2.5 Experiment 1

In this first experiment we were interested in examining the interaction between sequential effects and stimulus–response compatibility. Our experimental prime–probe pairs consisted of either Type 1 or Type 2 pairs.
2.5.1 Methods

2.5.1.1 Time line

A trial (whether it was a prime or probe) began with a warning signal. Seven hundred milliseconds later, the stimulus was presented and was terminated by the response. The prime-to-probe interval was either 700 ms or 1500 ms and was constant for a block. The time interval between prime–probe pairs was always three seconds.

2.5.1.2 Stimuli and responses

The stimuli were presented on a CRT screen, and consisted of either four color words (RED, BLUE, GREEN, and YELLOW) or four rectangular color patches (red, blue, green, and yellow). The responses were verbal and consisted of either four color names (‘red’, ‘blue’, ‘green’, and ‘yellow’) or four digit names (‘two’, ‘four’, ‘six’, and ‘eight’). When the responses were color names it was a Type 2 task, for which the mapping was either congruent (e.g. RED → ‘red’) or incongruent (e.g. RED → ‘blue’). When the responses were digit names it was a Type 1 task, for which the mapping was neutral (e.g. RED → ‘two’).

2.5.1.3 Same/different carrier

The stimuli on the prime and probe trials were either color words or color patches. In our illustration (see Fig. 2.5) we use color patches as the prime stimuli (note, however, that the experiment included another set of prime–probe pairs in which the prime stimuli were color words). Following this illustration, a Type 1 prime with a color patch stimulus was followed by a Type 1 probe with either a color patch or a color word as the stimulus. In the same carrier condition, if the probe stimulus was a color patch, the nature of the probe stimulus remained what it was on the prime (color patch–color patch); this is true whether it was a repetition or a non-repetition. In the different carrier condition, if the probe stimulus was a color word, the nature of the probe stimulus changed from what it was on the prime (color patch–color word); this is true whether it was a repetition or a non-repetition.
Fig. 2.5 The different prime–probe pairs for Type 1 and Type 2 tasks in Experiment 1 for which the prime stimuli were color patches. (There is another set of prime–probe pairs (not shown) for which the prime stimuli were color words; same/different carrier transitions with these prime stimuli are simply the reverse of what is shown in this figure.) The particular colors used are for illustrative purposes only; horizontal striations indicate the color blue, diagonal striations the color red. Same carrier transitions (see text for explanation) are indicated by a dotted line, different carriers by a solid line. Whether the probe was a repetition or a non-repetition is indicated on the right. In Type 2 tasks the prime could be either congruent or incongruent, as shown. Similarly, the probe, in addition to being a congruent repetition or non-repetition, could also be an incongruent non-repetition; these are marked as ‘congruent’ or ‘incongruent’, respectively, on the right.
This same/different carrier designation was, of course, reversed when the prime stimulus was a color word. The Type 2 task had exactly the same properties. A prime with a color patch stimulus and congruent or incongruent mapping was followed by a probe with either a color patch or a color word as the stimulus. Same and different carrier conditions were defined in precisely the same manner as they were for the Type 1 tasks.
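The carrier and repetition factors are logically independent, as a few lines of illustrative code make plain. The (kind, color) representation and the helper below are ours, not part of the experimental software.

# Classify a prime-probe pair by carrier (physical kind of the stimulus)
# and by repetition (the color concept it conveys).
def classify(prime, probe):
    carrier = 'same carrier' if prime[0] == probe[0] else 'different carrier'
    repetition = 'repetition' if prime[1] == probe[1] else 'non-repetition'
    return carrier, repetition

print(classify(('patch', 'red'), ('patch', 'red')))  # same carrier, repetition
print(classify(('patch', 'red'), ('word', 'red')))   # conceptual repetition only
print(classify(('patch', 'red'), ('word', 'blue')))  # neither repeats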
2.5.1.4 Conditions, blocks, and procedure

There were three groups of six subjects each: congruent mapping (Type 2), incongruent mapping (Type 2), and neutral (Type 1). The incongruent mapping group was further divided into three subgroups, each with a different S–R mapping. The neutral group also included three subgroups, each with its own S–R mapping.
Fig. 2.6 Results of Experiment 1 for RSI = 700 ms.
Each subject in each group was run on six experimental blocks of 32 trials each at one RSI, followed by six more experimental blocks at the other RSI. This order was balanced. At the start of each series of trials each subject was run on one practice block. The prime–probe transition frequencies were balanced within 64 prime–probe pairs presented in two sub-blocks of 32 pairs. Mapping (which included DO, i.e. task type), carrier, and repetition were factorially combined and constituted the three principal independent variables of the experiment.
2.5.1.5 Subjects

Eighteen University of Michigan students participated in this experiment. They were all right-handed, native English speakers with self-reported normal hearing and vision; their color vision tested normal. They were volunteers and were paid for their participation.
2.5.2 Results

The principal results that we report for this and all the other experiments are for the shortest RSI (700 ms). (For the results at the other RSIs, see Appendices B–D.)
2.5.2.1 Same carrier

We start with the results for the same carrier condition. In Type 1 tasks the RT for repetitions is 100 ms faster than for non-repetitions [F(1, 3) = 94.72, p < 0.0023]—no surprises. In Type 2 tasks a number of things should be noted:

1. The overall RT with congruent mapping is 335 ms faster than with incongruent mapping [F(1, 10) = 82.95, p < 0.0001], and the RT for the neutral mapping is almost exactly halfway in between: neutral vs. congruent [F(1, 10) = 76.62, p < 0.0001], neutral vs. incongruent [F(1, 10) = 12.96, p < 0.0049].
Fig. 2.7 Activation pattern, according to the Residual Activation Hypothesis, for a probe trial in a Type 1 task in which neither (upper panel) or both (lower panel) the stimulus and the response repeat.

2. There is also a highly significant interaction between mapping and repetition. When the S–R mapping is congruent, there is a significant 21 ms difference between repetitions and non-repetitions [F(1, 5) = 12.11, p < 0.0176]; when it is incongruent this difference is 235 ms [F(1, 3) = 22.63, p < 0.0176]; and when it is neutral it is in between, at 100 ms [F(1, 3) = 94.72, p < 0.0023]. These results generally replicate earlier findings in the literature.
2.5.2.2 Different carrier

We turn next to the results with different carriers. Recall that in the different carrier condition, when the prime stimulus is a color patch the probe stimulus is a color word, and vice versa. The basic results that we obtained with the same carrier condition replicate: the overall RT for congruent mapping is 382 ms faster than for incongruent mapping [F(1, 10) = 124.04, p < 0.0001], with the neutral mapping condition falling between the two: neutral vs. congruent [F(1, 10) = 53.47, p < 0.0001]; neutral vs. incongruent [F(1, 10) = 22.88, p < 0.0007]. As was also true in the same carrier condition, there is a highly significant interaction between mapping and repetition. When the S–R mapping is congruent, the difference between repetitions and non-repetitions is not statistically significant (1 ms) [F(1, 5) = 0.13, p < 0.7315]. When it is incongruent, it is 132 ms [F(1, 3) = 9.48, p < 0.0542], and when it is neutral it is in between, at 55 ms [F(1, 3) = 12.06, p < 0.0403]. Note that the mapping effect for repetitions with different carriers (264 ms) is larger than with same carriers (155 ms). This generates the highly significant triple interaction between carrier, mapping, and repetition [F(1, 10) = 20.35, p < 0.001].

2.5.3 Discussion: the Information Reduction Hypothesis

The model must now show that it can account for the following: (1) the effects of repetition; (2) the interaction between repetition and S–R mapping; and (3) the interaction between repetition, S–R mapping, and carrier.
Fig. 2.8 Activation pattern, according to the Information Reduction Hypothesis, for a probe trial in a Type 1 task in which neither (upper panel) or both (lower panel) the stimulus and the response repeat. Note that repetitions, instead of having a higher starting point, as in the Residual Activation Hypothesis (Fig. 2.7), have a lower threshold. See text for the implications of this difference.
Common to all the ‘automatic’ accounts of the repetition effect, as we have seen, is the notion that having performed a certain action, residual traces (e.g. memory, perceptual, or response traces) are left that automatically facilitate the processing of subsequently repeated stimuli or responses. One easy way of implementing this view in our model would be to represent this trace as residual activation left over from the previous trial that has not yet decayed all the way to zero by the time the current trial begins (see Fig. 2.7). Let us call this the ‘Residual Activation Hypothesis’.5 Given the head start provided by the residual activation, activation levels in the stimulus and response units would reach threshold earlier than they otherwise might. This would reduce overall processing time, hence overall RT—thus producing a repetition effect. Simple?—Yes; correct?—Unfortunately, no. This scheme can be shown to account for the effects of repetitions, the effects of mapping, and their interaction. However, it cannot account for the interaction with carrier, which is one of the striking aspects of our results. We will now argue that what we are calling the ‘Information Reduction Hypothesis’ can.

It is a property of the DO model that each individual stimulus and response unit has its own threshold. According to the Information Reduction Hypothesis, whenever a stimulus is identified or a response selected or made, the amount of information required to identify that stimulus or select that response again is temporarily lowered. This is implemented as a decrease in the stimulus or response threshold associated with the appropriate unit (see Fig. 2.8). If the same stimulus is presented again, or the same response is selected on the next trial, activation has a shorter way to go before it reaches this lower threshold, and the processing durations of the stimulus or response units are consequently reduced.
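The two hypotheses can be contrasted in a few lines of arithmetic. In the sketch below (assumed rate, input, and threshold values; the actual activation function is given in Kornblum et al. 1999), a residual head start and a lowered threshold are chosen to yield nearly the same time saving for the unit itself, yet they leave the unit at different activation levels, and it is this level that feeds the automatic line downstream.

# Time-averaged accumulation toward a constant input of 1 (values assumed).
RATE, DT, INPUT = 0.01, 1.0, 1.0

def time_and_peak(start, threshold):
    act, t = start, 0.0
    while act < threshold:               # accumulate until threshold is reached
        act += RATE * (INPUT - act) * DT
        t += DT
    return t, round(act, 3)              # the peak level feeds the automatic line

print(time_and_peak(0.00, 0.60))  # baseline (no repetition): 92 ms, peak 0.603
print(time_and_peak(0.15, 0.60))  # Residual Activation head start: 75 ms, 0.600
print(time_and_peak(0.00, 0.53))  # Information Reduction threshold: 76 ms, 0.534

Both repetition variants save roughly the same 16–17 ms on the unit itself, but the residual-activation unit still peaks near 0.60, whereas the lowered-threshold unit peaks near 0.53. Since the automatic signal sent onward is proportional to this level, only the Information Reduction Hypothesis changes what happens downstream, and it is this property that carries the account of the carrier interaction developed below.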
Fig. 2.9 Activation patterns, according to the Information Reduction Hypothesis, for probe trials in Type 1 and Type 2 tasks in the same carrier condition where, by hypothesis, both or neither the stimulus and the response repeat. As illustrated here, for the non-repetitions, activation on probe trials is identical to activation on primes. Note that not only is the overall RT for repetitions faster than for non-repetitions (the repetition effect), but the mapping effect is smaller as well.

At first glance these two hypotheses seem equivalent. Exactly the same reduction in processing time is achieved by postulating residual activation as by lowering thresholds. However, the two hypotheses have profoundly different consequences further down the line (see * in Fig. 2.3, Type 2 tasks).
2.5.3.1 Same carrier

We start with the results from the same carrier condition. According to the Information Reduction Hypothesis, performance in the three different mapping conditions is determined as follows. In Type 1 tasks, when the mapping is neutral, the only factor influencing the RT is whether the stimulus or response repeat or not. When neither repeats, the thresholds of the stimulus and response units are identical on probe and prime trials. The RTs on probe trials are, therefore, the same as on primes. When both the stimulus and the response repeat, the thresholds of both the stimulus and the response units are lower on probe trials than they were on prime trials. The RTs on probe trials are, therefore, faster than on primes. This is the baseline, the basic repetition effect (see Fig. 2.9, Type 1).

In Type 2 tasks, when the mapping is either congruent or incongruent, two factors come into play: the first is whether the stimulus or response repeat or not, and we just saw how this factor affects RT by itself in the Type 1 task where the mapping is neutral; the second is the facilitation and interference produced by the congruent and incongruent mappings. When neither stimulus nor response repeats, the thresholds of the stimulus and response units are identical on probe and prime trials—just as in Type 1 tasks. Irrespective of mapping, the RTs on probes are the same as on primes. When both the stimulus and response repeat, the thresholds of both the stimulus and response units are lower on probe trials than they were on primes. The effect of this lowered threshold in the stimulus unit is identical for the congruent, incongruent, and neutral mappings (see Fig. 2.9). However,
because of the DO between the relevant stimulus and the response, this lower stimulus threshold affects the input to the response unit. In particular, the lower threshold reduces the level to which stimulus activation rises, hence the level from which it starts to decay (see * in Fig. 2.3, Type 2 tasks). Therefore, the automatic input to the response unit is less when the stimulus repeats than when it does not. In addition, because the rate at which activation accumulates in the response unit is faster for congruent than for incongruent mapping, lowering the response selection threshold has differential effects on the two mappings: it decreases the RT for both, but it decreases the RT for incongruent mapping more than for congruent mapping. The observed interaction between repetition and S–R mapping is, therefore, the result of lowering both the stimulus and the response thresholds.
2.5.3.2 Different carrier

Now consider the different carrier condition. This is when the prime stimulus is a color word and the probe stimulus is a color patch, or vice versa. We shall look at the repetition effect first, and the mapping effect next. We have already seen that in Type 1 tasks, when the mapping is neutral, the only factor influencing RT is whether or not the stimulus or the response repeat. When neither repeats, according to the model, the RTs for same (682 ms) and different (690 ms) carriers ought to be identical—which they are; this 8 ms difference in the data is not statistically significant [F(1, 5) = 1.63, p < 0.2583]. The Type 2 results are not as clean. When the mapping is congruent and neither stimulus nor response repeats, the RT for the same carrier (484 ms) is faster than for the different carrier (490 ms), and this 6 ms difference is statistically significant [F(1, 5) = 11.47, p < 0.0195]. The same is true for the incongruent mapping. The RT for non-repetitions with the same carrier (853 ms) is faster than with the different carrier (885 ms); this 32 ms difference is also statistically significant [F(1, 5) = 6.75, p < 0.0484]. It seemed reasonable to us to attribute these small differences to the cost of switching between carriers (i.e. from color patch to word and vice versa). For consider: these differences are for total non-repetitions, which means that neither the stimulus nor the response on the prime is repeated on the probe. The most plausible contrast between the same and different carrier conditions that might account for this difference in RT, therefore, is that repetition of the carrier itself had an effect. This interpretation of the data is consistent with the fact that same carrier RTs were faster than different carrier RTs.

Consider the repetitions next. Recall that repetitions with different carriers were 49 ms slower than with the same carrier. This probably reflects the difference between the physical and the possible conceptual repetition, or total non-repetition, of the stimulus. However, we had no way of assessing whether conceptual repetition contributed in any way to the repetition effect in this experiment. In our modeling, therefore, we treated repetition trials in different carrier conditions as pure response repetitions (see Fig. 2.10). Thus, when only the response repeats, as it does by hypothesis, the threshold of the response unit is lower on the probe than on the prime, and RTs on probe trials are, therefore, faster than on primes, but not as fast as with probes in the same carrier condition, where both the stimulus and response are repeated.

We go to the mapping effects next. When the mapping is either congruent or incongruent, we again have two factors coming into play: repetition and mapping. On non-repetition trials, when neither stimulus nor response is repeated, probe RTs are the same as on prime trials, by the same argument that we made in the case of the same carrier. On repetition trials, when only the response is repeated, only the threshold for the response unit is lower on the probe than on the prime trial.
Fig. 2.10 Activation patterns for probe trials in Type 1 and Type 2 tasks in the different carrier condition where, by hypothesis, the stimulus does not repeat but the response may. As was true for the same carrier condition, the non-repetition probe trials are identical to the primes. The overall RT for repetitions is faster than for non-repetitions. However, the repetition effect is smaller here than it was for the same carrier condition (Fig. 2.9). This is because here only the response repeats. Note also that, as was true for the same carrier condition, the mapping effect is smaller for repetitions than for non-repetitions, but this reduction is not as large here as it was for the same carrier condition.

This means that stimulus activation will start its decaying process from a higher level in the different carrier than in the same carrier condition, producing a larger mapping effect and thus generating the triple interaction between repetition, mapping, and carrier. Based on these principles we used the model to simulate the results for Experiment 1. As can be seen in Fig. 2.11, they appear to match the empirical data reasonably well. Thus, based on our results and model, repetitions of both stimulus and response play an important role in the production of the repetition effect. However, as we have seen, previous studies that have explicitly addressed the question of locus (e.g. Bertelson 1963) concluded that the bulk of the repetition effect lies with the repetition of the response. Our next experiment addresses this issue directly.
2.6 Experiment 2

Here we examine the locus of the repetition effect. Because of our interest in the interaction of S–R compatibility with the repetition effect, our primary focus will be on Type 2 tasks. However, we shall also be looking at the effects of repetitions and non-repetitions for Type 1 and Type 2 tasks, each preceded by the other. Consider a Type 2 probe—for example, one where the stimulus is drawn from a set of color patches and the response from a set of color names. Now consider the Type 1 prime preceding this probe, where either the stimulus is drawn from the set of color patches, with digit names as responses, or the response is drawn from the set of color names, with digits as the stimuli.
Fig. 2.11 Simulated data for Experiment 1 are shown as solid lines; the empirical data are shown as dashed lines for comparison.

Depending on which of these two primes one chose, the probe in such a prime–probe pair would display either a stimulus repetition or a response repetition, but not both. This is precisely how we designed the prime–probe pairs for this experiment.
2.6.1 Methods

2.6.1.1 Time line

The temporal relationships within and between trials were the same as they were in Experiment 1.

2.6.1.2 Stimuli and responses

The stimuli were presented on a CRT screen, and consisted of either four rectangular color patches (red, blue, green, and yellow) or four digits (2, 4, 6, 8). The responses were verbal and consisted of either four color names (‘red’, ‘blue’, ‘green’, and ‘yellow’) or four digit names (‘two’, ‘four’, ‘six’, and ‘eight’). For Type 1 tasks, we used either color patch stimuli and mapped them onto digit name responses, or digit stimuli and mapped them onto color name responses. For Type 2 tasks, we used the same sets of stimuli and responses but paired them differently: color patch stimuli were mapped onto color name responses, congruently or incongruently, and digit stimuli were mapped onto digit name responses, congruently or incongruently.

2.6.1.2.1 Type 1 → Type 2, and Type 2 → Type 1, prime–probe pairs

Consider first the case in which the prime–probe pairs consisted of Type 1–Type 2 tasks, respectively. In our illustrations (Fig. 2.12) we show the stimuli and responses for the probe as color patches and color names, respectively; of course, as was true of Experiment 1, in this experiment there was another set of probe trials for which the stimuli and responses consisted of digits and digit names.
Fig. 2.12 Illustrative stimuli and responses for different prime–probe pairs in Experiment 2 in which the primes are Type 1 and the probes are Type 2 tasks. For this illustration the prime stimuli are color patches. There is another set of prime–probe pairs (not shown) for which the prime stimuli are digits. With those primes, the S–R pairing on probe trials remains the same as what is shown, except that what are here identified as response set switches (see text) become stimulus switches, and vice versa. The particular colors and digits are for illustrative purposes only. Horizontal striations indicate the color blue, diagonal striations the color red. Response set switches are indicated by a dotted line, stimulus set switches by a solid line. Whether the probe is a stimulus or response, repetition or non-repetition, is indicated on the right.

Stimulus repetitions with Type 2 color probes, whether congruent or incongruent, were produced by Type 1 primes for which the stimuli were color patches and the responses digit names. Similarly, response repetitions with Type 2 color probes were produced by Type 1 primes for which the responses were color names and the stimuli were digits. When the order of task types in the prime–probe pairs was reversed, and consisted of Type 2–Type 1 tasks, respectively (see Fig. 2.13), the procedure for obtaining stimulus and response repetitions was identical to what we just saw. For example, if the Type 1 probe stimulus was a color patch and the response a digit name, a stimulus repetition was produced by a Type 2 color prime, with either congruent or incongruent mapping. Similarly, a response repetition was produced by a Type 2 digit prime, with either congruent or incongruent mapping.
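The logic of the design can be stated compactly: each prime shares exactly one of its sets with the probe, so a prime–probe pair can exhibit a stimulus repetition or a response repetition, but never both. The few lines below, with our own labels rather than the experimental software, make that invariant explicit for a Type 2 color probe.

# Each prime shares exactly one S/R set with the Type 2 color probe.
PROBE = {'stim': 'color patches', 'resp': 'color names'}
PRIMES = {
    'stimulus-repetition prime': {'stim': 'color patches', 'resp': 'digit names'},
    'response-repetition prime': {'stim': 'digits', 'resp': 'color names'},
}
for name, prime in PRIMES.items():
    shares_stim = prime['stim'] == PROBE['stim']
    shares_resp = prime['resp'] == PROBE['resp']
    assert shares_stim != shares_resp   # exactly one set shared, never both
    print(name, '-> repeats the', 'stimulus' if shares_stim else 'response')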
2.6.1.3 Conditions, blocks, and procedures

Thirty-two subjects participated in the experiment; for half the subjects the prime stimulus was color, for the other half it was digits. These two groups were further subdivided into four groups of four subjects each.
Fig. 2.13 Illustrative stimuli and responses for different prime–probe pairs in Experiment 2 in which the primes are Type 2 and the probes are Type 1 tasks. For this illustration the prime stimuli are color patches. There is another set of prime–probe pairs for which the prime stimuli are digits. With those primes, the S–R pairing on probe trials remains the same as what is shown, except that what are here identified as response set switches (see text) become stimulus switches, and vice versa. The particular colors and digits are for illustrative purposes only. Horizontal striations indicate the color blue, diagonal striations the color red. Response set switches are indicated by a dotted line, stimulus set switches by a solid line. Whether the probe is a stimulus or response, repetition or non-repetition, is indicated on the right.
Each of these subgroups was identified by the mapping instructions that it received for the Type 2 task: one subgroup was given the congruent mapping, and each of the other three subgroups was given a different incongruent mapping. Each of the four subjects in these subgroups received a different mapping for the Type 1 task. Each subject was run on six experimental blocks of 32 trials each at one RSI, followed by six more experimental blocks at the other RSI. This order was balanced. Half the subjects in each subgroup of four started with a Type 1 prime, the other half with a Type 2 prime. Prime stimulus and S–R mapping were between-subject variables. At the start of each series of trials each subject was run on one practice block. The prime–probe transition frequencies were balanced within 64 prime–probe pairs presented in two sub-blocks of 32 pairs.
2.6.1.4 Subjects

Thirty-two University of Michigan students volunteered for the experiment and were paid for their participation. They were all right-handed, native English speakers with self-reported normal hearing and vision. Their color vision tests were normal.
Fig. 2.14 Results of Experiment 2. The left panel shows the data when the prime–probe pair consisted of Type 1–Type 2 tasks, respectively; the right panel shows the results when the prime–probe pairs consisted of Type 2–Type 1 tasks. ‘Congruent’ and ‘incongruent’ refer to the S–R mapping for the Type 2 tasks, irrespective of order. The dotted lines are the data for the stimulus repetitions and non-repetitions, when the response set was switched; the solid lines are the data for the response repetitions and non-repetitions, when the stimulus set was switched.
2.6.2 Type 1 → Type 2

2.6.2.1 Results

First we present the results for the Type 2 probes preceded by Type 1 primes, averaged over the digit and color patch stimuli (see Fig. 2.14).

1. As expected, there is a highly significant effect of mapping: the RT for congruent mapping is over 365 ms faster than for incongruent mapping [F(1, 30) = 44.31, p < 0.0001].

2. The interaction between mapping and the size of the repetition effect is also significant [F(1, 30) = 5.48, p < 0.0260]: when the S–R mapping is congruent, the repetition effect is 25 ms [F(1, 23) = 9.06, p < 0.0197]; when it is incongruent it is 96 ms [F(1, 23) = 31.01, p < 0.0001], a fourfold increase.

3. Note that when we speak of repetition effects in this experiment we are speaking of repetitions of either the stimulus or the response, with corresponding shifts in response and stimulus sets, respectively (see Fig. 2.14). That is, when a stimulus repetition or non-repetition occurs (the dotted lines in Fig. 2.14), the prime stimulus and the probe stimulus are both drawn from the same stimulus set (they are either both color patches or both digits). In contrast, the prime response and the probe response are each drawn from different response sets (digit names for the one and color names for the other, or vice versa). This means that the subject must shift from one response set to another (i.e. digit names to color names or vice versa). The symmetric situation holds for response repetitions and non-repetitions (the solid lines in Fig. 2.14). Here, the responses on the prime and probe are
both drawn from the same response set (they are either both color names or both digit names), and it is the prime and probe stimuli that are each drawn from a different stimulus set (color patch for one and digit for the other, or vice versa). This means that subjects must shift from one stimulus set to another (digits to colors, or vice versa). These shifts appear to have exacted a cost. Shifting from one response set to another (digit names to color names, or vice versa) takes about 27 ms longer than shifting from one stimulus set to another (digits to color patches, or vice versa) [F(1, 30) = 6.76, p < 0.0143]. These set-shifting costs (25 ms for congruent mapping [F(1, 7) = 10.15, p < 0.0154], and 29 ms for incongruent mapping [F(1, 7) = 4.32, p < 0.0489]) are additive with the effects of repetition and mapping [set-shift × congruence: F(1, 30) = 0.03, p < 0.8643].
2.6.2.2 Discussion

Our results, in contrast to earlier reports, and using a different experimental paradigm, show that stimulus and response repetition effects are both fairly large, and roughly equal in size. The locus of the repetition effect thus appears to be equally apportioned between stimulus and response processes. These findings also disconfirm a prediction recently made by Hommel (1998; Hommel et al. in press). In a recent paper in which he extends the notion of feature integration (Treisman 1988) to include action features to construct what he calls ‘event files’, Hommel (1998; Hommel et al. in press) makes specific predictions about the relative costs and benefits of certain kinds of repetitions. In particular, according to Hommel’s view, if one takes as a baseline the total non-repetition condition, when neither the stimulus nor the response is repeated, then the RT for the total repetition condition, when both the stimulus and the response are repeated, should show a distinct benefit. The partial repetition condition, on the other hand, when either the stimulus or the response, but not both, is repeated, would show no benefit, at best, and possibly a cost. Our results, in which we obtain clear benefits from stimulus repetitions and response repetitions, each in the absence of the other, are clearly inconsistent with these predictions.

The results of the interaction between mapping and repetition are qualitatively similar to the results obtained in Experiment 1. That is, the probe, which was a Type 2 task, unsurprisingly behaved like a Type 2 task: there was a very large mapping effect (365 ms), and the repetition effect was much smaller for the congruent (25 ms) than for the incongruent (96 ms) mapping conditions. There are, of course, differences between the two experiments as well. First, note that the overall RT in this experiment is marginally longer than in Experiment 1 [F(1, 40) = 3.38, p < 0.0735]. (Even though this difference appears to interact with mapping, this between-subjects difference is not statistically significant [F(1, 40) = 0.16, p < 0.6899].) Recall that in the present experiment the Type 2 probes were preceded by Type 1 primes. In the previous experiment these Type 2 probes were preceded by primes that were also Type 2 and that used the same stimulus and response sets. In this experiment the RT for congruent mapping was 79 ms slower than in the different carrier condition of Experiment 1, and the overall RT for incongruent mapping was 149 ms slower than in Experiment 1. We suggest that these differences are attributable to task switching. Let us anticipate the results of the second half of this experiment, in which the primes were congruent/incongruent Type 2 tasks and the probes were Type 1. There we find a similar effect, with the primes and probes reversed: the cost of switching from a congruent prime to a neutral probe is much less than the cost of switching from an incongruent prime to the same neutral probe. So, whether one switches to or from a trial with congruent mapping, RT appears to be faster than when one switches to or from a trial with incongruent mapping.
Fig. 2.15 Simulation of the results of Experiment 2 with Type 2 probes. Panel A shows the simulated results when the effects of stimulus and response set switching are not taken into account. Panel B shows the simulated results when a constant is added for set switching: 60 ms and 80 ms have been added for the stimulus and response set switches, respectively, for the congruent conditions; 165 ms and 200 ms have been added for the stimulus and response set switches, respectively, for the incongruent conditions. Panel C shows the empirical results on the same scale (these are the same data as are shown in Fig. 2.14). The dotted lines are the data for the stimulus repetitions and non-repetitions, when the response set was switched; the solid lines are the data for the response repetitions and non-repetitions, when the stimulus set was switched.
Another result that we believe is due to switching is the finding that switching from one response set to another generated a longer RT than switching from one stimulus set to another. One final observation worth noting is that Rogers and Monsell (1995) reported that in their study the response repetition effect vanishes following a task switch. We, on the other hand, obtained a robust response repetition effect in Type 2 probe trials following a Type 1 prime. As we indicated in the introduction to these experiments, we have not tried to make our model account for either this or any other effect of switching. We would, of course, have liked to use the same parameter values for the simulation of these results as we used to fit the data of the previous experiment. However, because of the effects of task and set switching, and their interactions, we were unable to do so. Nevertheless, the new simulations capture the repetition effects quite nicely, as is evident by comparing the slopes of the empirical data in Panel C (Fig. 2.15) with the slopes of the simulated data in Panel A (Fig. 2.15). If we now treat the vertical displacements as due to task and set switching, and add these as arbitrary constants (Panel B, Fig. 2.15), which we confess is far from theoretically satisfying, then the overall fit is quite good.
2.6.3 Type 2 → Type 1

2.6.3.1 Results
Next we turn to the results of the Type 1 probes preceded by Type 2 primes (see Fig. 2.14, right panel).

1. One of the most striking aspects of these data is the large and reliable effect that the mapping of the preceding prime had on these neutral probes. This is a result that we have already alluded to: when the mapping for the prime was congruent, the RT for the probe was 92 ms faster than when the mapping was incongruent [F(1, 30) = 4.19, p < 0.0494].
2. Also, the time to switch between response sets was 100 ms longer than the time to switch between stimulus sets [F(1, 30) = 40.20, p < 0.0001]. This, of course, was much larger than the set switching effect that we had seen with Type 2 probes, and was observed with incongruent primes only [F(1, 23) = 43.18, p < 0.0001]; when the prime was congruent, set switching failed to have a significant effect [F(1, 23) = 0.01, p < 0.9228].
3. Finally, we note that in Experiment 1, when Type 1 probes were preceded by primes of the same type, so that there was no task switching, the overall RT was much faster than in this experiment, as expected.

Now we come to the repetition effects. When the prime was congruent, the repetition effect was 54 ms [F(1, 7) = 43.6, p < 0.0003]; when the prime was incongruent it was 31 ms [F(1, 23) = 10.31, p < 0.0039]. The difference between these two repetition effects is not significant [Rep × Cong interaction: F(1, 30) = 1.72, p < 0.1999]. Thus, even though the congruence and incongruence of the Type 2 primes seem to have influenced the overall RT of the Type 1 probe, they did not have a differential influence on the repetition effect. Because these effects all appear to be due to switching of one kind or another, we shall have nothing further to say about them from the point of view of the DO model, and we leave them with the reader to ponder as empirical results that pose theoretical puzzles.
2.7 Irrelevant stimuli and sequential effects
In this next section we shall look at the results of some experiments in which the prime and probe trials have irrelevant stimuli that either do or do not overlap with some other aspect of the task. These are either neutral Type 1 tasks, in which there is no DO; Type 3, or Simon, tasks, in which the irrelevant stimulus dimension overlaps with the response; or Type 4, Stroop-like, tasks, in which the irrelevant stimulus dimension overlaps with the relevant stimulus dimension. All these experiments used the same procedures; we will, therefore, describe them just once at the start. As was true of all the experiments up to now, the experimental unit consisted of trial pairs: a prime and a probe trial.
2.7.1 General procedures

2.7.1.1 Stimuli and responses
In all cases the relevant stimuli were the letters B, J, Q, Z. The responses consisted of joystick movements up, down, left, or right. The S–R mapping was arbitrary. The irrelevant stimuli were presented as flankers to the left and right of the relevant letters, and differed depending on the task type. There were four possible irrelevant stimuli for each task type, which generated 16 different stimuli of each type. For the Type 1 tasks, the irrelevant stimuli were diacritical marks and a plus sign (#, %, &, +); for the Type 3 tasks, the irrelevant stimuli were up, down, left, and right arrows; and for the Type 4 tasks, the irrelevant stimuli were the letters B, J, Q, Z.

2.7.1.2 Experimental factors and design
Given the task type (Type 1, 3, or 4) for the prime and probe, the factors of interest were:
1. the consistency of the prime and probe;
2. the repetition/non-repetition of this consistency state;
3. the repetition/non-repetition of the relevant and/or irrelevant stimulus;
4. RSI.

Because in two-choice tasks some of these factors are confounded with each other as well as with negative priming, we used four-choice tasks and constructed a transition matrix (see Fig. 2.16) in which the first three factors were explicitly represented. This matrix, which revealed a surprising number of constraints, makes the confounding that necessarily occurs in two-choice tasks very clear. For example, whether or not the relevant stimulus in a two-choice task repeats, the repetition/non-repetition of the irrelevant stimulus is confounded with the repetition/non-repetition of consistency and with negative priming preconditions. This matrix was the starting point for the design of all our experiments.
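To make the structure of the matrix concrete, the following sketch (our reconstruction for illustration; the generic trial labels are those of Fig. 2.16, and the code is not part of the original materials) enumerates all prime-to-probe transitions for a four-choice task with overlapping irrelevant stimuli and classifies each cell by its two repetition factors:

    from itertools import product

    # Generic four-choice task with irrelevant stimuli that have DO (Fig. 2.16):
    # capital letters = relevant stimuli (Types 1 and 4) or responses (Type 3);
    # lower-case letters = overlapping irrelevant stimuli.
    RELEVANT = ["A", "B", "C", "D"]
    IRRELEVANT = ["a", "b", "c", "d"]
    TRIALS = list(product(RELEVANT, IRRELEVANT))   # 16 possible trials

    def classify(prime, probe):
        """Classify a prime-probe pair by its repetition properties."""
        relevant_repeats = prime[0] == probe[0]
        irrelevant_repeats = prime[1] == probe[1]
        return relevant_repeats, irrelevant_repeats

    MATRIX = {(p, q): classify(p, q) for p, q in product(TRIALS, TRIALS)}
    assert len(MATRIX) == 256   # the 16 x 16 cells of the transition matrix

Filtering this dictionary for, say, pairs in which the irrelevant stimulus repeats while the relevant one does not shows how freely such cells can be selected with four alternatives; in the two-choice analogue they cannot be isolated from the consistency-repetition and negative-priming transitions, which is the confounding noted above.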
2.8 Experiment 3
In this first experiment with irrelevant stimuli we were interested in examining whether the consistency or inconsistency effects on probe trials were affected by the consistency or inconsistency of the primes. According to the DO model there is no reason why such contingencies should occur. However, such effects have been reported in the literature for Type 3 tasks (e.g. Mordkoff 1998), so we wanted to verify these reports before proceeding.
2.8.1 Design
We used Type 3 and Type 4 tasks, presented in different experimental blocks. Each block also contained a Type 1 task. The prime in a Type 3 block was, therefore, S–R consistent, inconsistent, or neutral. The prime in a Type 4 block was S–S consistent, inconsistent, or neutral. The probe, similarly, was either consistent, inconsistent, or neutral. Each block, therefore, contained nine different prime-to-probe transitions, whether it was a Type 3 or a Type 4 block (see Fig. 2.17 and the sketch below). From the master transition matrix (see Fig. 2.16) it was also evident that in order for these nine conditions to be comparable and not confounded with other variables, neither the relevant nor the irrelevant stimuli of the prime could be repeated in the probe. Each block included four randomized instances of each of the nine prime-to-probe transitions; there were two blocks per task type and RSI (700 and 1500 ms). Twelve subjects participated in the experiment. Four different mappings were used for each task type, and mapping was a between-subjects variable.
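The nine transitions referenced above can be written out directly (a minimal sketch of ours, for illustration):

    from itertools import product

    # The nine prime-to-probe transitions in each block (see Fig. 2.17):
    STATES = ["consistent", "inconsistent", "neutral"]
    TRANSITIONS = list(product(STATES, repeat=2))   # (prime state, probe state)
    assert len(TRANSITIONS) == 9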
2.8.2 Results
We start with the results of the Type 3 task (see Fig. 2.18). When both the prime and the probe are Type 3 tasks, there is a highly significant S–R consistency effect of 80 ms that is totally immune to differences in the consistency of the prime [consistent prime, 87 ms; inconsistent prime, 73 ms; the prime × probe interaction is not significant: F(1, 8) = 0.25, p < 0.6275]. However, when the prime is neutral, the consistency effect of the probe jumps to 140 ms. This 75% increase is achieved by having both a faster RT for consistent probes and a slower RT for inconsistent probes. When the probe is neutral, the RT falls between the RTs for consistent and inconsistent probes and is completely unaffected by the prime [F(1, 8) = 0.07, p < 0.8008].
Fig. 2.16 Generic prime-to-probe transition matrix for four-choice tasks with irrelevant stimuli. The capital letters A, B, C, and D designate either relevant stimuli or responses, depending on the task being represented: for Types 1 and 4 they represent relevant stimuli, for Type 3 they represent responses. This generates sixteen large square areas representing all the transitions between the four capital letters (relevant stimuli for Types 1 and 4, or responses for Type 3). The lower-case letters a, b, c, d designate irrelevant stimuli that overlap either with the relevant stimuli (Type 4) or with the responses (Type 3). The lower-case letters w, x, y, z designate irrelevant stimuli that have no DO with any aspect of the task (Type 1). Each of these large square areas is thus subdivided into four quadrants that represent the two-by-two combination of overlapping (DO) and non-overlapping (N) prime–probe pairs: DO→DO; DO→N; N→N; and N→DO. Inside these four quadrants are sixteen individual cells identified by the letters c and i, as well as by dashes; they have the following meaning: 'c' stands for a consistent trial, 'i' for an inconsistent trial, and '-' for a neutral trial. These letters and dashes appear in pairs, where the first position in the pair denotes the nature of the prime trial and the second position the nature of the probe. Thus, for example, a 'cc' in a cell identifies this cell as the transition between a consistent prime and a consistent probe; a 'ci' cell would be the transition between a consistent prime and an inconsistent probe; 'i-' would be the transition between an inconsistent prime and a neutral probe, etc. We now come to the repetition/non-repetition properties of the prime–probe pairs in this matrix. The only repetitions of relevant stimuli or responses occur in the four large square areas on the main diagonal. The remaining twelve large, off-diagonal, square areas represent non-repetitions of relevant stimuli or responses. The cells on the main diagonal of these sixteen large square areas all represent repetitions of the irrelevant stimuli; the off-diagonal cells are all non-repetitions of the irrelevant stimuli. This matrix makes it relatively easy to identify some transitions with special properties that may be interesting. For example, consider the large A × A square area. The first column represents transitions in which the irrelevant stimulus on the probe trial is the same as either the relevant stimulus or the response on the prime. The first row represents transitions in which the irrelevant stimulus on the prime becomes the relevant stimulus, or the response, on the probe. Each of the sixteen large square areas has one row and one column with these same properties. Other interesting transitions may be those in which the relevant and irrelevant stimuli on the prime are switched on the probe; by definition, of course, these can only occur in the large, off-diagonal squares.
We turn next to the results of the Type 4 task (see Fig. 2.18). When both the prime and the probe are Type 4 tasks, there is a highly reliable S–S consistency effect of 62 ms [F(1, 8) = 6.70, p < 0.0322] which is not significantly altered by neutral primes (59 ms) [F(1, 8) = 0.44, p < 0.5260]. (There also appears to be a 31 ms interaction between prime and probe consistency which, however, is not significant [F(1, 8) = 0.82, p < 0.3907] and appears to be due entirely to the effects of the prime on inconsistent probes: the RTs for consistent probes all fall within a range of 11 ms for the various primes that they are paired with, whereas the RTs for inconsistent probes span a range of 47 ms.)

To summarize: first, just as there are clear differences in performance between Type 3 and Type 4 tasks in terms of overall mean RTs, we again see differences in performance between these task types when merely considering the sequence of consistent and inconsistent trials. Second, the results for Type 3 tasks are clear and systematic: the consistency or inconsistency of the prime has no effect whatsoever on the size of the consistency effect of the probe. However, whether or not the prime has DO has an enormous influence on the size of the S–R consistency effect: a neutral prime almost doubles the size of that effect. The results with the Type 3 task are inconsistent with Mordkoff's (1998) earlier reports. However, as we have indicated, these reports, which are based on two-choice data, may have included confoundings between the repetition effects of relevant and irrelevant stimuli and other factors in the experiment. The results with Type 4 tasks are not as clean and, obviously, need further work.
2.9 Experiment 4
In this next experiment we examine the basic repetition effects of relevant and irrelevant stimuli in four-choice tasks of Types 1, 3, and 4.
Fig. 2.17 Basic design for Experiment 3 showing the nine different prime-to-probe transitions in a block. Task types were blocked so that some experimental blocks had task Types 3 and 1, and other blocks had task Types 4 and 1.
Fig. 2.18 Results of Experiment 3 for Type 3 and Type 4 blocks; each of these blocks included Type 1 neutral trials. On the abscissa are the three values of the probe: consistent, inconsistent, and neutral. The parameter for the data lines is the nature of the prime: circles indicate consistent primes; squares indicate inconsistent primes; triangles indicate neutral primes.
2.9.1 Methods and procedures
The relevant stimuli were the four letters and the responses were the up, down, left, and right movements of a joystick. The irrelevant stimuli differed depending on the type of task: for the Type 1 task they were diacritical marks, for the Type 3 task they were directional arrows, and for the Type 4 task they were letters (see the general description of the stimuli and responses at the beginning of this section). The Type 1 task was run on one group of 12 subjects; the Type 3 and 4 tasks were run on another group of 12 subjects in a balanced order. For each task type there were four different mappings, each assigned to a different group of subjects. We used a simple 2 × 2 design: the repetitions and non-repetitions of the relevant stimuli were crossed with the repetitions and non-repetitions of the irrelevant stimuli. In order to obtain this factorial combination in the Type 3 and 4 tasks, both the prime and the probe trials in each pair had to be inconsistent; in the Type 1 task, of course, this issue was moot (see Fig. 2.16). There were two experimental blocks for each task type and RSI value (700 ms and 1500 ms). Each block contained eight randomized presentations of the four repetition/non-repetition prime-to-probe transitions, for a total of 32 pairs.
2.9.2 Results
The results of the Type 1 task are illustrated in Fig. 2.19. There was a highly significant repetition effect of 90 ms [F(1, 8) = 38.26, p < 0.0003] for the relevant stimulus which, of course, includes repetition of the response. This is almost indistinguishable from the repetition effect that we observed with Type 1 tasks in Experiment 1 (100 ms), where we used very different stimuli and responses, and where the overall RT was also more than 120 ms longer than in this experiment. The repetition of the irrelevant stimulus had no significant effect [F(1, 8) = 0.01, p < 0.9393], and there was no significant interaction [F(1, 8) = 0.01, p < 0.9393]. The repetition of irrelevant stimuli in Type 1 neutral tasks, therefore, has no effect on performance.

In the Type 3 task repeating the relevant stimulus had a significant 116 ms effect [F(1, 8) = 69.34, p < 0.0001], whereas the repetition of the irrelevant stimulus had no significant effect [F(1, 8) = 0.09, p < 0.7753], and there was no interaction. (Even though the repetition effect for the relevant stimulus is 22 ms greater when the irrelevant stimulus does not repeat than when it repeats, this interaction is not significant [F(1, 8) = 0.89, p < 0.3740].) Thus, as was true for the Type 1, neutral, task, repeating or not repeating the irrelevant stimulus has no effect on performance in Type 3 tasks.

In the Type 4 task, repeating the relevant stimulus had a significant effect of 110 ms [F(1, 8) = 95.87, p < 0.0001]. And, unlike the results obtained with the Type 1 and 3 tasks, there is a 26 ms repetition effect for the irrelevant stimulus [F(1, 8) = 17.47, p < 0.0031]. There was no significant interaction between the repetitions of the relevant and irrelevant stimulus.
2.9.3 Discussion
These results are inconsistent with Hommel's event file view (Hommel 1998; Hommel et al. in press) in at least three different ways. First, according to that view, if the relevant stimulus repeats then repeating the irrelevant stimulus should show a benefit compared to the non-repetition of that irrelevant stimulus. We fail to confirm this in both Type 1 and Type 3 tasks. Second, the event file view predicts an interaction between the repetition of relevant and irrelevant stimuli. The results of our Type 4 task fail to confirm this. Third, according to our reading, Hommel's event file position would make identical predictions for Type 1, Type 3, and Type 4 tasks. Our results show that performance on these three tasks is quite different and appears to be based on the patterns of dimensional overlap.
Fig. 2.19 Results of Experiment 4 for task Types 1, 3, and 4. On the abscissa is indicated whether the relevant stimulus repeats or not. The parameter for the data line is whether the irrelevant stimulus repeats or not: the circle indicates a repetition, the square a non-repetition.
These results are consistent with one of the DO model's principal assertions, namely: whether and how irrelevant stimuli affect performance depends on what they overlap with. In this case we see that the repetition of irrelevant stimuli in Type 3 and Type 4 tasks clearly affects performance in very different ways—ways that, as we will now show, the model is able to account for.
2.9.3.1 The Information Reduction Hypothesis for irrelevant stimuli
Let us first consider the finding that the overall RT for Type 1 is faster than for Types 3 and 4. Recall that in this experiment, in order to avoid the confounding between factors, the Type 3 and Type 4 trials were all inconsistent; the Type 1 trials, in contrast, were all neutral. The RTs for S–R (Type 3) and S–S (Type 4) inconsistent trials are known to be slower than for neutral trials. The DO model's account of these consistency effects was summarized in Sections 2.2.2.1 and 2.2.2.2 at the beginning of this chapter.

Now consider the effects of repetitions for the relevant and irrelevant stimuli. Recall that the basic way in which the model accounts for the effects of relevant stimulus and response repetitions is by lowering the threshold in the stimulus and response units when stimuli or responses are repeated (see Fig. 2.8). That is, less information is required to reach threshold after a repetition than after a non-repetition. We suggest that the Information Reduction Hypothesis is equally applicable to the effects of irrelevant stimulus repetitions: whatever the process that distinguishes between relevant and irrelevant stimuli (see Fig. 2.4), that process requires less information when irrelevant stimuli are repeated than when they are not. This proposition is easily implemented in the model: if less information is required, it is reasonable to assume that less time would be required to process that reduced amount of information. Following a repetition, therefore, we shorten the time parameter (τ) (see Kornblum et al. 1999) which, in the model, determines how long it takes to distinguish between relevant and irrelevant inputs (see Fig. 2.4).
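The threshold mechanism just recalled can be expressed as a small sketch (our toy illustration of the stated idea, not the model's actual implementation; the rate and threshold values are arbitrary):

    # Information Reduction Hypothesis for relevant stimuli and responses,
    # rendered as a noiseless accumulator: a repetition lowers the threshold,
    # so less information -- and hence less time -- is needed to reach it.
    def time_to_threshold(rate_per_step, threshold):
        """Time steps for evidence accruing at a fixed rate to reach threshold."""
        t, evidence = 0, 0.0
        while evidence < threshold:
            evidence += rate_per_step
            t += 1
        return t

    THRESHOLD_NONREP, THRESHOLD_REP = 1.0, 0.8   # hypothetical values
    rt_nonrep = time_to_threshold(0.01, THRESHOLD_NONREP)   # ~100 steps
    rt_rep = time_to_threshold(0.01, THRESHOLD_REP)         # ~80 steps
    assert rt_rep < rt_nonrep   # the repetition benefit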
Fig. 2.20 Illustration of the Information Reduction Hypothesis for irrelevant stimuli. According to the hypothesis, when an irrelevant stimulus is repeated less information is needed to distinguish between the inputs of the relevant and the irrelevant stimuli. This translates into less time (τ) being required to make that distinction, which means that the input for the irrelevant stimulus (see also Fig. 2.4) will start decreasing sooner after a repetition than after a non-repetition. This means that the peak of the irrelevant stimulus activation curve will be shallower (a) and occur earlier (b) for repetitions (right) than for non-repetitions (left).
Fig. 2.21 Illustration of how the magnitude of the repetition effect for irrelevant stimuli depends on what the irrelevant stimuli overlap with, and whether the bulk of the irrelevant stimulus activation curve is in the stimulus identification or the response selection stage. As illustrated, the bulk of the irrelevant stimulus activation curve is in the stimulus identification stage. Note that when the irrelevant stimulus repeats, that curve is shallower and peaks earlier than when it does not repeat, which reiterates what was shown in Fig. 2.20. The shaded and unshaded portions of the irrelevant activation curves show the amount of irrelevant activation in the stimulus and response stages, respectively. The fact that the difference between the two shaded portions of the curves (in the stimulus stage) is greater than between the two unshaded portions of the curves (in the response stage) generates the greater repetition effect of irrelevant stimuli for Type 4 (with S–S overlap) than for Type 3 (with S–R overlap), according to the Information Reduction Hypothesis.
Fig. 2.22 Simulation of the results for Experiment 4. See Fig. 2.19 for the empirical results.
The effect of this time reduction on the irrelevant stimulus activation curve is to reduce the level to which that curve rises following a repetition, and to move its peak earlier in time (see Fig. 2.20). Because the overall irrelevant stimulus activation is now less for repetitions (see the bottom right panel in Fig. 2.20) than for non-repetitions (see the bottom left panel in Fig. 2.20), the influence of the irrelevant stimuli on performance will necessarily be less for repetitions than for non-repetitions. However, the magnitude of this effect depends on whether the irrelevant stimulus activation curve affects stimulus processing, as in the Type 4 tasks, or response processing, as in the Type 3 tasks. For example, suppose that most of the irrelevant stimulus activation curve lies in the stimulus identification stage (see the shaded areas of the curve in Fig. 2.21). This would produce a relatively large irrelevant stimulus repetition effect (e.g. Type 4) because the difference between the shaded areas for repetitions and non-repetitions is large. In the meantime, the amount of activation in the response stage (the unshaded areas of the curve in Fig. 2.21) is very small whether the trial is a repetition or a non-repetition. As a result, there will be a very small, and perhaps undetectable, effect of irrelevant stimulus repetition in the Type 3 condition. Figure 2.22 shows the actual simulation of the data, illustrating numerically how these principles generate the reaction times for Experiment 4. The correspondence with the empirical data (Fig. 2.19) is quite good.
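The same logic can be illustrated numerically. The sketch below is our toy parameterization, not the published simulation: the irrelevant input is on until time τ and then cut off, the unit behaves as a leaky integrator, and a crude division of the time axis into a stimulus stage and a response stage reproduces the qualitative pattern of Figs 2.20 and 2.21.

    import math

    def irrelevant_activation(t, tau, rise=0.01, decay=0.01):
        """Toy activation of the irrelevant stimulus unit at time t (ms):
        charges while the input is on (t <= tau), decays once it is cut off."""
        if t <= tau:
            return 1.0 - math.exp(-rise * t)
        peak = 1.0 - math.exp(-rise * tau)
        return peak * math.exp(-decay * (t - tau))

    TAU_NONREP, TAU_REP = 200, 120   # hypothetical: a repetition shortens tau

    # Shallower and earlier peak after a repetition (Fig. 2.20):
    peak_nonrep = max(irrelevant_activation(t, TAU_NONREP) for t in range(800))
    peak_rep = max(irrelevant_activation(t, TAU_REP) for t in range(800))
    assert peak_rep < peak_nonrep

    def stage_activation(tau, boundary, horizon=2000):
        """Crude sums of activation falling before (stimulus stage) and after
        (response stage) an assumed stage boundary."""
        stim = sum(irrelevant_activation(t, tau) for t in range(boundary))
        resp = sum(irrelevant_activation(t, tau) for t in range(boundary, horizon))
        return stim, resp

    # With most of the curve inside the stimulus stage, the repetition/
    # non-repetition difference is larger there than in the response stage,
    # mirroring the larger irrelevant repetition effect for Type 4 (S-S
    # overlap) than for Type 3 (S-R overlap) in Fig. 2.21.
    stim_non, resp_non = stage_activation(TAU_NONREP, boundary=400)
    stim_rep, resp_rep = stage_activation(TAU_REP, boundary=400)
    assert (stim_non - stim_rep) > (resp_non - resp_rep)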
2.10 Summary and conclusions
We began this chapter by outlining the representational and functional principles of the DO model; we then spelled out how these principles generate a taxonomy of tasks, selected four tasks from this taxonomy, and showed how, based on these principles, the structure of these tasks could be represented by a common processing architecture, and performance with them accounted for by the model. One of the effects of S–R compatibility that we had not considered in our model up to this point, however, was its interaction with the repetition effect (Bertelson 1963). These effects are ubiquitous and have a pervasive influence on RT. If the DO model is to be considered as having contributed to our understanding of performance in S–R compatibility tasks, then we needed to find out whether, and, if so, how the model handled sequential effects. If it had turned out that these effects were beyond the model's ability to deal with, then, as Luce pointed out, the model would have been incomplete (at best) and probably wrong to boot (see Luce 1986).

We reported the results of four experiments. In the first two experiments we examined the sequential effects of relevant dimensions, congruent and incongruent mapping, and the repetition of physically identical, as well as conceptually similar but physically different, stimuli in task Types 1 and 2. In the third experiment we looked at the sequential effects of consistency (consistent, inconsistent, and neutral) in task Types 1, 3, and 4. In the fourth experiment we examined the sequential effects of relevant and irrelevant stimuli in task Types 1, 3, and 4.

In the first experiment we found a large repetition effect that interacted with congruent/incongruent mapping as well as with the repetition/non-repetition of conceptually similar stimuli. That is, the overall RT was longer and the repetition effect larger for incongruent than for congruent mapping. In addition, the mapping effect for repetitions was larger with conceptually similar (different carrier) than with physically identical (same carrier) stimuli.

In the second experiment, which was aimed at identifying the locus of the repetition effect, we again found an interaction between the repetition effect and congruent/incongruent mapping. This interaction was present whether the stimulus or the response was repeated, each in the absence of the other, which placed the locus of the repetition effect in both the stimulus and the response processing modules. This finding contrasts with earlier reports (e.g. Bertelson 1965; Pashler and Bayliss 1991) that attributed the bulk of the repetition effect to the repetition of the response. These results were accounted for by the DO model's newly formulated Information Reduction Hypothesis, which states that information requirements on repeated trials are lower than on non-repeated trials. According to this hypothesis, when a relevant stimulus or a response is repeated, the stimulus or response threshold on the repeated trial drops, so that both the information and the time required to reach this lower threshold are reduced—hence the repetition effect.

The results of Experiment 3 showed the expected differences in the effects of irrelevant stimuli for task Types 1, 3, and 4: none for Type 1, and robust consistency effects for Types 3 and 4. However, there was no significant sequential effect of consistency for either Type 3 or Type 4 prime–probe pairs. The only sequential effect of consistency was the finding that the size of the S–R consistency effect in the Type 3 tasks was greater when the prime was neutral than when it was another Type 3 task.

In Experiment 4, we obtained significant repetition effects of the relevant stimulus, and response, in task Types 1, 3, and 4. Repetition of the irrelevant stimulus produced no significant effects for task Types 1 and 3; however, that effect was significant for task Type 4. These results were also accounted for by the Information Reduction Hypothesis. According to the hypothesis, when an irrelevant stimulus is repeated the information, and hence the time, required to distinguish between the relevant and irrelevant stimuli are both reduced. This was implemented in the model by reducing the value of the parameter (τ) following a repetition.
Because a shorter value of τ causes the irrelevant stimulus input to start falling sooner than it otherwise would, the resulting irrelevant stimulus activation curve, following a repetition, has a shallower peak and is also shifted earlier in time, so that a proportionately greater portion of the curve coincides in time with the stimulus module. The net result is for the repetition of an irrelevant stimulus that overlaps with the relevant stimulus (Type 4) to have a greater effect than the repetition of an irrelevant stimulus that overlaps with the response (Type 3).
Thus, the underlying reasoning for the repetition effects of relevant stimuli, irrelevant stimuli, and responses is the same: repetition leads to reduced information requirements which, in turn, lead to faster processing. Depending on whether the relevant or irrelevant stimuli have DO, the repetition effect is accounted for by modifying one or the other of two parameters in the DO model, contingent on the occurrence of a repetition, thus leaving the basic mechanisms of the model intact.
Acknowledgements
We are grateful for support from the Air Force Office of Scientific Research Grant F496020-94-10020 and from The Horace H. Rackham School of Graduate Studies at the University of Michigan. We thank Anthony Whipple for technical support and discussions, and Greta Williams for assistance in carrying out these studies.
Notes
1. It is interesting to note that theories that deal with fundamental (i.e. irreducible) concepts (e.g. gravity) express the lawful relationships between the entities identified (and defined) in the representational part of the theory. Such theories have no processing component because, in principle, these relationships are irreducible. Ecological theories, and so-called dynamic theories in psychology, often take this approach—prematurely and erroneously, in our opinion. Boyle's law illustrates this point well. When it was first formulated it expressed the systematic relationship between the pressure, volume, and temperature of an enclosed gas and was thought to be fundamental. It was not until Dalton's atomic theory that a mechanism was discovered that could give rise to this relationship. This mechanism became the functional part of Boyle's model.
2. We have included the Type 8, or Stroop, task in this table because of the broad interest that people have in it and also to show how, in accordance with DO principles, it could be parsed into separate components. In the rest of the article, however, we shall have nothing further to say about this task.
3. This time plays an important role later on in this paper in enabling the model to account for the sequential effects of irrelevant stimuli.
4. Thorough reviews of this literature exist that interested readers may wish to consult (Audley 1973; Kirby 1980; Kornblum 1973; Luce 1986).
5. Within the framework of the DO model there is no way to literally implement the version of the Residual Activation Hypothesis in which a process is bypassed without doing violence to the model itself and radically altering its structure. However, the duration of any process in the DO model could, in principle, be made arbitrarily small.
References
Allport, A., Styles, E.A., and Hsieh, S. (1994). Shifting intentional set: Exploring the dynamic control of tasks. In C. Umiltà and M. Moscovitch (Eds.), Attention and performance XV, pp. 421–452. Cambridge, MA: MIT Press.
Audley, R.J. (1973). Some observations on theories of choice reaction time: Tutorial review. In S. Kornblum (Ed.), Attention and performance IV, pp. 509–546. New York: Academic Press.
Barber, P. and O'Leary, M. (1997). The relevance of salience: Towards an activational account of irrelevant stimulus–response compatibility effects. In B. Hommel and W. Prinz (Eds.), Theoretical issues in stimulus–response compatibility, pp. 135–172. Amsterdam: North-Holland, Elsevier.
Bertelson, P. (1961). Sequential redundancy and speed in a serial two-choice responding task. Quarterly Journal of Experimental Psychology, 12, 90–102.
Bertelson, P. (1963). S–R relationships and reaction times to new versus repeated signals in a serial task. Journal of Experimental Psychology, 65, 478–484.
Bertelson, P. (1965). Serial choice reaction time as a function of response versus signal-and-response repetition. Nature, 206, 217–218.
Bertelson, P. and Renkin, E. (1966). Reaction times to new vs. repeated signals in a serial task as a function of response–signal time interval. Acta Psychologica, 25, 132–136.
Bertelson, P. and Tysseyre, F. (1966). Choice reaction time as a function of stimulus vs. response relative frequency of occurrence. Nature, 212, 1069–1070.
Falmagne, J.C. (1965). Stochastic models for choice-reaction time with application to experimental results. Journal of Mathematical Psychology, 2, 11–127.
Falmagne, J.C. and Theios, J. (1969). On attention and memory in reaction time experiments. In W.G. Köster (Ed.), Attention and performance II. A special issue of Acta Psychologica, 30, 316–323.
Falmagne, J.C., Cohen, S.P., and Dwivedi, A. (1975). Two-choice reactions as an ordered memory scanning process. In P.M.A. Rabbitt and S. Dornic (Eds.), Attention and performance V, pp. 296–344. New York: Academic Press.
Fitts, P.M. and Seeger, C.M. (1953). S–R compatibility: Spatial characteristics of stimulus and response codes. Journal of Experimental Psychology, 46, 199–210.
Hommel, B. (1998). Event files: Evidence for automatic integration of stimulus–response episodes. Visual Cognition, 5, 183–216.
Hommel, B., Müsseler, J., Aschersleben, G., and Prinz, W. (in press). The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences.
Hyman, R. (1953). Stimulus information as a determinant of reaction time. Journal of Experimental Psychology, 45, 188–196.
Keele, S.W. (1967). Compatibility and time sharing in serial reaction time. Journal of Experimental Psychology, 75, 529–539.
Kirby, N. (1980). Sequential effects in choice reaction time. In A. Welford (Ed.), Reaction times, pp. 129–172. London: Academic Press.
Kornblum, S. (1969). Sequential determinants of information processing in serial and discrete choice reaction time. Psychological Review, 76, 113–131.
Kornblum, S. (1973). Sequential effects in choice reaction time: A tutorial review. In S. Kornblum (Ed.), Attention and performance IV, pp. 259–288. New York: Academic Press.
Kornblum, S. (1992). Dimensional overlap and dimensional relevance in stimulus–response and stimulus–stimulus compatibility. In G.E. Stelmach and J. Requin (Eds.), Tutorials in motor behavior, Vol. 2, pp. 743–777. Amsterdam: Elsevier.
Kornblum, S. (1994). The way irrelevant dimensions are processed depends on what they overlap with: The case of Stroop- and Simon-like stimuli. Psychological Research/Psychologische Forschung, 56, 130–135.
Kornblum, S. and Lee, J.W. (1995). Stimulus–response compatibility with relevant and irrelevant stimulus dimensions that do and do not overlap with the response. Journal of Experimental Psychology: Human Perception and Performance, 21, 855–875.
Kornblum, S., Hasbroucq, T., and Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus–response compatibility—A model and taxonomy. Psychological Review, 97, 253–270.
Kornblum, S., Stevens, G.T., Whipple, A., and Requin, J. (1999). The effects of irrelevant stimuli: 1. The time course of stimulus–stimulus and stimulus–response consistency effects with Stroop-like stimuli and Simon-like tasks, and their factorial combinations. Journal of Experimental Psychology: Human Perception and Performance, 25, 688–714.
Leuthold, H. and Sommer, W. (1993). Stimulus presentation rate dissociates sequential effects in event-related potentials and reaction times. Psychophysiology, 30, 510–517.
Luce, R.D. (1986). Response times. New York: Oxford University Press.
MacLeod, C.M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163–203.
Marcel, T. and Forrin, B. (1974). Naming latency and the repetition of stimulus categories. Journal of Experimental Psychology, 103, 450–460.
Monsell, S. and Driver, J. (2000). Control of cognitive processes. In S. Monsell and J. Driver (Eds.), Attention and performance XVIII. Cambridge, MA: MIT Press.
Mordkoff, J.T. (1999). The gating of irrelevant information in selective-attention tasks. Abstracts of the Psychonomic Society, 3, 21.
Pashler, H. and Bayliss, G. (1991). Procedural learning: 2. Intertrial repetition effects in speeded-choice tasks. Journal of Experimental Psychology: Learning, Memory and Cognition, 17, 33–48.
Rabbitt, P.M.A. (1968). Repetition effects and signal classification strategies in serial choice–response tasks. Quarterly Journal of Experimental Psychology, 20, 232–240.
Remington, R.J. (1969). Analysis of sequential effects in choice reaction times. Journal of Experimental Psychology, 82, 250–257.
Remington, R.J. (1971). Analysis of sequential effects for a four-choice reaction time experiment. Journal of Psychology, 77, 17–27.
Rogers, R.D. and Monsell, S. (1995). Costs of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General, 124, 207–231.
Schvaneveldt, R.W. and Chase, W.S. (1969). Sequential effects in choice reaction time. Journal of Experimental Psychology, 80, 1–8.
Shaffer, L.H. (1965). Choice reaction with variable S–R mapping. Journal of Experimental Psychology, 70, 284–288.
Shiu, L.-P. and Kornblum, S. (1996). Negative priming and stimulus–response compatibility. Psychonomic Bulletin and Review, 3, 510–514.
Simon, J.R. (1990). The effects of an irrelevant directional cue on human information processing. In R.W. Proctor and T.G. Reeve (Eds.), Stimulus–response compatibility: An integrated perspective, pp. 31–86. Amsterdam: North-Holland.
Smith, M.C. (1968). Repetition effect and short-term memory. Journal of Experimental Psychology, 77, 435–439.
Soetens, E. (1998). Localizing sequential effects in serial choice reaction time with the information reduction procedure. Journal of Experimental Psychology: Human Perception and Performance, 24, 547–568.
Soetens, E., Boer, L.C., and Hueting, J.E. (1985). Expectancy or automatic facilitation? Separating sequential effects in two-choice reaction time. Journal of Experimental Psychology: Human Perception and Performance, 11, 598–616.
Squires, K.C., Wickens, C., Squires, N.K., and Donchin, E. (1976). The effect of stimulus sequence on the waveform of the cortical event-related potential. Science, 193, 1142–1146.
Stevens, G.T. (2000). The locus of Eriksen, Simon and Stroop effects: New data and a comparison of models. Ph.D. dissertation, University of Michigan, Ann Arbor.
Stevens, G.T. and Kornblum, S. (2000). Goals and dimensional overlap: The effects of irrelevant response dimensions. Poster presented at the XIXth International Symposium on Attention and Performance, Kloster Irsee, Germany, July 16–22, 2000.
Stevens, G.T. and Kornblum, S. (2001). The locus of consistency effects in Eriksen and Stroop-like tasks. Manuscript in preparation.
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–661.
Treisman, A. (1988). Features and objects: The fourteenth Bartlett Memorial Lecture. Quarterly Journal of Experimental Psychology, 47A, 201–237.
Vervaeck, K.R. and Boer, L.C. (1980). Sequential effects in two-choice reaction time: Subjective expectancy and automatic after-effect at short response–stimulus intervals. Acta Psychologica, 44, 175–190.
Williams, J. (1966). Sequential effects in disjunctive RT: Implications for decision models. Journal of Experimental Psychology, 71, 665–672.
Zhang, H. and Kornblum, S. (1998). The effects of stimulus–response mapping and irrelevant stimulus–response and stimulus–stimulus overlap in four-choice Stroop tasks with single carrier stimuli. Journal of Experimental Psychology: Human Perception and Performance, 24, 3–19.
Appendix A

Experiment 1

                  Same carrier            Different carrier
                  Rep       Non-Rep       Rep       Non-Rep
Congruent         0.0       0.7           0.7       1.4
Neutral           2.1       1.9           0.7       3.2
Incongruent       0.0       3.7           1.4       4.9
Experiment 2

                              Type 2 probe            Type 1 probe
                              Rep       Non-Rep       Rep       Non-Rep
Cong     Stim. Rep/Non-Rep    1.0       4.2           2.1       1.0
         Resp. Rep/Non-Rep    0.0       1.0           0.0       3.1
Incong   Stim. Rep/Non-Rep    0.4       1.1           3.5       1.7
         Resp. Rep/Non-Rep    3.2       2.5           2.5       2.5
Experiment 3

                  Type 3 probe                        Type 4 probe
Prime             Consist    Neut     Inconsist       Consist    Neut     Inconsist
Consistent        4.2        2.1      0.0             4.2        1.0      2.1
Neutral           2.1        2.1      0.0             0.0        2.1      3.1
Inconsistent      2.1        1.0      0.0             3.1        0.0      4.2
Experiment 4

                 Type 1                    Type 3                    Type 4
Irrel S          Rel S Rep   Non-Rep       Rel S Rep   Non-Rep       Rel S Rep   Non-Rep
Rep              0.0         3.1           0.5         2.6           1.0         3.1
Non-Rep          1.6         3.1           1.0         1.6           0.5         1.6

Error rates at the 700 ms RSI for the four experiments reported.
Appendix B

Experiment 1

Type 2 → Type 2; Congruent
                                                               RSI
            Prime             Probe            700         1500        3000
Same carrier
  Rep       red → 'red'       red → 'red'      463 (27)    460 (24)    473 (17)
  Non-Rep   blue → 'blue'     red → 'red'      484 (28)    488 (30)    490 (25)
Different carrier
  Rep       RED → 'red'       red → 'red'      489 (29)    494 (31)    494 (19)
  Non-Rep   BLUE → 'blue'     red → 'red'      490 (27)    495 (24)    492 (25)
Type 2 → Type 2; Incongruent
                                                                  RSI
               Prime              Probe             700          1500         3000
Same carrier
  Rep          red → 'green'      red → 'green'     618 (42)     629 (86)     730 (68)
  Non-Rep      blue → 'yellow'    red → 'green'     853 (125)    849 (157)    872 (112)
  NR (S→R)     green → 'blue'     red → 'green'     890 (145)    905 (199)    912 (123)
  NR (R→S)     blue → 'yellow'    yellow → 'red'    871 (81)     863 (168)    895 (122)
Different carrier
  Rep          RED → 'green'      red → 'green'     753 (45)     738 (61)     798 (85)
  Non-Rep      BLUE → 'yellow'    red → 'green'     885 (98)     842 (157)    875 (120)
  NR (S→R)     GREEN → 'blue'     red → 'green'     933 (154)    935 (236)    901 (112)
  NR (R→S)     BLUE → 'yellow'    yellow → 'red'    915 (122)    915 (193)    908 (122)
Type 1 → Type 1
                                                               RSI
            Prime             Probe            700         1500        3000
Same carrier
  Rep       red → 'two'       red → 'two'      582 (37)    593 (54)    623 (50)
  Non-Rep   blue → 'four'     red → 'two'      682 (38)    695 (62)    691 (43)
Different carrier
  Rep       RED → 'two'       red → 'two'      635 (62)    650 (76)    677 (52)
  Non-Rep   BLUE → 'four'     red → 'two'      690 (44)    711 (65)    700 (42)
Mean RTs (and standard deviations) for Experiment 1 at three values of RSI. Because all possible prime–probe pairs were randomized and equiprobable, and the interval between prime–probe pairs was three seconds, RTs for RSI values of three seconds were obtained by considering the probes of regular prime–probe pairs as primes, and the primes of the next regular pair as probes of new pairs. Even though the RTs for the 700 ms RSI are discussed in detail in the text, we have included them in this table for ease of comparison. The stimuli and responses for the primes and probes are prototypical, generic descriptions. Thus, for example, the probe stimuli in this table are all shown as color patches; however, we know (see text) that the probe stimuli were either color patches or color words. This generic description is intended to encompass both cases, and the data shown are averaged over both cases. Note also that we distinguish between three cases of non-repetitions. The first is the case of pure, or total, non-repetitions in which no aspect of the prime is repeated in the probe. When we speak of 'non-repetitions' in the text, these are the trials to which we refer. The second is the case in which the label of, or the congruent response to, the stimulus on the prime becomes the response on the probe (S → R). We view these as negative priming (NP) trials (see Shiu and Kornblum 1996). At an RSI of 700 ms, the RT for this case is significantly longer than for the total non-repetitions [F(1, 5) = 10.85, p < 0.0216], which confirms Shiu and Kornblum's (1996) earlier findings. The third is the case in which the response on the prime trial becomes the label of the probe stimulus (R → S). Even though the RT for this case is also longer than for the total non-repetitions (but shorter than on NP trials), this difference is not statistically significant [F(1, 5) = 2.12, p < 0.2051]. The statistical results of RSI for this experiment are summarized below:
Type 1 tasks
• Main effect: the longer the RSI, the slower the RT [F(2, 6) = 7.61, p < 0.0226].
• It is interesting to note that as RSI increases the repetition effect decreases. This effect is not statistically significant. Nevertheless, it appears to be due principally to the fact that as RSI increases the RT for repetitions increases more than for non-repetitions, which is consistent with the Information Reduction Hypothesis: as RSI increases the threshold returns to normal, thus reducing the advantage of the repetitions.
Type 2 tasks
• RSI has no significant main effect with Type 2 tasks, either congruent or incongruent;
• RSI has an 8 ms interaction with carrier for congruent mapping, which is significant [F(2, 10) = 8.01, p < 0.0084].
Appendix C

Experiment 2

Type 1 → Type 2; Congruent
                                                                  RSI
                  Prime            Probe            700          1500         3000
  Resp Rep        red → 'two'      2 → 'two'        533 (53)     555 (85)     547 (51)
  Resp Non-Rep    red → 'two'      4 → 'four'       565 (58)     566 (64)     562 (53)
  Stim Rep        red → 'two'      red → 'red'      565 (71)     568 (80)     550 (55)
  Stim Non-Rep    red → 'two'      blue → 'blue'    583 (71)     580 (73)     564 (70)
Appendix C continued

Type 1 → Type 2; Incongruent
                                                                    RSI
                   Prime           Probe              700          1500         3000
  Resp Rep         red → 'two'     4 → 'two'          861 (132)    902 (162)    949 (190)
  Resp Non-Rep     red → 'two'     6 → 'eight'        962 (173)    954 (178)    974 (187)
  Non-Rep (R→S)    red → 'two'     2 → 'six'          920 (169)    912 (182)    986 (209)
  Stim Rep         red → 'two'     red → 'green'      895 (148)    916 (179)    968 (194)
  Stim Non-Rep     red → 'two'     blue → 'yellow'    986 (186)    988 (212)    990 (205)
  Non-Rep (S→R)    red → 'two'     yellow → 'red'     964 (183)    980 (216)    1049 (224)
Type 2 → Type 1; Congruent
                                                                RSI
                  Prime           Probe             700          1500         3000
  Resp Rep        red → 'red'     2 → 'red'         793 (110)    835 (156)    830 (132)
  Resp Non-Rep    red → 'red'     4 → 'blue'        860 (114)    907 (187)    862 (150)
  Stim Rep        red → 'red'     red → 'two'       808 (126)    823 (104)    853 (168)
  Stim Non-Rep    red → 'red'     blue → 'four'     847 (96)     862 (133)    864 (149)
Type 2 → Type 1; Incongruent
                                                                  RSI
                   Prime             Probe             700          1500         3000
  Resp Rep         red → 'green'     8 → 'green'       844 (112)    845 (115)    892 (110)
  Resp Non-Rep     red → 'green'     4 → 'blue'        894 (121)    882 (111)    897 (110)
  Non-Rep (S→R)    red → 'green'     2 → 'red'         872 (119)    894 (139)    900 (127)
  Stim Rep         red → 'green'     red → 'two'       963 (128)    971 (153)    1039 (144)
  Stim Non-Rep     red → 'green'     blue → 'four'     974 (126)    966 (120)    1036 (149)
  Non-Rep (R→S)    red → 'green'     green → 'six'     931 (133)    951 (163)    1018 (147)
Mean RTs (and standard deviations) for Experiment 2 at three values of RSI. As was true of Appendix B, the stimuli and responses for the primes and probes are prototypical cases, or generic descriptions. Thus, for example, the prime stimuli in this table are all shown as red color patches; however, as is made clear in the text, not only were there four different color patches, but the stimuli on prime trials could also be digits. The generic descriptions in this table are, therefore, intended to encompass both cases, and the data shown are averaged over both cases. Again, similarly to what we did in Appendix B, when the mapping is incongruent we distinguish between three different cases of non-repetitions: total non-repetitions; non-repetitions in which the label of, or the congruent response to, the prime stimulus becomes the response on the probe; and non-repetitions in which the response on the prime becomes the label on the probe. In contrast to the results of Experiment 1, none of the differences between the total non-repetitions and the other non-repetition cases within a particular incongruent condition are significant. The statistical results of RSI for this experiment are summarized below:
• Main effect: the longer the RSI, the slower the RT [F(2, 62) = 6.24, p < 0.0034].

Type 1 → Type 2
• The increase of RT with RSI was greater for repetitions than for non-repetitions: this interaction is significant for stimulus rep/non-rep [F(2, 46) = 4.24, p < 0.0204] and response rep/non-rep [F(2, 46) = 6.8, p < 0.0026].
Type 2 → Type 1
• RSI interacted with response repetition whether the mapping of the prime was congruent [F(2, 14) = 3.95, p < 0.0436], or incongruent [F(2, 46) = 4.37, p < 0.0183].
Appendix D

Experiment 3
                  RSI 700 (probe)                        RSI 1500 (probe)
Prime             Con.        Incon.       Neut.         Con.         Incon.       Neut.
Type 3 (+1)
  Con.            572 (80)    659 (63)     616 (86)      606 (118)    681 (105)    634 (98)
  Incon.          586 (75)    659 (81)     614 (63)      627 (105)    687 (107)    709 (134)
  Neut.           557 (71)    697 (108)    614 (68)      637 (103)    710 (145)    633 (90)
Type 4 (+1)
  Con.            594 (77)    640 (78)     605 (77)      627 (96)     682 (101)    634 (99)
  Incon.          609 (52)    687 (128)    619 (115)     627 (87)     695 (115)    655 (112)
  Neut.           598 (92)    657 (91)     589 (78)      627 (99)     710 (120)    604 (100)
Mean RT (and standard deviations) for Experiment 3 at the two values of RSI that were used. RSI did not have any statistically significant effects in this experiment, although the same trend is observable here as in the other experiments: the longer the RSI, the slower the RT.

Experiment 4
                   RSI 700                              RSI 1500
                   Rel. Rep     Rel. Non-Rep            Rel. Rep     Rel. Non-Rep
Type 1 nn
  Irr. Rep         470 (43)     559 (77)                506 (60)     585 (60)
  Irr. Non-Rep     469 (30)     561 (45)                497 (63)     593 (65)
Type 3 ii
  Irr. Rep         498 (75)     606 (105)               520 (94)     619 (108)
  Irr. Non-Rep     491 (58)     616 (114)               521 (82)     612 (121)
Type 4 ii
  Irr. Rep         484 (63)     599 (78)                538 (67)     616 (90)
  Irr. Non-Rep     516 (78)     620 (71)                543 (73)     631 (88)
Mean RT (and standard deviations) for Experiment 4 at the two values of RSI that were used. The row labels are for the irrelevant (Irr.) transitions; the column labels are for the relevant (Rel.) transitions. Included in the task type identification for the rows are reminders of the consistency status of the prime–probe pairs. Thus, nn indicates that both the prime and the probe were neutral; ii indicates that both the prime and the probe were inconsistent. The statistical results of RSI for this experiment are easily summarized: the longer the RSI, the slower the RT. This is significant for Type 1 [F(1, 8) = 6.49, p < 0.0343], Type 3, and Type 4 [F(1, 8) = 10.05, p < 0.0132]. There are no significant interactions. However, in Type 3 the effect of relevant stimulus repetitions appears to decrease as RSI increases, and in Type 4 the same trend appears, except that there it is the effect of irrelevant stimulus repetition that decreases as RSI increases.
I Space perception and spatially oriented action
3 Perception and action: what, how, when, and why
Introduction to Section I
Glyn W. Humphreys
3.1 Perception and action can dissociate
Over the past decade, a good deal of evidence has accumulated that perception and action can dissociate (see Milner and Goodale 1995, for one summary). Dissociations can occur even when perception and action are tested using the same object, and even when input from the same modality is used (e.g. vision). This leads to a counter-intuitive conclusion: that the information, and the underlying brain systems, may differ when we perceive and recognize a cup (on the one hand) and when we make a grasp action to pick it up (on the other). The papers reported in this section of the book are concerned with the relations between perception and action, and they provide state-of-the-art summaries of work on this important topic. The papers detail not only how perception and action can dissociate but also how they interact. This is clearly important if we are to understand how coherent behaviour emerges, and the papers presented here provide interesting suggestions as to how forms of integration can take place.

In the chapters reported by Rossetti and Pisella, as well as Bridgeman, the evidence on dissociations between perception and action, even with single objects, is discussed. As with all of the chapters in this section, the use of visual information for perception and action is emphasized. One source of evidence here comes from studies of perceptual illusions. For example, perceptual judgements to a part of a display can be strongly influenced by surrounding parts (e.g. as in the Titchener–Ebbinghaus size illusion). In contrast, actions to pick up a local stimulus can be much less affected by the surrounding context (e.g. so that grasp apertures do not show the size illusion; see Aglioti, DeSouza, and Goodale 1995; Haffenden and Goodale 1998). The paper by Bridgeman here illustrates this. Bridgeman reports work on an adapted form of the Roelofs effect (Roelofs 1935), in which perceptual judgements about the location of a target are affected by its position relative to a surrounding frame, instead of being based on the target's absolute position with respect to the viewer. Bridgeman shows that this Roelofs effect disappears if observers make a rapid jab to the target's location. The effect of the rectangle context on action is less than on perception. This occurs even when observers are prevented from fixating on the target, suggesting that actions are being directed by a relatively rich sensorimotor representation rather than simple information about where the eyes are currently looking (see Bridgeman, this volume, Chapter 5).

Rossetti and Pisella also review neuropsychological evidence for the perception–action dissociation. This includes optic ataxic patients with parietal damage who show intact perception but impaired
action to objects (e.g. misreaching under visual guidance), and agnosic patients (often with occipito-temporal damage) who have impaired perception along with intact action (e.g. orienting the hand appropriately when reaching even when unable to make simple orientation discriminations in perceptual judgement tasks). The distinction between these two classes of patient supports the separation between an occipito-parietal (dorsal) route in vision which supports visually-guided action, and an occipito-temporal (ventral) route supporting perceptual judgements and object recognition. The dorsal and ventral visual pathways in the cortex were originally linked to the processes of computing 'where' (spatial location) and 'what' objects were (Ungerleider and Mishkin 1982). However, the evidence on intact action in agnosia has been used to motivate the further argument that the dorsal route is involved not just in computing where an object is but also 'how' an action should be made (e.g. how an action should be shaped to pick up an object; see Milner and Goodale 1995). In these cases patients appear to have intact access to forms of representation supporting one but not the other task. To this can be added other evidence. For example, work on patients with visual neglect after parietal damage has indicated that more neglect may be expressed when patients have to point to the centre of a rod than when they have to pick it up (Robertson, Nico, and Hood 1995). Bridgeman (this volume, Chapter 5) suggests that, even though pointing is a motor action, it often plays a role in communication and so may be controlled through a recognition rather than an action pathway. If so, then the point–grasp dissociation in neglect would again fit with the idea of one pathway being disturbed (the perceptual recognition route) and the other intact (the action route). I return to discuss an alternative view of this below.
3.2 When perception influences action Although there is considerable evidence that ‘pulls apart’ perception and action, other work highlights occasions on which the processes involved in object recognition in2uence action. Several findings are discussed by Rossetti and Pisella (this volume, Chapter 4), and I mention only two that pick up on the dissociation between optic ataxia and neglect noted above. One is that, in studies of agnosia, patients can be impaired when required to use a representation of the relative locations of stimuli to direct action (e.g. placing the 1ngers in the holes of a ten-pin bowling ball! See Dijkerman, Milner, and Carey 1998). In this case, the perceptual impairment in the patient seems to carry-over into their actions. A second is that the poor reaching behaviour in optic ataxic patients can be improved when they grasp a known relative to an unknown object (Jeannerod, Decety, and Michel 1994). Here intact perceptual processes in the patient seem to improve action. Other examples come from the chapters by Rieser and Pick, and by Graziano and Botvinick (this volume, Chapters 8 and 6, respectively). Rieser and Pick discuss effects of perceptual representations on the reconstruction of action, when participants have to negotiate large environments. In a series of ingenious experiments, they demonstrate that vision helps recalibrate locomotive activity. They propose that ‘when walking with vision or without it, people tend to perceive their locomotion relative to the surrounding environment as a frame of reference, and their perception serves to update their representation of their spatial orientation. The resulting representation of spatial orientation is unitary—it re2ects the perception of locomotion, it serves as the basis for control of continuing locomotion, and it serves as the basis for the control of all other environmentally-directed actions.’ Graziano and Botvinick report on physiological studies in monkeys that demonstrate interactions between visual and proprioceptive input—as when cells have visual receptive 1elds that are tuned to the position of a hand in space (and not on the retina). They argue that ‘the body schema is used to
They argue that ‘the body schema is used to cross-reference between different senses, as a basis for spatial cognition and for movement planning’. Here, too, they maintain that a common representation underlies visual perception and motor action.
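To make the idea of a hand-centred visual receptive field concrete, the following illustrative sketch (not taken from Graziano and Botvinick’s recordings; the Gaussian tuning profile and all numerical values are assumptions) simulates a cell whose response depends on where a stimulus falls relative to the hand, so that moving the hand moves the receptive field.

    import numpy as np

    # Toy sketch of a hand-centred visual receptive field: the simulated
    # cell's response depends on the stimulus position relative to the hand,
    # not on its retinal position. Tuning shape and values are assumptions.
    def response(stim_xy, hand_xy, width=0.10):
        """Firing rate (arbitrary units), peaking when the stimulus is near the hand."""
        d2 = np.sum((np.asarray(stim_xy, float) - np.asarray(hand_xy, float)) ** 2)
        return float(np.exp(-d2 / (2 * width ** 2)))

    stim = (0.30, 0.10)  # stimulus fixed in space (metres, arbitrary frame)
    for hand in [(0.30, 0.10), (0.30, 0.25), (0.60, 0.10)]:
        print(f"hand at {hand}: response = {response(stim, hand):.2f}")

Keeping the stimulus fixed while displacing the hand makes the simulated response fall off, which is the signature of hand-centred (rather than retinocentric) coding.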
3.3 When action affects visual perception
As well as visual perception and object recognition sometimes influencing action, effects can go in the reverse direction. The chapters by Graziano and Botvinick and by Jordan et al. (this volume, Chapters 6 and 7) illustrate this. Jordan et al. discuss experiments showing that perceived displacements in the location of a visual target are biased in the direction of a planned action. This effect is not due to the action being made per se, and it extends in time even beyond the duration required to perform the action. They suggest that ‘action plans contribute to, and shape, perceptual space’. Graziano and Botvinick report cells that respond to the visual presence of stimuli only if the stimuli are close to a hand being used for action. Analogies can also be found in experimental studies with humans. Tipper and colleagues (Tipper, Lortie, and Baylis 1992), for instance, have provided evidence that visual attention can be locked to the position of the hand with respect to a target for action. If you reach with your hand directly out to an object, then irrelevant stimuli lying in front of the object may be suppressed; if, however, you move your hand back to the object, then irrelevant stimuli in front of the target are not suppressed but those behind it are. In these instances, a planned action influences perception. There is also neuropsychological evidence along the same lines. For example, Berti and Frassinetti (2000) report a patient who showed neglect of ‘near’ but not ‘far’ space (e.g. poor bisection of a line close to the body but not when it was shown far away, even when matched for visual angle). Interestingly, neglect could be induced even for distant stimuli if the patient responded using a long pointer. Berti and Frassinetti argue that the neglected representation of ‘near space’ was extended to include far space when a pointer was used for action. The opposite pattern has been found by Ackroyd and colleagues (Ackroyd, Riddoch, Humphreys, and Townsend, in press). Here the patient had neglect of far and left space in visual detection tasks. However, when given a pointer to hold, the neglected areas decreased. Importantly, this patient demonstrated aspects of visual neglect, rather than his neglect being only ‘motoric’ in nature (e.g. when required to use mirrored feedback he responded to stimuli seen on his right, even though this involved moving towards his impaired, left side). The extension of the patient’s body space, by using a pointer, ameliorated the degree of visual neglect. The physiological basis of such effects may be the bimodal cells discussed by Graziano and Botvinick (this volume, Chapter 6), who also report work showing that the receptive fields of such cells extend to an implement held in the hand used for an action.
3.4 When do perception and action interact?
The chapters by Rieser and Pick, Graziano and Botvinick, and Jordan et al. here all provide examples of interactions between perception and action, in one direction or the other. Rossetti and Pisella, in their very thorough overview chapter, consider the implications. One proposal they consider is that perception–action interactions are contingent on the time parameters governing behaviour: we can understand perception–action relations by understanding when behaviour is constrained by each factor. They suggest that there is a fast action route (through dorsal cortex) and a slow action route (through ventral cortex).
The dorsal route may be used for actions that operate on-line, in a closed-loop fashion using visual feedback. The ventral route may be used for action, but only when actions are delayed or not contingent on on-line feedback. They present evidence from their laboratory on this point. For example, normal participants show rapid use of perturbations in the locations of targets when guiding their on-line reaching, but fail to use changes in target colour in the same way. Location information may be computed via the dorsal visual stream, but colour only through the ventral stream. The evidence on the contrast between pointing and grasping in neglect may be understood in the same way. Thus Martin Edwards and I (Edwards and Humphreys 1999) found that the improved behaviour when a neglect patient grasped rods was due to corrections, driven by on-line feedback, at the end of the reach trajectory. It is possible to argue that, in such cases, neglect is due to an impaired representation used in perceptual judgements, which can be overcome by on-line feedback via the action route. Pointing may be worse than grasping either because pointing is always dependent on the perceptual-recognition system (e.g. being used in communicative acts) or because pointing is less dependent on feedback in any case. The chapter by Rieser and Pick takes a different view, suggesting instead that there is always an integrated representation of perception and action. They present evidence for such integrated representations. However, it is not clear whether this evidence emerges because of the time parameters in the studies. These time parameters are typically long—as participants carry out tasks walking around their environment—and this may encourage involvement of the slow ventral route (Rossetti and Pisella, this volume, Chapter 4). Also, action recall rather than on-line control is sometimes measured, and this may be crucial. A further possibility is that there may be effects of spatial distance and/or scale. The studies of Rieser and Pick typically take place over large-scale areas, and actions can be dictated by stimuli presented far from the body—unlike the hand actions made close to the body in many studies demonstrating dissociations between perception and action (e.g. Bridgeman, this volume, Chapter 5). Such dissociations may occur only when immediate actions are made, with the participant directly interacting with their environment. Other ways to consider when perception and action interact are proposed by Bridgeman and by Jordan et al. (this volume, Chapters 5 and 7). Bridgeman suggests that differences may emerge due to the kinds of spatial representation being computed. He suggests that the dorsal route operates in absolute, egocentric coordinates, whilst the ventral route codes relative location information between stimuli (an allocentric coding scheme). Some of the work on impaired action in agnosia is consistent with this (Dijkerman et al. 1998). On this view, perception and action dissociate when responses are based on an egocentric reference frame but they interact when actions use an allocentric frame. We may think of Rieser and Pick’s studies in this light too, since their work on locomotion typically examines the use of allocentric coding of the environment. Jordan et al. (this volume, Chapter 7) consider a third alternative, which is that what counts is whether behaviour is controlled by temporally proximal or temporally distal events.
Perception and action can dissociate when actions are made to temporally proximal events, but they are integrated when actions are to be made to temporally distal events—and they remain integrated even when, later in time, actions are made to the stimulus then present. On this view, perception and action are brought together in anticipatory planning.
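The time-parameter proposal can be made concrete as a simple decision rule. The sketch below (in Python) is purely illustrative: the 300 ms window and the route labels are assumptions introduced for the example, not values proposed by Rossetti and Pisella.

    # Toy decision rule for the time-parameter proposal discussed above.
    # The window value and labels are illustrative assumptions only.
    DORSAL_WINDOW_MS = 300  # assumed upper bound for an 'immediate' action

    def controlling_route(delay_ms: float, online_feedback: bool) -> str:
        """Guess which representation dominates the action, per the proposal."""
        if delay_ms <= DORSAL_WINDOW_MS and online_feedback:
            return "dorsal (fast, on-line control)"
        return "ventral (slow, perceptual/recalled representation)"

    for delay, feedback in [(100, True), (100, False), (2000, True)]:
        print(f"delay={delay} ms, feedback={feedback}: "
              f"{controlling_route(delay, feedback)}")

On this caricature, a delayed response or one deprived of on-line feedback falls back on the slow perceptual representation, which is exactly the pattern the neglect and locomotion studies above are being used to probe.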
3.5 How do perception and action interact?
As well as considering the issue of when perception and action interact, we may also consider how such an interaction takes place.
Graziano and Botvinick (this volume, Chapter 6) suggest that such interactions may be mediated through the body schema, which is constantly re-calibrated by on-line perceptual information. Jordan et al. (this volume, Chapter 7) propose the ‘common coding’ scheme suggested by Prinz (1997). They argue that the planning of actions to distal events recruits processes also involved in the perception of the consequences of the action. It may be that common coding is effected through the kind of body schema representation outlined by Graziano and Botvinick. How this form of representation, thought to underlie perception–action interactions, links to the idea of allocentric coding for such interactions (Bridgeman, this volume, Chapter 5; Rieser and Pick, this volume, Chapter 8) is, however, somewhat unclear and remains for future work to specify. Whatever account is finally formulated, the evidence reviewed in this part of the book indicates that there are both dissociations and interactions between perception and action. Rossetti and Pisella’s overview chapter provides a clear account of many of the relevant findings, along with a well-articulated view of how perception and action can be interrelated. The chapter by Bridgeman emphasizes the dissociation between perception and action, whilst the chapters by Rieser and Pick, Graziano and Botvinick, and Jordan et al. emphasize perception–action interactions. The work reported here stimulates questions about the best way to conceptualize both the dissociations and the interactions, as well as highlighting the need to develop more detailed processing models of perceptuo-motor integration.
Acknowledgement
This work was supported by grants from the Medical Research Council and the Wellcome Trust.
References
Ackroyd, K., Riddoch, M.J., Humphreys, G.W., and Townsend, S. (in press). When near becomes far and left becomes right: Using a tool to extend extrapersonal visual space in a patient with severe neglect. Neuropsychologia.
Aglioti, S., DeSouza, J.F., and Goodale, M.A. (1995). Size-contrast illusions deceive the eye but not the hand. Current Biology, 5, 679–685.
Berti, A. and Frassinetti, F. (2000). When far becomes near: Remapping of space by tool use. Journal of Cognitive Neuroscience, 12, 415–420.
Dijkerman, H.C., Milner, A.D., and Carey, D.P. (1998). Grasping spatial relationships: Failure to demonstrate allocentric visual coding in a patient with visual form agnosia. Consciousness and Cognition, 7, 424–437.
Edwards, M.G. and Humphreys, G.W. (1999). Pointing and grasping in unilateral visual neglect: Effect of on-line visual feedback in grasping. Neuropsychologia, 37, 959–973.
Haffenden, A. and Goodale, M.A. (1998). The effect of pictorial illusion on prehension and perception. Journal of Cognitive Neuroscience, 10, 122–136.
Milner, A.D. and Goodale, M.A. (1995). The visual brain in action. Oxford: Oxford University Press.
Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9, 129–154.
Robertson, I., Nico, D., and Hood, B.M. (1995). The intention to act improves unilateral neglect: Two demonstrations. NeuroReport, 7, 246–248.
Roelofs, C. (1935). Optische Localisation. Archiv für Augenheilkunde, 109, 395–415.
Tipper, S.P., Lortie, C., and Baylis, G.C. (1992). Selective reaching: Evidence for action-centred attention. Journal of Experimental Psychology: Human Perception and Performance, 18, 891–905.
Ungerleider, L.G. and Mishkin, M. (1982). Two cortical visual systems. In D. Ingle, M.A. Goodale, and R.J.W. Mansfield (Eds.), Analysis of visual behavior, pp. 549–586. Cambridge, MA: MIT Press.
4 Several ‘vision for action’ systems: a guide to dissociating and integrating dorsal and ventral functions (Tutorial)
Yves Rossetti and Laure Pisella
Abstract. There is a well-established argument for a double dissociation between vision for action and vision for conscious identification. The distinction between these two visual systems applies both to the attributes being processed and to the outputs of the processing. However, numerous direct and indirect interconnections and convergences have been described between cortical and subcortical visual pathways, as well as between the dorsal and the ventral streams. This chapter presents an attempt to resolve this apparent contradiction between neuroanatomy and behaviour by organizing our knowledge about the various ways in which vision can be involved in action. First, several cognitive → motor interactions can be observed, which suggest that action can rely on cognitive representations. Reciprocally, examples of sensorimotor → cognitive interactions have been provided by the reorganization of the cognitive representation of space induced by sensorimotor plasticity. Second, it is shown that introducing a memory delay between target presentation and action means that a cognitive representation of the target is used for action. Conversely, adding a speed constraint to a simple pointing task seems to allow a specific activation of the sensorimotor system: for fast movements no influence of cognitive representations or of intention is observed. Neuropsychological data suggest that the most typical function of the dorsal stream is the on-line control of an ongoing goal-directed action. It is concluded that, depending on the time-scale considered, either no interaction or two-way interactions between the dissociated vision for action and vision for identification can be observed. These functional data are fully compatible with the temporal constraints of the complex anatomical network involved in the processing of visual information, in relation to fast (magnocellular) and slow (parvocellular) streams. Recipes are proposed for isolating or integrating the sensorimotor and the cognitive sensory systems, according to the type of stimulus, the type of response, and the temporal link between the stimulus and the response.
4.1 Perception and action: a perfect couple?
The question of the relationship between action and perception is central to many areas of philosophy, psychology, and neuroscience. The perception–action couple has been said by ecological psychologists to be inalterably inseparable, whereas specific experimental conditions and neurological patients have suggested that a divorce can be achieved. One of the two partners, Action, is the more forthright and does not trouble itself with the non-concrete aspects of life. Therefore everybody agrees about what it is up to. But so many people have flirted with Perception that everyone has their own view of it. Some think that it is only able to deal with proper mental objects which can be manipulated by the mind, while others believe it is a simple-minded character who is under the direct influence of the senses. Still others have argued instead that it is a thoughtful creature that is very aware of objects or events, and tends to interpret them in the most appropriate fashion.
It has also been claimed that perception has an impressionable character, being easily influenced by its environment. With such a many-faceted personality, it is no wonder that Perception and Action have a rather chaotic relationship. In order to go beyond these vaudevillesque considerations and to consider their relationship in a less conflictual way, let us specify which view of perception will be adopted in the present chapter. Following the usage of Milner and Goodale (1995), we will consider perception as an integrated process that gives rise to identification. The usual way of investigating the content of perception implies that this content is mediated by conscious awareness before an output (e.g. a verbalization) can be produced. In more general terms, this way of dealing with sensory inputs has been termed ‘cognitive’ processing by Bridgeman (1991) and Paillard (1987, 1991), as opposed to ‘sensorimotor’ processing. Most of the current debates on sensorimotor versus cognitive processing refer respectively to the dorsal and ventral streams of the visual brain. As will be detailed in a later section (4.3.2), the dorsal stream is defined here as the projections from the primary visual area to the posterior parietal cortex, in particular to the superior parietal lobule. The ventral stream can be defined as the projections from the occipital visual areas to the inferior temporal cortex. Based on neuropsychological observations reviewed below, these two streams have sometimes been associated with implicit (non-conscious) versus explicit (conscious) processing. This issue will be discussed at the end of the present chapter (Section 4.8.4). Following this introductory section (4.1), we will first summarize the evidence for the dissociability of perception and action in both normals and brain-damaged patients (4.2). Then we will demonstrate that the complex neuroanatomical networks involved in vision, perception, and action do not show a strict segregation between two cortical visual pathways (4.3). Our main aim will be to review the different ways by which this apparent gap between anatomy and behaviour can be filled (4.4). On the one hand, several lines of evidence for two-way interactions between the two visual systems will be presented. It will turn out that both visual systems can contribute to action, suggesting that the neuroanatomical data are right (4.5). On the other hand, we review several aspects of the effect of time variables on space processing by the visual system, which suggest that purely anatomical data are not sufficient to account for behavioural observations. We will argue that dissociations between cognitive and sensorimotor processing can result from the temporal limits of visual processing (4.6). Then some simple and more complex experimental recipes will be proposed for either isolating or integrating dorsal and ventral types of function (4.7). Among others, the method of choice for isolating the sensorimotor mode of vision in normals appears to be to apply time constraints to the task. The purest expression of the dorsal processes may consist of an ‘automatic pilot’ able to drive the hand on-line to a selected visual target, irrespective of the subject’s own intention. Finally, a few concluding remarks will be made on neglected aspects of visuomotor processing (4.8). Instead of seeing dichotomies between sensorimotor and cognitive, dorsal and ventral, implicit and explicit processes, we propose that transitions between these aspects can be viewed as continuous gradients.
4.2 Dissociations
The discovery of reflex reactions at the end of the nineteenth century gave rise to an enormous amount of experimental and theoretical work. The discovery of unconscious nervous processes, as already postulated by von Helmholtz in the case of visual perception, opened new areas of investigation of the mind (e.g. psychoanalysis) and of perception and behaviour.
It is interesting to note that the pioneering work of Helmholtz and Freud not only emphasized the distinction between the conscious and the unconscious but also clearly addressed the issue of the interaction between these two aspects of mental life. Unfortunately, for about a century there has been more and more attraction towards the power of unconscious processes as opposed to conscious mental life, and reports of dissociations between the conscious and the unconscious have become more fashionable than they really deserve to be. This bias has applied to the study of sensory and motor processes separately, and especially to the distinction between implicit processing for action and explicit processing for perception (review: Milner and Goodale 1995; Place 2000; Rossetti and Revonsuo 2000b). As a consequence, more is known about the dissociation than about the interaction between sensorimotor and cognitive processes.
4.2.1 The double-step paradigm
The double-step paradigm refers to experimental conditions in which a visual target is first presented to the subject (step one: between the fixation point and the target), and then displaced during the action (step two: between the initial target position and the secondary target position). Psychophysical studies have revealed that human subjects are unaware of displacements occurring in the visual world if these displacements are synchronized with a saccade (see e.g. Bridgeman, Hendry, and Stark 1975). Several experiments have explored the consequences of this saccadic suppression phenomenon, which refers to the apparent loss of perception occurring during saccades (Campbell and Wurtz 1978), for arm movement production. In one early experiment, subjects were asked to point at a target that had been displaced during the saccade (by stroboscopic induced motion) and then extinguished (Bridgeman et al. 1979). These authors observed that the saccadic suppression effect was not accompanied by related visuomotor errors. Moreover, it was found that a pointing movement following a target jump remained accurate, irrespective of whether this displacement could be verbally reported or not. These experiments therefore suggested that two psychophysically separable visual systems can be distinguished—one system for a ‘cognitive’ response, and a second one for sensorimotor behaviour. This distinction has more recently been referred to as ‘hand sight’ (Rossetti, Pisella, and Pélisson 2000). Following this work, a long series of experiments was initiated by Prablanc and colleagues to explore on-line arm movement control. In a first experiment, they required normal subjects to orient their gaze and point to visual targets presented in complete darkness (at fixation point offset). These targets could be unexpectedly displaced forward or backward during the saccade, so that a shorter or a longer hand movement had to be performed for the finger to land on the target. Since eye movements are usually initiated before arm movements, these target jumps occurred well before the hand had reached the target. The use of virtual images of the targets (seen in a mirror) allowed continuous presentation of the target without the reaching hand hiding it from the eyes (in this respect the study differed from Bridgeman et al. 1979). The interesting question raised here was whether the motor system would be able to update the hand movement in conditions where the target jump had occurred unbeknown to the subject (Fig. 4.1). Their results were straightforward: (1) as in the Bridgeman et al. (1979) study, subjects altered the amplitude of their movements so as to compensate for most of the target displacement (‘hand sight’); (2) this hand path correction did not imply a significant increase in movement time; (3) not only did subjects not detect the target jump, but they also remained unable to detect their own movement corrections; (4) forced-choice guesses about the direction of the jump could not discriminate between forward and backward target perturbations (Goodale, Pélisson, and Prablanc 1986; Pélisson, Prablanc, Goodale, and Jeannerod 1986).
This seminal work has been followed by many studies of motor control versus conscious perception. First, Prablanc and Martin (1992) replicated the same experiment with perturbations in direction and performed a detailed kinematic analysis of perturbed as well as unperturbed reaches. Using an analysis of the direction of the tangential velocity vector, they showed that the two types of trial could be discriminated as early as about 110 ms after movement onset. Given that this duration includes both sensory processing time and motor pattern activation time, this result suggested that the specific access of the motor system to visual information operates at an extremely fast rate. In addition, they observed that the transition from the unperturbed pattern of trajectory to the updated one was produced very smoothly, suggesting that the movement correction was integrated into the initial motor programme. Recent work using transcranial magnetic stimulation applied to the posterior parietal cortex has shown that inhibition of this structure disrupts the on-line correction system (Desmurget et al. 1999). A further experiment triggered the target jump at different times with respect to the saccadic peak velocity (Komilis, Pélisson, and Prablanc 1993). It revealed that identical corrections were observed in conditions where the subjects could or could not detect the target jump, provided the jump was applied early enough in the movement (i.e. no later than the hand peak velocity). These findings were then extended to more complex actions. For example, smooth corrections were also observed for grasping movements perturbed either in target orientation (Desmurget et al. 1995; Desmurget and Prablanc 1997) or in target location (Gréa, Desmurget, and Prablanc 2000) at movement onset. We shall mention other experiments performed with simulated perturbations of target objects, and the respective timing of the processes involved, in later sections (4.6.2 and 4.7).
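The logic of this kinematic analysis can be illustrated with a short computation. The sketch below uses synthetic trajectories and an assumed 1° detection criterion (neither taken from Prablanc and Martin’s data) to estimate the earliest moment at which the direction of the tangential velocity vector separates a perturbed from an unperturbed reach.

    import numpy as np

    # Sketch of a tangential-velocity-direction analysis on synthetic reaches.
    # Sampling rate, trajectory shapes, and the 1-degree criterion are assumptions.
    dt = 0.005                                     # 200 Hz sampling (assumed)
    t = np.arange(0.0, 0.5, dt)

    def velocity_direction(xy):
        """Direction (rad) of the tangential velocity along a 2-D path."""
        v = np.gradient(xy, dt, axis=0)
        return np.arctan2(v[:, 1], v[:, 0])

    straight = np.column_stack([0.8 * t, np.zeros_like(t)])   # unperturbed reach
    bent = straight.copy()
    late = t > 0.110                               # correction begins ~110 ms
    bent[late, 1] += 0.3 * (t[late] - 0.110) ** 2  # smooth lateral deviation

    diff = np.abs(velocity_direction(bent) - velocity_direction(straight))
    first = np.flatnonzero(diff > np.deg2rad(1.0))[0]
    print(f"paths discriminable from ~{1000 * t[first]:.0f} ms after onset")

Because the deviation builds up smoothly, the detected divergence time lags the true correction onset slightly, which is why criterion choice matters in such analyses.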
4.2.2 Illusions
When a large structured background is displaced during visual fixation of a small target, the latter appears to move in the opposite direction. This phenomenon can be observed for both smooth (induced motion) and step (induced displacement) background shifts. Bridgeman, Kirch, and Sperling (1981) extended a finding made on eye movements (Wong and Mack 1981) and compared the size of the perceptual illusory effect with the pointing response to the extinguished target. They showed that the motor system was much less affected by the apparent motion than the cognitive system. It was concluded that apparent target displacement affected only perception whereas real target displacement affected only motor behaviour, which provides a case for a double dissociation between cognitive and motor function (see Bridgeman 2000; this volume, Chapter 5). Over roughly the last five years, a substantial number of experiments have been performed to explore the effect of visual illusions on grasping. We shall review only a few of them here and come back in a later section (4.7.6) to this issue, which is becoming controversial. Aglioti, DeSouza, and Goodale (1995) made use of size-contrast illusions (the Titchener circles illusion). In this illusion, two circles at the centres of two circular arrays, each composed of circles of either smaller or larger size, appear to be different in size even though they are physically identical: the circle surrounded by larger circles appears smaller than the one surrounded by smaller circles. Using this principle, one can build configurations with central circles of physically different sizes that appear perceptually equivalent in size. Using a version of the illusion adapted in pseudo-3-D, Aglioti et al. required subjects to grasp the central circle between thumb and index finger, and measured their maximal grip aperture during the reaching phase of the movement.
[Fig. 4.1 graphic: (a) time-courses of eye and target position, eye velocity, hand and target position, hand velocity, and vision of the hand (on/off) for single- and double-step trials; (b) frequency histograms of pointing amplitude (cm); (c) pointing duration (ms) as a function of target amplitude (cm).]
Fig. 4.1 Saccadic suppression and hand pointing performance. (a) Experimental procedure: schematic representation of single- and double-step trials randomly presented during an experimental session. In all trials, the target was displaced from a central position to a randomly selected position in the right hemifield, and vision of the hand was turned off at the onset of the hand response. In double-step trials, the peripheral target jumped again to the right at the time the saccadic eye response reached its peak velocity, i.e. nearly at hand movement onset. The second target step represented 10% of the first step amplitude and was not detected consciously by subjects. (b) Spatial distribution of hand pointings: distributions of the endpoints of hand pointing responses, pooled over 4 subjects, towards single-step targets at 30 and 40 cm and towards double-step targets (30–32 and 40–44 cm). Note that pointings to double-step targets undershoot the final target location (a characteristic of hand movements performed without visual feedback) to the same extent as pointings to single-step targets, demonstrating the existence of corrective processes compensating for the target position perturbation. (From Pélisson et al. 1986.) (c) Duration of hand pointings: relationship between hand pointing duration (mean and standard deviation) and target step amplitude for three single-step targets (30, 40, and 50 cm) and for two double-step targets (30–32 and 40–44 cm); same responses as in (b). The same relationship accounts for both types of trial, indicating that motor correction in response to a target perturbation (see panel (b)) is not related to an increased pointing duration. (Modified from Goodale et al. 1986, and Pélisson et al. 1986.)
Strikingly, they observed that grip size was largely determined by the true size of the circle to be grasped and not by its illusory size. In a later study, Haffenden and Goodale (1998) compared the scaling of the grasp with a matching condition, in which subjects had to indicate the size of the central circle with thumb and index finger without reaching for it. The effect of the illusion on this ‘matching’ task was very similar to the mean difference in actual size required to produce perceptually identical circles, whereas it was significantly smaller in the grasp condition. This result suggests that matching object size with the fingers relies on an object representation similar to the perceptual representation. In contrast, the motor representation for grasping remained much less affected by the illusion. Another such experiment was performed by Gentilucci et al. (1996) to explore the effect of the Müller–Lyer illusion on pointing behaviour (see Fig. 4.21). The Müller–Lyer illusion induces the perception of a longer or shorter length of a line ended by arrows, and has been widely used by psychologists and philosophers to argue about the cognitive penetrability of visual perception (see Rossetti 1999). When the two arrows are directed toward the centre of the line, it appears shorter; when they are oriented away from the line, it appears longer. Gentilucci et al. (1996) compared pointing responses made from one end to the other of lines joined to the two types of arrows used in the Müller–Lyer illusion, the subject having to look at the figure for two seconds prior to initiating the movement. Mean endpoints were significantly, though only slightly, influenced by the visual illusion, movement distance being lengthened or shortened by a few millimetres according to the type of illusion produced. As in the Haffenden and Goodale (1998) study, the influence of the illusion on the goal-directed action was much smaller than on perception, because the perceptual effect usually covered about 20% of the physical line length used by Gentilucci et al. (Rossetti, unpublished). Interestingly, early movement kinematics were also altered, which suggests that the illusion affected the programming of the movement, and not only its final execution. We shall come back later to summarize the effects of illusions on action and examine possible points of controversy between authors. For the moment let us note that visual illusions seem to affect the perceptual system more strongly than the action system, which may further support the idea of a dissociation between perception and action.
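The contrast between perceptual and motor measures of such illusions is, at bottom, a difference in effect size. The following sketch illustrates the computation with invented numbers; the bias magnitudes, disc sizes, and noise level are assumptions, not data from the studies above.

    import numpy as np

    # Sketch: quantify how strongly an illusion-inducing context shifts each
    # response measure. All numbers are invented for illustration.
    rng = np.random.default_rng(0)
    disc = np.repeat([28.0, 30.0, 32.0], 20)   # physical disc diameters (mm)
    context = np.tile([-1.0, 1.0], 30)         # -1: large annulus, +1: small

    def simulate(bias_mm):
        """Response = veridical scaling + context-driven bias + noise."""
        return 0.8 * disc + bias_mm * context + rng.normal(0.0, 1.0, disc.size)

    for name, bias in [("manual matching", 2.0), ("maximum grip aperture", 0.4)]:
        r = simulate(bias)
        effect = r[context > 0].mean() - r[context < 0].mean()
        print(f"{name}: illusion effect ~ {effect:.1f} mm")

Run on these assumed parameters, the matching measure shows an effect several times larger than the grasp measure, which is the qualitative pattern the studies above report.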
4.2.3 Masking
Visual masking has been used extensively as a probe to study conscious experience and cognition (reviews in Price 2001; Bar 2000), and may explain some of the effects observed during saccadic suppression (Matin, Clymer, and Matin 1972). We will consider here some specific implications of masking for action control. Taylor and McCloskey (1990) investigated the triggering of preprogrammed motor responses by masked stimuli. Three stimuli were tested: a small central LED with a 5 ms pulse; a large stimulus composed of the central LED plus four surrounding LEDs; and a sequential stimulus, in which the central LED was lit 50 ms prior to the onset of the surrounding LEDs.
This last stimulus could evoke both metacontrast (masking by a surrounding shape) and backward masking (masking by a subsequent light of greater intensity than the small test light). Three motor responses of varying complexity (from a single muscle group contraction to a predetermined movement sequence) were used. Reaction times (RTs), as measured by EMG, were not affected by the masking of the small stimulus in the sequential condition. Comparison of the RTs obtained for the large and for the sequential stimulus showed that the motor response registered in the sequential condition was triggered by the short, small stimulus preceding the masking surround. Although the simple response evoked a shorter RT, a similar effect of the masked stimulus was observed for all three types of movement tested. This experiment thus confirmed that a motor reaction to a visual stimulus can be dissociated from the verbal report about detection of this stimulus (see also Fehrer and Biederman 1962). As stated by Taylor and McCloskey (1990, p. 445), ‘the ability to react to such stimulus with a voluntary movement implies that sensory processing during reaction time does not have to be completed before motor processing can commence’. Indeed, motor RTs are usually shorter than the 500 ms delay that may be required before a conscious sensation can be elicited. Although these results confirmed that unconscious operations proceed faster than conscious ones, they cannot tell us whether conscious perception and motor reaction are processed along parallel pathways with different thresholds, or whether these two responses are elicited at different stages of serial sensory processing. It appears that masking and metacontrast affect conscious perception of the stimulus although the ability to trigger a motor response remains largely intact. Neumann and Klotz (1994) have specifically explored several aspects of this phenomenon. They showed that similar effects could be observed on RT (measured by keypress) even in a two-choice situation that required integrating form information with position information. In addition, this priming effect influenced the error rate as well as the speed of the motor response, and could appear despite the use of variable stimulus–response couplings, showing that it is not restricted to preprogrammed responses. Taylor and McCloskey (1996) also replicated this finding in their experimental design. Interestingly, it has been shown that the brain-activation pattern triggered by a masked stimulus is very similar to that triggered by an unmasked one. In a very elegant experiment, Dehaene et al. (1998) showed that a masked stimulus used in a semantic priming task could produce activation as far as the primary motor area (see also Eimer and Schlaghecken 1998). Similarly, recordings of the lateralised readiness potential (LRP) in motor areas provide a physiological basis for the Simon effect (review in Hommel 2000). The double-step paradigm, applied to both reportable and non-reportable target perturbations, together with experiments exploring the effects of visual illusions or masking on action, suggests that the neural pathways leading to visual awareness are distinct from those involved in visuomotor processing. The implicit processing of sensory information during action may affect the release of a preprogrammed motor output as well as motor planning or the on-line control of execution. The experimental study of neurological cases allows researchers to speculate on the possible anatomical substrate for this dissociation.
4.2.4 Optic ataxia
Descriptions of the effects of lesions of a restricted area of the posterior parietal lobe were reported in groups of patients by Jeannerod (1986) and Perenin and Vighetto (1988). These patients had difficulties in directing actions to objects presented in their peripheral visual field although they were not impaired in the recognition of these objects, a neurological deficit that was termed ‘optic ataxia’.
Visually directed reaching movements made by these patients are inaccurate, often systematically in one direction (usually toward the side of the lesion). In addition, these movements are kinematically altered: their duration is increased, their peak velocity is lower, and their deceleration phase is longer. This alteration of movement kinematics becomes particularly apparent when vision of the hand prior to and during the movement is prevented. Restoration of visual feedback reduces the reaching errors, but the movements remain slower than normal (Jeannerod 1986). Object grasping and manipulation are also altered by posterior parietal lesions. Patients misplace their fingers when they have to guide their hand visually to a slit (Perenin and Vighetto 1988). During prehension of objects, they open their finger grip too wide, with no or poor preshaping, and they close their finger grip only when in contact with the object (Jakobson, Archibald, Carey, and Goodale 1991; Jeannerod 1986). They exhibit deficits not only in their ability to reach toward the object, but also in adjusting hand orientation and shaping during reaching. In contrast, they seem to remain able to indicate the orientation of a stimulus by a wrist movement that is not aimed at the stimulus (matching task: see Jeannerod, Decety, and Michel 1994). These results strongly suggest that the posterior parietal cortex plays a crucial role in the organization of object-oriented actions, whether the visual processing required for a given action concerns spatial vision (location) or object vision (size or shape) (see Jeannerod 1988; Jeannerod and Rossetti 1993; Milner and Goodale 1995; Rossetti 1998, 2000). One interpretation of optic ataxia is that patients present a deficit in programming hand movements (Jakobson et al. 1991). Recent evidence suggests rather that the deficits result primarily from a disruption of on-line motor control (Gréa et al. 2002; Pisella et al. 2000).
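The kinematic alterations listed above (increased duration, lower peak velocity, prolonged deceleration) are typically quantified from the tangential-velocity profile of the reach. The sketch below shows one conventional way of extracting these markers; the minimum-jerk profile and the 5% onset/offset criterion are standard modelling assumptions, not patient data.

    import numpy as np

    # Sketch: extract peak velocity, movement duration, and the relative
    # length of the deceleration phase from a synthetic velocity profile.
    dt = 0.005
    t = np.arange(0.0, 0.8, dt)
    T, A = 0.6, 0.35                        # assumed duration (s), amplitude (m)
    s = np.clip(t / T, 0.0, 1.0)
    v = (A / T) * 30 * s**2 * (1 - s)**2    # minimum-jerk speed profile

    peak = int(np.argmax(v))
    moving = np.flatnonzero(v > 0.05 * v[peak])   # 5% onset/offset criterion
    onset, offset = moving[0], moving[-1]
    duration_ms = 1000 * (offset - onset) * dt
    decel_share = (offset - peak) / (offset - onset)

    print(f"peak velocity {v[peak]:.2f} m/s, duration {duration_ms:.0f} ms, "
          f"deceleration phase = {decel_share:.0%} of the movement")

A healthy, symmetric profile yields a deceleration share of about 50%; in the patient data described above one would expect a lower peak and a share well above 50%.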
4.2.5 Visual agnosia
Pathological conditions may also result in disconnecting the parietal mechanisms for processing object attributes from those for programming the hand configuration. Jeeves and Silver (1988) reported the case of a patient with callosal agenesis who was unable to grasp objects correctly if they were briefly presented within either half of his visual field. The hands remained wide open throughout the movement and did not adapt to the object size. Jeeves and Silver speculated that, owing to the absence of callosal control, the crossed corticospinal pathway (normally responsible for the control of finger movements) could not be activated by visuomotor mechanisms. Instead, the patient had to use the ipsilateral motor pathway, which was inappropriate for carrying the correct commands. These results have prompted a reappraisal of the respective functions of the two cortical pathways. The posterior parietal cortex plays a role in organising object-oriented action, whether movements are executed by the proximal or the distal channel. This role must be dissociated from the role of other cortical structures specialised for object identification and recognition. An observation by Goodale et al. (1991) provides another piece of evidence for this dissociation between perception and action, showing a pattern reciprocal to optic ataxia. These authors reported the case of a patient, DF, who developed a profound visual-form agnosia following a bilateral lesion of the occipito-temporal cortex. DF was unable to recognize object size, shape, and orientation (Fig. 4.2). She was also unable purposively to size her fingers according to the size of visually inspected target objects on the basis of a representation of these objects (matching tasks). In contrast, when instructed to pick up objects by performing prehension movements, she was quite accurate and her maximum grip size correlated normally with object size. This observation suggests that, during action, DF could still process visual information about the object properties she could not perceive. If these results are compared with those following posterior parietal lesions, impairments in perceptual recognition of objects and in object-oriented action appear to be clearly dissociated. Optic ataxia and visual agnosia patients would thus support the case for a double dissociation between perceptual recognition of objects and object-oriented action (see Milner and Goodale 1995) (this conclusion will, however, be questioned in a later section). It may be emphasized here that DF had her primary visual area spared. As a consequence, the processing of visual information may have been disrupted only in the ventral pathway and spared in the dorsal pathway, which would explain why she could perform visually directed movements. The question therefore arises whether blindsight patients, with V1 lesions, would also exhibit a similar dissociation between perception and action.
Fig. 4.2 Action and object processing in visual agnosia and blindsight. Polar plots illustrating the orientation of a hand-held card in two tasks of orientation discrimination, from an agnosic patient (DF), a blindsight patient (JCG), and an age-matched control subject. On the perceptual matching task, subjects were required to match the orientation of the card with that of a slot placed in different orientations. On the ‘posting’ task, they were required to reach out and insert the card into the slot. The correct orientation has been normalized to the vertical. (Adapted from Goodale et al. 1991, and Perenin and Rossetti 1996.)
4.2.6 Action-blindsight
In addition to optic ataxia and visual agnosia, mentioned above, blindsight is another neurological condition that is interesting to consider in the framework of the dissociation between implicit and explicit sensorimotor processing. Early studies of patients with a lesion of the primary visual area (V1), considered to be deprived of one half of their visual field, showed that they remained able to orient the eyes and/or the hand to visual stimuli briefly presented within their blind field (see Weiskrantz 1986). It has recently been shown that some patients could orient their hand and size their finger grip appropriately when reaching out to unseen visual objects (Fig. 4.2) (Jackson 2000; Perenin and Rossetti 1996).
The neuroanatomical substrate proposed to explain this action-blindsight (Rossetti et al. 2001) was the projection from the superior colliculus to the posterior parietal cortex via the pulvinar (Bullier et al. 1996; Perenin and Rossetti 1996). This fascinating non-conscious vision, emerging during goal-directed action, is therefore considered to provide one more instance of a dissociation between the dorsal (parietal) and the ventral (temporal) streams of the visual system (e.g. Milner 1998; Milner and Goodale 1995; Rossetti 1998; Rossetti et al. 2000).
4.2.7 Action-numbsense
A patient with a left parietal thalamo-subcortical lesion was studied for signs of residual processing of somaesthetic modalities. The patient was unaware of any tactile stimuli applied to the skin of his arm and failed to demonstrate any significant performance in a verbal forced-choice paradigm.
Fig. 4.3 Numbsense: direct pointing versus pointing on a drawing. A set of 8 stimulus locations was used in this experiment: the ungual phalanx of the five fingers, the centre of the palm, the wrist, and the middle of the forearm of a patient exhibiting action-numbsense following a central lesion of somatosensory afference. Pointing with the left index finger was made toward the stimulated locus (1) directly on the right forearm, or (2) on a drawing of the right forearm (scale 1). Patient JA was blindfolded when answering directly on the arm. When pointing on the drawing, JA could see the A4 sheet with the arm drawing placed next to his unseen target arm. An opaque curtain prevented him from seeing his target right arm. In addition, the whole arm and face of the investigator delivering the stimuli remained out of sight throughout the experiment, so that no cue (e.g. gaze direction) was available to the patient. (From Rossetti et al. 2001.)
However, he generated above-chance levels of performance when pointing at the stimulus location on the numb arm (Rossetti, Rode, and Boisson 1995, 2001). This observation is similar to that of Paillard, Michel, and Stelmach (1983), who presented a tactile equivalent of blindsight. The question under investigation was whether the residual ability of the patient was linked to the mode of response (motor vs. verbal) or to the representation subserving these responses (motor vs. symbolic). Interestingly, when the patient had to point to stimulus locations on a drawing of an arm, no significant performance was observed (chance level; Fig. 4.3). This dissociation indicates that only a representation of the stimulus linked to the body schema was preserved, whereas more elaborate representations of the stimulus had vanished. In addition, the patient was unable to localize his right index finger verbally when it was passively positioned in a horizontal plane, but demonstrated significant performance when pointing to this finger with the left hand. Therefore numbsense can apply to proprioception as well. These results reinforce the interpretation proposed above for action-blindsight: there seems to be a sensory system specific for action.
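The ‘above-chance’ claim in such forced-choice paradigms reduces to a binomial question: with eight possible stimulus sites, chance is 1/8 per trial. The sketch below shows the computation; the trial counts are invented for illustration and are not the patient’s data.

    from math import comb

    # Sketch: one-sided binomial test for above-chance localization with
    # 8 response sites. Trial counts are invented for illustration.
    def p_at_least(k, n, p):
        """P(X >= k) for X ~ Binomial(n, p)."""
        return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

    n_trials, n_correct, chance = 40, 12, 1 / 8
    print(f"hit rate {n_correct / n_trials:.0%} vs chance {chance:.1%}, "
          f"one-sided p = {p_at_least(n_correct, n_trials, chance):.4f}")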
4.2.8 Conclusion
These observations suggest the existence of a specific representation for those (extrinsic as well as intrinsic) object attributes which are used for controlling movement. In the action of grasping an object, the role of the sensorimotor representation is to transform the visual qualities of the object into corresponding action-specific motor patterns for the hand to achieve the proper action. This mode of representation thus relates to the object as a goal for an action. The object attributes are represented therein as affordances, that is, to the extent that they afford specific motor patterns (see Riddoch et al. 2001). This pragmatic, or sensorimotor, representation seems to specify the metric properties of the action goal in a veridical way, because the hand has to interact with real objects rather than with distorted representations (it would therefore match the theoretical properties of an ‘immaculate perception’; Rossetti 1999). It differs from the mode used during the process of overt recognition, by which an object can be named, categorized, and memorized. That process implies a representation of the semantic type, in which the object appears as an identifiable entity and remains invariant across different vantage points. Its elementary attributes (size, orientation, colour, texture, etc.) are bound together to form a specific unit. At variance with this cognitive representation, the pragmatic representation implies no binding of attributes into a single percept (see Revonsuo and Rossetti 2000). Instead, each attribute of the graspable object is represented in itself and contributes to the motor configuration of the arm and hand. The above hypothesis implies that the cortical mechanisms for object recognition or for object-oriented action are selectively activated by the task in which the subject is involved. If the task involves recognizing, memorizing, or forming a visual image of an object, only the ventral visual pathway should be activated. If, on the other hand, the task involves finger movements for grasping or manipulating an object, the dorsal pathway should be activated. Taken altogether, the findings on the ability of neurological patients as well as normals to process sensory information specifically for action purposes suggest that vision (or somaesthesia) for action and vision for perception can be dissociated. The dissociations found in optic ataxia, visual agnosia, and blindsight further suggest that the neurological substrates for these two functions could be located selectively in the dorsal and in the ventral streams of visual processing.
4.3 Neuroanatomy of visual-to-motor connections
4.3.1 Cortical versus subcortical vision
About 100 years ago, anatomical studies first suggested the existence of three visual pathways by which the retina was connected to the cortex. Apart from the main route through the lateral geniculate body, von Monakow identified one pathway through the pulvinar and another through the superior colliculus. These pathways terminated in cortical areas outside the striate area (see Polyak 1957). Cajal (1909) described a ‘descending’ or ‘motor’ pathway arising from the fourth layer of the superior colliculus and terminating in the ocular motor nuclei and the adjacent reticular formation. This pathway was thought to carry orienting as well as pupillary reflexes. Subcortical vision was thus considered by Cajal to be purely motor vision. Accordingly, extensive lesions of this structure were shown to produce severe impairment of eye movements and visuomotor behaviour (Sprague and Meikle 1965). The distribution of retinofugal fibres between the retinogeniculate and the retinotectal pathways was interpreted within the framework of a dichotomy between two visual systems endowed with complementary functions. Schneider (1969) proposed that the geniculostriate pathway is a system essential for the learning of pattern discrimination, and that the retinotectal pathway is a system for mediating spatial orientation. Using hamsters, he dissociated ‘cortical blindness’ from ‘tectal blindness’. Following ablation of visual areas 17 and 18, animals became unable to learn simple pattern discriminations (e.g. vertical vs. horizontal stripes), although they remained able to orient toward stimuli (e.g. sunflower seeds) presented across their visual field. By contrast, following large undercuttings through the midbrain tectum, spatial orientation ability was lost, whereas pattern discrimination was still possible. This anatomical and functional duality became known as the now classical opposition between a system specialized for answering the question ‘What is it?’ and another specialized for answering the question ‘Where is it?’ (Schneider 1969). A model of visuomotor coordination built on the notion of two visual channels for movement control was also presented by Trevarthen (1968). This author studied visuomotor behaviour in split-brain monkeys and concluded that the subcortical visual system subserved ‘ambient’ vision, while the cortical system subserved ‘focal’ vision. Pathological destruction of the visual cortex in humans was classically thought to produce total blindness, except for pupillary responses to light and very crude visual perception limited to sudden changes in illumination. This opinion, however, was called into question on the basis of experimental findings in monkeys. Although destriated monkeys also appeared to be profoundly impaired in their ordinary visual behaviour, they were still able to avoid obstacles and to generate motor responses for reaching objects appearing in, or moving across, their visual field (Humphrey and Weiskrantz 1967). These findings represented a strong argument for the role of subcortical structures in mediating residual visual function in destriated monkeys. Mohler and Wurtz (1977) showed that partially destriated monkeys, which were able to orient visually toward stimuli presented within their scotoma, lost this ability after subsequent destruction of the retinotopically corresponding zones of the superior colliculi.
Thus in the monkey, the superior colliculi, and possibly other brainstem areas receiving input from the retina, may play a critical role either in mediating pure ‘subcortical vision’ or in relaying visual input to other structures onto which they project, including extrastriate cortex. In humans, clinical observations suggestive of incomplete or ‘relative’ blindness within scotomata of cortical origin had previously been mentioned by several authors (see Weiskrantz 1986, for review).
Systematic experimental evidence of residual visual abilities following lesions of the striate cortex was first reported by Pöppel, Held, and Frost (1973). This experiment used a new methodological approach derived from the monkey studies and based on forced-choice responses. Cortically lesioned subjects were requested not to try to see stimuli that were presented within their scotoma, but rather to turn their eyes or point their hand each time a stimulus was presented (see also Weiskrantz, Warrington, Sanders, and Marshall 1974). The amplitude and direction of the responses definitely correlated with target positions. Similar results were obtained by Perenin and Jeannerod (1978) and Ptito, Lepore, Ptito, and Lassonde (1991) in hemidecorticated subjects. In this situation the complete loss of cortex on one side stressed the role of subcortical vision. The fact that subjects tested for ‘blindsight’ remain unaware of the stimuli, and usually experience ‘guessing’ rather than ‘seeing’, would be in accordance with the classical idea that subcortical vision is ‘unconscious’. As can be seen in Fig. 4.5, the subcortical network also projects to cortical visual systems (see Bullier, Schall, and Morel 1994; Girard 1995).
4.3.2 Two cortical visual systems
In rodents, lesions of the striate cortex appeared to affect orienting behaviour toward targets located within the rostral visual field, whereas this ability was spared after collicular lesions. By contrast, the superior colliculus was necessary for orienting toward targets placed in the far peripheral visual field (see Goodale 1983). Thus orienting seems a more complex function than suggested by Schneider’s results, and cannot be completely dissociated from pattern discrimination, especially in the most central parts of the visual field. Later experiments performed on monkeys suggested that both modes of vision were mediated by two diverging corticocortical pathways for processing ‘what’ versus ‘where’. One pathway was the ventral occipitotemporal route, linking striate cortex to prestriate areas and from there reaching inferotemporal cortex on both sides via callosal connections. Interruption of this pathway abolished object discrimination without affecting perception of spatial relations between objects. The other, dorsal, pathway diverged from the ventral one by linking the prestriate areas to the posterior part of the parietal lobe. Interruption of this pathway produced visual spatial disorientation characterized not only by misperception of the relative positions of spatial landmarks (Ungerleider and Mishkin 1982), but also by localization deficits during object-oriented action (Ungerleider 1995) (see Fig. 4.4). As mentioned above, cases of optic ataxia and visual agnosia have raised the possibility that the anatomical dorsal–ventral division may instead relate to a distinction between processing ‘what’ an object is and ‘how’ to direct an action to it (Goodale and Milner 1992; Jeannerod and Rossetti 1993; Milner and Goodale 1995). This renewed conception of parallel visual systems was no longer based on the modalities of visual coding (what vs. where), but rather on the modes of representation of the target object, which are directly linked to the type of response produced by the subject (what vs. how) (Jeannerod 1994). In monkeys, posterior parietal lesions produce a reaching deficit, characterized by the fact that the animals usually misreach with the arm contralateral to the lesion in either part of the visual field (e.g. Faugier-Grimaud et al. 1978, 1985; Hartje and Ettlinger 1973). In addition, as discovered by Faugier-Grimaud et al. (1978), after lesions limited to the inferior parietal lobule (monkey area 7), contralesional finger movements are impaired during grasping. These findings are consistent with the properties of neuronal populations recorded in this cortical region: neurones coding for the direction of reaching arm movements were described in this area by Hyvärinen and Poranen (1974) and by Mountcastle et al. (1975). More recently, another population of cells, selectively activated during the animal’s manipulation of objects of given configurations, was described by Taira et al. (1990). The production of typical visuomotor deficits by lesions and the recording of typical sensorimotor activities in the posterior parietal cortex strengthened the conception of a dorsal visual system specialized for action (review in Sakata and Taira 1994; Jeannerod, Arbib, Rizzolatti, and Sakata 1995; Milner and Dijkerman 2001; Milner and Goodale 1995; Pisella and Rossetti 2000; Rossetti 1998; Sakata et al. 1997).
Fig. 4.4 Sketches of the visual system: several representative conceptions of the main neural pathways in the visual system. A–C: From the main visual input to area 17, two segregated streams of processing have been described, projecting respectively to the posterior parietal cortex (dorsal pathway) and to the inferotemporal cortex (ventral pathway). While the ventral pathway is specialised in processing colour and form and is assumed to play a key role in object identification (‘what’), the dorsal pathway is known to be primarily involved in the computation of places and movement (‘where’) (Morel and Bullier 1990; Ungerleider and Desimone 1986; Ungerleider and Mishkin 1982) and in the sensorimotor processing of object metrics (‘how’) (Schwartz 1994). D: Both pathways project onto frontal structures involved in action. A: from Ungerleider and Desimone (1986); B: from Morel and Bullier (1990); C: from Schwartz (1994); D: from Ungerleider (1995). Poranen (1974) and by Mountcastle et al. (1975). More recently, another population of cells, selectively activated during manipulation by the animal of objects of given configurations, was described by Taira et al. (1990). The production of typical visuomotor deficits by lesions and the recording of typical sensorimotor activities in the posterior parietal cortex strengthened the conception of
a dorsal visual system specialized for action (review in Sakata and Taira 1994; Jeannerod, Arbib, Rizzolatti, and Sakata 1995; Milner and Dijkerman 2001; Milner and Goodale 1995; Pisella and Rossetti 2000; Rossetti 1998; Sakata et al. 1997).
4.3.3 An occipito-frontal visuomotor network The study of the dorsal–ventral dissociation in the motor context led researchers to distinguish specific motor abilities, which are dissociable from conscious experience and preserved in patients with lesions of the ventral stream, from other types of motor responses. The dorsal–ventral distinction finally evolved toward the conception of two parallel visual streams in the occipito-frontal network for visual-to-behavioural motor responses (Rushworth et al. 1997; Schwartz 1994; Ungerleider 1995; see Fig. 4.4). Occipito-parieto-frontal networks have been precisely identified for reach and grasp movements (Jeannerod et al. 1995; Rossetti et al. 2000; Sakata et al. 1997; Tanné et al. 1995). However, the visual processing of both the dorsal and the ventral streams has to join the motor structures in order to allow the subject to produce behaviour adapted to his or her environment. Indirect projections of the ventral stream toward the motor regions exist: the temporal area TE can reach the primary motor area after a relay in the prefrontal and then in the premotor regions (Tanné et al. 1995; see Fig. 4.5). The temporal lobe can also be implicated in action via its connections to the basal ganglia. Two types of behavioural arguments support this idea of a ‘dual route’ to visuomotor action (Milner and Dijkerman 2001). The first line of studies distinguished between ‘sensorimotor’ and ‘cognitive’ representations underlying actions, either in normal subjects or in neurological patients. The involvement of the dorsal or ventral stream in action was based on the mode of representation of the goal of the movement: egocentric versus allocentric coding, goal-directed action versus matching, implicit versus explicit processing, grasping of meaningless shapes versus meaningful objects. We shall describe this distinction in more detail in the following sections. The second line of studies concerned conditional motor tasks. Not all sensorimotor transformations consist of goal-directed actions involving the computation of sensorimotor coordinates and the shaping of the hand with respect to object properties. Other aspects of motor behaviour depend on object identity. The functional role of a stream connecting areas involved in object perception and recognition with the motor structures accounts for the usual associations between a specific stimulus and a motor behaviour (such as braking in response to a red light). Rushworth, Nixon, and Passingham (1997) concluded from lesion studies in monkeys that neither part of the parietal lobe plays a major role in the selection of movements made to arbitrarily and conditionally associated visual stimuli. Relatedly, a patient with a bilateral posterior parietal lesion exhibited no difficulty performing instructed motor responses (stop or redirect ongoing action) to visual stimuli, but lost the automatic visuomotor guidance of action (Pisella et al. 2000; see Fig. 4.14). In search of the detailed neuroanatomical basis for the ventral and dorsal systems, Fig. 4.5 presents an attempt to synthesize the cortical neuronal networks described in monkeys between V1 and M1, allowing visual inputs to be transformed into motor output. Although the dorsal and the ventral streams can be individuated from this network, this illustration displays a possible substrate for common participation of these two systems in action.
Strikingly, M1 receives only mixed projections and no pure projections, either from the dorsal system or from the ventral one. Although distinctions can be
Fig. 4.5 (See also the color plate of this figure.) Overview of the visual-to-motor network. Cortical neuronal networks allowing visual inputs to be transformed into motor output. This illustration displays the possible substrates for dissociation and interactions between ventral and dorsal pathways driving information from V1 to M1. The dorsal and the ventral streams are depicted in green and red, respectively, as are their efferents. Blue arrows arise from areas receiving convergent dorsal and ventral inputs, either directly or indirectly. Further projections from areas receiving these mixed convergent inputs have also been represented in blue. Even though the posterior parietal cortex and the inferior temporal cortex each receive a single direct projection from the other, they were not considered as mixed recipient areas. By contrast, areas in the frontal lobe receive parallel dorsal, ventral, and mixed projections. Interestingly, at the motor end of this network there is no pure projection from either the dorsal or the ventral stream of visual processing. Abbreviations: AIP: anterior intraparietal area; BS: brainstem; Cing.: cingulate motor areas; d: dorsal; FEF: frontal eye field; FST: floor of the superior temporal sulcus; Hipp.: hippocampus; LIP: lateral intraparietal area; M1: primary motor cortex; MIP: mesial intraparietal area; MST: medial superior temporal area; MT: medio-temporal area; PF: prefrontal cortex; PM: premotor cortex; SC: superior colliculus; SEF: supplementary eye field; SMA: supplementary motor area; STS: superior temporal sulcus; STP: superior temporal polysensory area; TE: temporal area; TEO: temporo-occipital area; v: ventral; V1: primary visual cortex; VIP: ventral intraparietal area. (Updated from Rossetti, Pisella, and Pélisson 2000; derived from Colby et al. 1988; Morel and Bullier 1990; Schall et al. 1995; Schwartz 1994; Tanné et al. 1995; Van Hoesen 1982.) described between cortical and subcortical vision, between the dorsal and ventral streams, and between two occipito-frontal routes, the important point raised here is that all subsystems considered in these distinctions are interconnected.
4.4 The gap between anatomy and behaviour The above two lines of evidence make a strong case for the dissociability of perception and action, on the one hand, and for the interconnection of visual-to-motor networks, on the other. The examples of dissociation presented here suggest that there must be two independent visual systems which can give rise either to action or to conscious perception. However, taken as a whole, the anatomical data do not support as clear a segregation between two major pathways as they are often considered to establish. Instead, Fig. 4.5 suggests that interaction between the two streams can take place at many levels before visual information reaches the motor output, and that a pure input from the dorsal stream onto the motor areas cannot be isolated within the visuomotor network. A crucial point to note here is that most of the evidence used to support a dissociation between two visual subsystems comes from animal lesions or neuropsychological patients, whereas most of the arguments for an interaction between these two systems come from experimental work performed in normal subjects. Is it possible to bring together these two lines of evidence? There must be ways to look beyond this apparent lack of a direct anatomical correlate of the behavioural dissociation. If there is no doubt, as argued above, that perception and action can be dissociated (at least in some circumstances), then the anatomical data presented in Fig. 4.5 should be revised or improved. In particular, additional features of this anatomical network have to be identified which would explain this behavioural dissociation. Conversely, if, as argued above, there are numerous interconnections between the dorsal and the ventral anatomical pathways, then interactions should be observed between the two behavioural responses.
4.5 Anatomy is right Numerous examples of interaction between action and perception systems will be listed below, suggesting that the anatomical evidence for an interconnection between the two systems is matched by functional correlates. Altogether, four types of interaction can be observed: three types of cognitive→sensorimotor interaction, and one type of reciprocal sensorimotor→cognitive interaction (see Fig. 4.6).
4.5.1 Perception can trigger action Since they do not perceive objects, blindsight patients never initiate spontaneous actions toward these objects. Their motor ability in pointing at and orienting toward objects has always been observed in forced-choice experiments, where the action was initiated upon a go-signal (see Rossetti 1998; Weiskrantz et al. 1974, 1989). The same observation has been made for numbsense patients (Paillard et al. 1983; Rossetti et al. 1995, 2001). This experimental detail has a strong theoretical impact: it means that in order for the sensorimotor system to release an action, the cognitive system has to provide the instruction to initiate this action.
4.5.2 Perception can inhibit action We have seen that perception and sensorimotor processing can be dissociated in action-blindsight and numbsense patients. It is interesting to note, however, that the preserved motor abilities of these patients were disrupted when a cognitive representation of the action goal was elaborated during the action (see Rossetti 1998). When asked to produce a verbal response simultaneously
[Fig. 4.6 diagram: a cognitive representation and a sensorimotor transformation both link stimulus to response; the arrows from the cognitive representation to the sensorimotor transformation are labelled ‘Triggers’, ‘Configures’, and ‘Inhibits’, and the reciprocal arrow is labelled ‘Structures’.]
Fig. 4.6 Summary of the various types of interactions observed between the sensorimotor and the cognitive representations of a stimulus leading to behavioural responses.
to the action, the motor performances of blindsight patients dropped to chance level (Fig. 4.7). The same observation has been made with a numbsense patient, for both tactile and proprioceptive targets (Fig. 4.7): the simultaneous motor + verbal condition produced random responses (see Rossetti et al. 1995). Experiments in normals have also shown that the co-activation of a verbal representation during a motor response changes the configuration of the movement endpoint errors. Immediate pointing toward proprioceptive targets was tested with blindfolded subjects by Rossetti and Régnier (1995). For each target, constant and variable errors were computed. Variable errors were assessed by a confidence ellipse of the endpoint distribution (Fig. 4.8). On each trial a target was presented at one of six possible locations lying on a circle centred on the starting point. Because subjects were trained to point to these positions in a preliminary session and to associate a number (from 1 to 6) with each target, they could mentally extract the pattern of the target array and use it to encode the target location in an allocentric frame. In this case (as for delayed action, see Fig. 4.10), the distribution of endpoints would tend to align with this target array, that is, perpendicular to movement direction. If they encoded the target position in an egocentric reference frame, then their pointing distribution should remain unaffected by the target array and should be elongated in the movement direction (as in Vindras et al. 1998). In a ‘motor’ condition, subjects simply had to point toward the proprioceptive target after the encoding finger had just been removed from it. In a ‘motor + verbal’ condition, subjects similarly pointed toward the same proprioceptive targets but were instructed to give a simultaneous forced-choice verbal report of the target number. In the ‘motor’ condition, as in the condition of arbitrary ‘number’ verbalization, the orientations of the ellipse main axes were randomly distributed (Fig. 4.9(a)). This lack of influence from the context of the target array was interpreted as the pure
Fig. 4.7 Effect of verbalization on the reach performance of a patient with a lesion in the primary visual area and a patient with a lesion of the primary somaesthetic afferents (tested for touch and proprioception). The visual modality was assessed by testing the reach performance of the blindsight patient PJG toward visual targets presented in his blind visual field. The reaching performance of the numbsense patient JA was evaluated toward tactile and proprioceptive stimuli on the affected side. In these three modalities, the correct reaching responses decreased when a forced-choice verbalization of the target was produced simultaneously with the immediate motor response. (Adapted from Rossetti 1998.)
Fig. 4.8 Constant and variable pointing errors. The endpoint of each individual movement was recorded, and each target was used to compute constant and variable pointing errors. Constant errors were measured in direction (angle) and amplitude relative to the ideal reach that would hit the target. Variable errors were assessed by confidence ellipses (95%) of the scatter of finger end positions. The ellipse surface provided an estimate of the global pointing variability. The orientation of the ellipse major axis was computed relative to mean movement direction (angle beta). This parameter proved the most pertinent, revealing the sensorimotor or cognitive representation underlying the pointing movement. It was shown to be affected by the various conditions of target coding (review: Pisella et al. 1996; Rossetti 1998; see Fig. 4.9) and by the delay of response (Rossetti and Régnier 1995; see Fig. 4.10).
activation of a sensorimotor representation in this condition (Rossetti and Régnier 1995). In the ‘motor + verbal’ condition (with a specific spatial verbalization of the ‘target number’), the orientation of the ellipse, perpendicular to movement direction (see Fig. 4.9(a)), was interpreted as the result of the influence of an allocentric representation of the target position. Therefore the cognitive integration of the whole target pattern played a role in the immediate action only when a spatial verbal representation was activated (Rossetti 1998). In order to demonstrate that this result could not simply be attributed to an attentional bottleneck or a dual-task effect, several control experiments were performed (Fig. 4.9(b), and see Fig. 4.20 in the recipes provided in section 4.7). This type of cognitive→sensorimotor interaction effect was later confirmed by the application of this motor–verbal paradigm to the Roelofs effect (Bridgeman 1997, 2000, this volume, Chapter 5).
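For readers who wish to reproduce this kind of analysis, the sketch below shows one plausible way to compute the error measures of Fig. 4.8 (a minimal illustration in Python, not the authors’ analysis code; the function name, the coordinate conventions, and the 95% chi-square constant are our assumptions):

```python
# Sketch of the Fig. 4.8 measures: constant errors and the orientation
# (angle beta) of the 95% confidence ellipse of the endpoint scatter.
import numpy as np

def pointing_errors(endpoints, start, target):
    """endpoints: (n, 2) finger end positions; start, target: (2,) points.
    Returns (amplitude error, direction error deg, ellipse surface, beta deg)."""
    endpoints = np.asarray(endpoints, dtype=float)
    mean_end = endpoints.mean(axis=0)
    move = mean_end - start
    ideal = target - start

    # Constant errors, relative to the ideal reach that would hit the target.
    amplitude_error = np.linalg.norm(move) - np.linalg.norm(ideal)
    direction_error = (np.degrees(np.arctan2(move[1], move[0])
                                  - np.arctan2(ideal[1], ideal[0]))
                       + 180.0) % 360.0 - 180.0

    # Variable error: 95% confidence ellipse of the endpoint distribution.
    cov = np.cov(endpoints.T)
    evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    chi2_95 = 5.991                      # chi-square quantile, 2 dof, p = .95
    surface = np.pi * chi2_95 * np.sqrt(evals[0] * evals[1])

    # Beta: angle between the ellipse major axis and the mean movement
    # direction, modulo 180 deg (an axis has no sign): values near 0/180
    # mean 'aligned with the reach', values near 90 mean 'aligned with
    # an arc-shaped target array'.
    major = evecs[:, 1]
    beta = np.degrees(np.arctan2(major[1], major[0])
                      - np.arctan2(move[1], move[0])) % 180.0
    return amplitude_error, direction_error, surface, beta
```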
Fig. 4.9 Effect of verbalization on the orientation of confidence ellipses. (a) Histogram of the beta distribution for immediate (delay = 0 s) and delayed pointing movements (delay = 8 s) when subjects verbally report a number during their pointing responses. This number can result from downward counting (‘number’ condition) or from a guess about the target location (‘target number’ condition). With the ‘target number’ verbalization specific to the spatial location of the target, ellipse orientation tended to be more aligned with the arc array (beta approaches 90 deg). In contrast, with the arbitrary ‘number’ verbalization, the influence of the context of target presentation appears only for delayed pointing movements. Immediate movements are coded in an egocentric reference frame, independent of the target array, as in the condition without simultaneous verbalization [schematized in (b)]. (b) Schematization of the influence of various verbalizations on the ellipse orientation of immediate pointings. Without verbalization, ellipses are randomly oriented. Ellipses tended to align with the arc array (beta approaches 90 deg on average) only for the specific verbalization of the target spatial location (‘target code’), but not for the two arbitrary verbalizations: arbitrary number (‘downward counting’) or learned ‘texture code’ of the target. (Derived from Pisella et al. 1996; Rossetti 1998; and Rossetti and Régnier 1995.)
Fig. 4.10 Effect of delay on the orientation of confidence ellipses of pointing errors. (a) Histogram of the beta distribution for immediate (delay = 0 s) and delayed pointing movements (delay = 8 s) when targets were presented on an arc array or a line array. For immediate pointing, the confidence ellipses are influenced by the direction of the pointing movement (as shown in Fig. 4.9). After a delay, ellipse orientation tends to align with the target array, revealing an allocentric coding of the target location. In the case of the arc array, beta tends to reach 90 deg for delayed pointing movements. On the contrary, in the case of a line array aligned with the direction of the movement, ellipse orientation does not change between immediate and delayed pointing movements (remaining at about 180 deg). These results for the 8-s delay are schematized in (b). (b) Schematization of the influence of the target array on the ellipse orientation of delayed pointings. The result shown in (a) was replicated with various target arrays. (Derived from Pisella et al. 1996; Rossetti 1998; and Rossetti and Régnier 1995.)
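The predictions summarized in Figs 4.9 and 4.10 can also be made concrete with a toy simulation (all numerical values below are invented for illustration and carry no empirical weight): endpoint scatter elongated along the reach mimics egocentric coding, scatter elongated along the target arc mimics allocentric coding, and the beta statistic separates the two.

```python
# Toy simulation: read the coding scheme off the ellipse orientation.
import numpy as np

rng = np.random.default_rng(0)
start = np.array([0.0, 0.0])
target = np.array([20.0, 10.0])                   # one of the arc targets
move_dir = (target - start) / np.linalg.norm(target - start)
arc_dir = np.array([-move_dir[1], move_dir[0]])   # local tangent to the arc

def simulate(long_axis, short_axis, n=40):
    """Anisotropic endpoint cloud around the target (SDs are arbitrary)."""
    return (target + np.outer(rng.normal(0.0, 3.0, n), long_axis)
                   + np.outer(rng.normal(0.0, 1.0, n), short_axis))

def beta(points):
    """Angle (deg, mod 180) between ellipse major axis and movement direction."""
    _, evecs = np.linalg.eigh(np.cov(np.asarray(points).T))
    major = evecs[:, 1]
    return np.degrees(np.arctan2(major[1], major[0])
                      - np.arctan2(move_dir[1], move_dir[0])) % 180.0

egocentric = simulate(move_dir, arc_dir)    # elongated along the reach
allocentric = simulate(arc_dir, move_dir)   # elongated along the target arc
print(f"egocentric beta  ~ {beta(egocentric):5.1f} deg (near 0 or 180)")
print(f"allocentric beta ~ {beta(allocentric):5.1f} deg (near 90)")
```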
As in the case of blindsight and numbsense, the observations made in normals confirm that when a motor and a cognitive representation of a target are simultaneously elaborated, the cognitive representation seems to impose its spatial parameters on the final output, at some level before the output stage.
4.5.3 Recognition of the objects to grasp configures the motor program The preserved motor performances of blindsight patients are classically observed after a training session performed in the intact visual field (e.g. Rossetti 1998). This suggests that a configuration of the motor act by the ventral stream is a preliminary step necessary for blind sensorimotor processes. Non-motor parameters have been shown to influence action in normal subjects after learned associations between objects’ features and motor parameters. Clever experimentation demonstrated that after such semantic associative learning, the processing of the colour of an object (Haffenden and Goodale 2000b), or of words written on an object, can interfere with the sensorimotor processing of the physical properties pertinent for grasp, and consequently affect the grip size. The size–weight illusion provides another such example (Flanagan and Beltzner 2000). In the same vein, Jeannerod et al. (1994) reported that a patient with a bilateral lesion of the dorsal stream (AT) exhibited normal actions toward familiar objects, whereas her actions toward similar unfamiliar 3D forms were severely impaired. These results suggest that goal-directed actions can be configured by the visual recognition pathway. Even for normal subjects, the motor parameters of everyday actions made toward familiar objects can therefore be programmed on the basis of knowledge about the usual physical characteristics of these objects.
4.5.4 Sensorimotor representation structures spatial cognition Both visuomotor adaptation in neglect patients and the natural ageing process have been shown to affect the elaboration of explicit cognitive representations of spatial information (Pisella and Rossetti 2000; Bhalla and Proffitt 2000). The therapeutic improvement of neglect symptoms mediated by a pointing procedure of adaptation to a prismatic deviation (Rossetti et al. 1998b, 1999a) has demonstrated that the spatial cognition of brain-damaged patients can be restructured by adapting sensorimotor transformations. This profound action was shown to affect sensorimotor coordination and classical neuropsychological testing through visuomotor tasks (e.g. cancellation, copying, and bisection tasks), as well as purely cognitive representations such as mental imagery (Rode, Rossetti, Li, and Boisson 1999; Rode et al. 2001). In addition, we have shown recently that this type of structuring effect of visuomotor adaptation can also alter spatial cognition (midline judgements) in normal individuals (Colent et al. 2000). Bhalla and Proffitt (2000) too have shown that evaluation of the slope of a hill was influenced by subjects’ age, that is, by a subject’s adaptation to his or her reduced sensorimotor and power abilities. This effect is not observed over a shorter time range, such as after jogging. This is consistent with Piagetian theory about child development, in which the brain is structured by sensorimotor associations and learning (‘sensorimotor schemes’), which are progressively transferred to cognition (considered as covert action schemes). As shown by these examples of intermediate or long-term interactions between action and cognition, a much larger time-scale seems to be necessary to observe an influence from the sensorimotor level on the cognitive level. It is likely that the cerebellum is involved in this type of profound revision of these several levels of space representation, because this brain structure is the best candidate for the neurological substrate of adaptation (review: Jeannerod and Rossetti 1993). But other structures may be involved as well, as
suggested by the lack of contextual influence on delayed pointing performed by blindfolded subjects in the experimental situation depicted in Fig. 4.10 (Rossetti et al. 1996). Based on this short synthesis of the empirical data available on the interplay between cognition and action systems, it appears that the explicit and implicit processing of space involved in action cannot be considered fully dissociated from cognition, and that multiple interferences between them are possible. These interactions are not symmetrical: cognitive→sensorimotor interactions are observed within the duration of a single slow action, whereas sensorimotor→cognitive interactions can be observed over a much longer time-scale (as a result of many successive actions).
4.6 Anatomy is not sufficient Whereas our knowledge of brain anatomy is mainly based on monkey data and still needs development and refinement, human behaviour can be observed directly. If there is a mismatch between neuroanatomy and behaviour, then we have to look at anatomy in a more functional way. Let us now consider the neuroanatomical networks connecting sensors and effectors in this light. The dissociation between a dorsal and a ventral stream of visual processing was initially based on the connectivity of visual areas. More recently, the functional properties of the cells participating in each of these streams have been described. Recent single-unit studies in the monkey also provide evidence for a temporal dissociation between the two visual pathways. Comparison of visual response latencies at various locations in the monkey visual system has led Nowak and Bullier (1997) to distinguish two groups of visual areas. Parietal areas of the dorsal stream, projecting onto the premotor cortex, exhibit particularly short visual latencies (about 40–80 ms) as compared with other pre-striate areas. In their extensive review of the literature, the dorsal pathway projecting towards frontal areas is therefore referred to as the ‘Fast Brain’, whereas temporal areas are described as the ‘Slow Brain’ (about 100–150 ms). As stressed by Nowak and Bullier (1997), the visual latencies do not match a hierarchical model of purely anatomical organization, but rather follow the distribution of magnocellular and parvocellular inputs. The speed of occipito-parietal processing seems to be explained by the faster conduction of the magnocellular channel, which almost exclusively activates this dorsal stream, and by the numerous bypass connections existing within it, whereas the ventral stream seems to be connected in a more serial fashion (Nowak and Bullier 1997; Schall, Morel, King, and Bullier 1995). In addition, neuroanatomical tracing has shown that parietal areas of the dorsal stream project directly to the dorsal premotor cortex, whereas the ventral stream projects only indirectly to the ventral premotor cortex, via the ventral prefrontal cortex (Schmolesky et al. 1998; Schwartz 1994; Tanné et al. 1995; see Fig. 4.5). Can we now identify some behavioural correlates of the functional properties of these neuroanatomical projections?
4.6.1 Immediate versus delayed actions Differences between immediate and delayed actions have been reported in normal and brain-damaged subjects. With respect to the temporal issues raised here, it is interesting to note that visuomotor performance (how) in brain-damaged patients and healthy subjects can depend upon the delay and the speed of the motor response.
The effect of a delay can easily be tested in normals with various simple tasks. In general, there is a global decrease in performance when the delay introduced between the stimulus presentation and the response is increased. This effect is mainly observable as an increase in response variability. Interestingly, it has also been observed that the effect of delay duration is not linear (see Fig. 4.11(a)). Using a simple experimental design, Rossetti et al. (1994) had subjects point with various delays to visual targets flashed on a monitor. Nine target locations were used, organized along an arc centred on the starting position (see Rossetti 1998). Several accuracy parameters were investigated (Fig. 4.8). First, the global variability, as assessed by the surface of the confidence ellipse fitting the movement endpoints, continuously increased with the delay. Second, the evolution of the orientation of the main axis of the confidence ellipses fitted for each target followed instead a two-slope function: it tended to be aligned with movement direction in the absence of a delay and then rapidly increased for the 500-ms delay (see Fig. 4.11(b)). Between the 500-ms and the 8-s delay, a nearly horizontal plateau was reached, with ellipse orientation tending to be aligned with the target array, that is, orthogonal to movement direction (see Rossetti et al. 2000; Fig. 4.10). Third, the orientation of the constant error vector in space also followed a similar two-slope trend. As shown in Fig. 4.11(a), it is rather striking that experiments investigating the effect of a delay on eye movement accuracy in the monkey made similar observations (Krappmann 1998; White, Sparks, and Stanford 1994). Although the parameters used in the monkey saccade experiments were not identical to the ones used in the human pointing experiment, it is interesting to observe that a similar time course could be observed in both studies. These results indicate that a different type of sensorimotor process is at work in the immediate and in the delayed condition. A short-lived egocentric representation of the target location seems to be used to guide immediate actions. However, an allocentric coding of the visual target seems to participate in the delayed action, which is affected by the global spatial context of the experiment, extracted by trial-to-trial integration over time. In addition, similar results have been observed for delayed pointing to proprioceptively defined targets (Fig. 4.10) (Pisella et al. 1996; Rossetti and Procyk 1997; Rossetti and Régnier 1995; Rossetti et al. 1996). Neuropsychological data have also shown an effect of delay on the motor performances of patients with a lesion of the dorsal or ventral stream. The agnosic patient DF can correctly reach and grasp objects that she cannot describe, but loses this preserved motor ability when her action is delayed by only 2 s (Goodale et al. 1994). Goodale et al. reported that many kinematic landmarks of the grasping movement were affected by a 2-s delay introduced between stimulus viewing and movement onset. In particular, the opening and closure of the finger grip was altered and maximal grip size was reduced as compared with normal movements. Strikingly, movements delayed by 30 s and pantomimed movements performed beside the object were similar to those observed after 2 s. Conversely, an ataxic patient (AT) described by Milner et al.
(1999a) performed imprecise reach and grasp movements when instructed to act immediately on objects, but (paradoxically) improved when a delay was introduced between the stimulus presentation and the pointing response (see also Milner and Dijkerman 2001). Action-blindsight and numbsense have also been shown to be disrupted when a delay is introduced between the stimulus and the response (see Rossetti 1998; Fig. 4.12). These results converge toward the idea that when action is delayed and the object has disappeared, the parameters of object position and characteristics that are used by the action system can only be accessed from a sustained cognitive representation. This type of representation obviously relies on different reference frames from those of the immediate action system. Furthermore, the neuropsychological data suggest that the dorsal stream is able to build a short-lived sensorimotor representation of the target that is only available for immediate actions.
[Fig. 4.11 plots: (a) mean constant and variable saccadic error (deg) as a function of delay (s) for Monkey 333 and Monkey 340; (b) mean constant and variable error orientation (deg) as a function of delay (s) for human pointing.]
Fig. 4.11 Time course of error distributions. Effect of increasing delay intervals (between the target extinction and the go-signal) on the precision of two types of motor responses. Constant errors (left) and variable errors (right), defined in Fig. 4.8, were plotted as a function of the delays. Plotted points are the errors averaged across all target locations. A similar tendency was observed in two studies, on (a) ocular and (b) manual motor errors. (a) Constant saccadic errors showed a drastic increase between 0 and 1 s and stabilized for longer delays in the two monkeys (White et al. 1994). Variable errors followed the same evolution with delay, at least for one monkey. (b) Both constant and variable pointing errors toward visual targets increased sharply in humans between 0 and 1 s and then reached a plateau. (Drawn from Rossetti et al. 1994.)
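The ‘sharp rise followed by a plateau’ described in Fig. 4.11 can be summarized by fitting a saturating curve to error-versus-delay data. The sketch below uses invented values and an exponential form chosen for convenience (neither the data points nor the functional form comes from the studies cited):

```python
# Fit a rise-then-plateau curve to (invented) error-versus-delay data.
import numpy as np
from scipy.optimize import curve_fit

def saturating(delay, err0, plateau, tau):
    """Rises from err0 toward an asymptotic plateau with time constant tau."""
    return plateau - (plateau - err0) * np.exp(-delay / tau)

delays = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0])       # seconds
beta = np.array([15.0, 60.0, 75.0, 82.0, 84.0, 83.0])   # ellipse angle, deg

params, _ = curve_fit(saturating, delays, beta, p0=[10.0, 80.0, 0.5])
err0, plateau, tau = params
# Most of the change is complete within roughly one second, echoing the
# plateau reached between the 500-ms and 8-s delays in the text.
print(f"start {err0:.0f} deg, plateau {plateau:.0f} deg, tau {tau:.2f} s")
```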
4.6.2 Fast versus slow actions Following the initial finding that movements could be updated on-line unbeknown to the subject (Pélisson et al. 1986), related studies stressed the high speed of motor correction and investigated the delay of subjective awareness of the perturbations (Castiello and Jeannerod 1991; Castiello, Paulignan, and Jeannerod 1991). In these experiments, a simple vocal utterance (‘Tah!’) was used
Fig. 4.12 Effect of delayed tasks on action-numbsense. Action-numbsense was tested in JA for tactile stimuli delivered to the hand. Four different sessions tested the effect of 4 delays introduced between the stimulus delivery and the go-signal: 0, 1, 2, and 4 s. In each session 48 stimuli were randomly delivered to 6 possible locations (stars). The number of correct responses (black dots) decreased abruptly from an above chance (immediate and 1s) to a close-to-chance level (2 and 4s). This result suggests that the residual somatosensory information used to direct JA’s response was available only for a very short period (about 1s). (From Rossetti 1998.) by the subject to signal his or her awareness of the object perturbation. Comparison of the hand motor reaction time and the vocal reaction time showed that the vocal response consistently occurred after the motor corrections. As in preliminary experiments (Paulignan et al. 1991), the onset of motor correction was about 110 ms after the object displacement, and about 280 ms after the change in object size. However, the vocal responses occurred in both cases about 420 ms after the object’s perturbation. It was concluded that conscious awareness of the object perturbation lagged behind the motor reaction to this perturbation. These results also stressed the important role played by time factors with respect to the action–perception debate. The spontaneous variation of response speed in a patient with action-blindsight allowed us to note that faster sessions gave rise to a more signi1cant performance than the slower sessions (see Rossetti 1998). A similar effect was described in an experiment investigating the effect of movement time on the type of action control (Pisella et al. 2000: Exp. 1). In a ‘location–stop’ pointing experiment, one green target was initially presented and subjects were requested to point at it at instructed rates. This target remained stationary on 80% of the trials or could unexpectedly jump to the right or to the left at the time of movement onset. Subjects were instructed to point at the target, but to systematically interrupt their ongoing movement when the target jumped. The direction of the target
Fig. 4.13 An automatic pilot for the hand. (a) Schematization of the ‘location–stop’ protocol, predictions, and results. Left column: Protocol. During a pointing task with a 300-ms movement duration constraint, the target could be displaced unexpectedly to the right or to the left at movement onset (in 20% of the trials). The instruction was to stop the ongoing movement whenever the target jumped. Central column: Predictions. Slow movements should allow subjects to stop their movement and not touch the screen, whereas too-fast movements should not be affected by the target jump and should thus touch the programmed target location. Right column: Results. Three types of motor responses were successively observed as movement time increased. As expected, subjects touched the first target location for the fastest movements, whereas
jump was thus irrelevant for this task. Strict compliance with the ‘stop’ instruction would imply that subjects either succeed in stopping their movement or fail to interrupt their action and therefore reach the initial position of the target (Fig. 4.13(a)). In striking contrast to this prediction, a significant percentage of corrective movements were performed in the direction of the target jump in spite of the ‘stop’ instruction (Figs 4.13(a), 4.16). After touching the displaced target, subjects were fully aware of their mistakes and spontaneously expressed strong frustration. We explored whether the ongoing hand movement was corrected or interrupted as a function of movement time. Sampled movement times ranged from about 100 to 450 ms with a Gaussian distribution; they corresponded directly to movement speeds because the distance between targets and starting point was constant in the experiment. Fig. 4.13(b) shows the number of corrected movements with respect to movement durations. Since they occurred in a restricted temporal window, escaping the slower processes of voluntary interruption, the involuntary corrections resulted from a failure to inhibit an automatic process of on-line visuomotor guidance. This ‘automatic pilot’ (see also Place 2000), systematically activated during movement execution, led subjects to produce disallowed corrective movements over a narrow range of movement times, between about 150 and 300 ms. Over a given temporal window (about 200–240 ms), the same rate of correction was found in this location–stop condition and in a control ‘location–go’ condition where subjects had to correct the action in response to the target jump. Only movements slower than 300 ms could be fully controlled by voluntary processes. In contrast to this normal pattern, a patient with a bilateral lesion of the posterior parietal cortex (Fig. 4.14) showed a lack of on-line automatic corrective processes, whereas intentional motor processes were preserved (Pisella et al. 2000: Exp. 3). This allows us to conclude that fast movements are controlled by a posterior parietal ‘automatic pilot’ (PAP) located in the dorsal stream. By contrast, slow movements are controlled by intentional motor processes that remain largely independent of the posterior parietal cortex. Accordingly, frontal patients tested on the same tasks exhibited a complete loss of intentional inhibition of their automatic corrections (see Pisella et al. 2000). Thus the notion of an automatic pilot extended that of ‘hand-sight’ in the sense that it refers not only
they had enough time to intentionally stop their movement during slow trials. However, an intermediate class was also observed, in which subjects performed a significant number of unwilled corrections. (b) Distribution of the unwilled automatic corrections performed in response to unexpected target jumps, compared with a control ‘location–go’ condition in which another group of subjects faced the same stimulus set but were instructed to perform corrections in response to the target jumps. The percentage of corrected pointing responses was calculated with respect to the total number of perturbed and unperturbed trials performed by 6 subjects. Corrected movements in response to target jumps appeared for movement durations of 150 ms and became significant with respect to the motor variability observed in fast, unperturbed trials (speed–accuracy trade-off law) for movement durations of 200 ms in both location–stop and location–go conditions. Automatic corrections were produced by the location–stop group up to movement times of about 300 ms, which allow voluntary control to fully prevail over automatic visual guidance. A total of 9% of all the perturbed trials were redirected toward the second location of the pointing target in this stop condition. For movement durations between 200 and 240 ms, correction responses were produced at the same rate by the location–go group (in accordance with the instruction) and by the location–stop group (an irrepressible reaction despite the instructed stop response). This indicates that these fast motor corrections result from the same automatic pilot in both groups of subjects. (Adapted from Pisella et al. 2000.)
Fig. 4.14 Specific disruption of automatic corrections following a bilateral parietal lesion. This set of figures illustrates the pointing performance of three control subjects compared with a patient, IG, with bilateral optic ataxia, in the ‘location–stop’ condition and in a ‘location–go’ condition where correcting the movement toward the second target location was instructed. For each condition, the horizontal bars indicate the 95% confidence intervals of movement time computed for all stationary targets. In the lower part of the figure, the vertical dotted line indicates the upper edge of the 95% confidence interval of movement time computed for all non-interrupted perturbed trials (displayed as a dotted horizontal bar). When correction was instructed (upper panel), control subjects mainly produced corrections without increasing their movement time with respect to unperturbed trials (horizontal bar), whereas most of the corrections produced by the patient IG caused a large increase in movement duration. When the stop response was instructed (lower panel), the patient produced no corrective responses, whereas about 10% of the perturbed trials elicited disallowed corrective responses in the controls. For the interruption response, the patient exhibited performance similar to that of normal subjects (vertical dotted lines). (Adapted from Pisella et al. 2000.)
Fig. 4.15 Interaction between reaction time and movement on-line control. Mean latencies of the responses given to target jumps with respect to three classes of movement times during a pointing task. Motor correction was instructed in response to the target perturbation in location. A post-hoc classification of corrected versus uncorrected movements showed that they corresponded to significantly different reaction-time values. to unconscious visual processing by the action system, but also to an autonomous use of visual information which bypasses and even counteracts intention.
4.6.3 Interaction between movement preparation and execution In the ‘location–go’ condition described above, subjects faced the same unpredictable target jumps occurring at movement onset but were instructed to redirect their ongoing movement toward the second location of the target. Figure 4.15 shows that whether perturbed trials were corrected following the target jump depended on both their movement times and their reaction times. Three motor phases were identified with respect to movement times. Movements faster than 150 ms were ballistic and always reached the initial location of the target: no reaction was observed in response to the target jump. For durations between 150 and 300 ms, both corrected and uncorrected movements were observed; on-line reaction to the perturbation became possible but errors were still observed. All movements lasting more than 300 ms responded correctly to the target jump. Figure 4.15 shows that, for the same movement-time interval, corrected pointings tended to exhibit shorter latencies, that is, shorter durations available for movement programming. These short-latency movements may have been less precisely or less rigidly programmed, and they were consequently more sensitive to on-line visuomotor guidance. In contrast, movements that benefited
from a longer and finer programming phase were less reoriented on-line by the automatic pilot in response to the target jump. Long-latency movements therefore appeared to be more rigidly executed, as they were better programmed. Accordingly, they showed no flexibility to respond to the target perturbation and reached the initial target location.
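The three movement-time phases just described can be summarized schematically in a few lines of code (trial counts and correction probabilities below are invented to echo Figs 4.13(b) and 4.15, not taken from the study):

```python
# Schematic classification of pointing trials by movement time: ballistic
# (<150 ms), automatic-pilot window (150-300 ms), voluntary control (>300 ms).
import numpy as np

rng = np.random.default_rng(1)
n_trials = 1000
movement_time = rng.normal(250.0, 70.0, n_trials).clip(100.0, 450.0)  # ms

# Invented per-phase probability of an unwilled correction in the
# location-stop condition: none while ballistic, frequent in the automatic
# window, none once voluntary interruption has time to prevail.
p_correction = np.select(
    [movement_time < 150.0, movement_time <= 300.0],
    [0.0, 0.45],
    default=0.0,
)
corrected = rng.random(n_trials) < p_correction

for lo, hi in [(100, 150), (150, 200), (200, 240), (240, 300), (300, 450)]:
    in_bin = (movement_time >= lo) & (movement_time < hi)
    rate = corrected[in_bin].mean() if in_bin.any() else float("nan")
    print(f"{lo:3d}-{hi:3d} ms: correction rate {rate:.2f}")
```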
4.6.4 Conclusion In conclusion, in addition to the anatomical connectivity between areas, we have to take into account the timing of the activation of cortical networks. As the action system seems to benefit from overall faster processing than the identification system, dissociations between action and cognition can result from time factors. Timing and speed constraints seem to be the keys to a ventral/dorsal dissociation in the cerebral network. The participation of a given area in an anatomical network does not imply that it is absolutely necessary for the task being considered. Temporal constraints may be one reason for a connected area not to participate in a process (but other reasons can be invoked, such as a particular configuration of the network, as shown at the beginning of Section 4.6).
4.7 Recipes The current chapter is based on a controversy between behavioural evidence and neuroanatomical data, and we have proposed that temporal variables can provide a way to reconcile these two lines of evidence. However, numerous other parameters have been evoked throughout the previous sections. We found it useful to take an inventory of these parameters and attempt to classify them, in order to compile a user’s manual for researchers interested in the relationship between sensorimotor and cognitive processes. For example, these directions for use should be particularly helpful in the field of visual illusions, where a controversy is growing between the arguments for dissociation versus integration of the two visual systems. To enable valid comparisons between experiments, attention should be paid to the use of comparable conditions along the many axes that will be listed below. These recipes could also prove beneficial in a rehabilitation context. In patients in whom one of the two systems is damaged, they can be used to set optimal conditions for achieving better performance by recruiting residual visual-to-motor networks. Activating these residual functions could provide a means to trigger the organization of compensatory mechanisms, and thus become a first step of the re-adaptation process.
4.7.1 Brain lesion One obvious way to isolate the dorsal stream specifically is to study an individual with a lesioned ventral stream. Reciprocally, one obvious way to isolate the ventral stream specifically is to study an individual with a lesioned dorsal stream. As shown above, the neuropsychological double dissociation observed between optic ataxia and visual agnosia has provided a key argument for building the notion of a perception/action dichotomy on top of the ventral/dorsal segregation. In the absence of one of the two visual systems, the processing performed by the other can be expressed in its purest fashion. In the same way, primary sensory lesions, such as those encountered in action-blindsight and action-numbsense, seem specifically to restrict visual processing to the action system. It would be tempting to propose that it should be easier to study the properties of the dorsal stream in patients with blindsight than in patients with a bilateral ventral stream lesion. Indeed, neurologists
much more frequently encounter hemianopsia, which is a prerequisite for blindsight, than the kind of ventral stream lesion exhibited by DF (see Milner and Goodale 1995). However, the dissociation observed in this patient is more pronounced than that observed in action-blindsight patients (see Rossetti 1998). Although both types of patient fail to produce significant verbal guesses about the object property being tested, Fig. 4.2 illustrates the greater variability observed in the motor task for a blindsight patient. This difference can be attributed to the different lesion sites. A V1 lesion is responsible for a full loss of cortical visual inputs, whereas a ventral lesion should leave the dorsal stream intact. In the case of blindsight, the only input to the dorsal stream appears to arise from subcortico-cortical projections (see Fig. 4.5), which obviously do not provide a normal input. Conversely, a difference may be noted between the two types of patient for the matching task, performance being poorer in action-blindsight than in visual agnosia (see Fig. 4.2).
4.7.2 Specific inputs Just as specific dorsal versus ventral brain lesions can result in the pure activation of one of the two visual systems, the functional properties of the anatomical network suggest that some visual features can selectively activate these systems. For example, it is known that most of the neuronal activity related to colour processing is found in the ventral stream, whereas the processing of location is more specific to the dorsal stream (Heywood, Shields, and Cowey 1988; Ungerleider and Mishkin 1982). In a pointing experiment, Pisella et al. (2000: Exp. 2) tested whether the parietal automatic visuomotor guidance would extend to visual attributes other than target jump. In particular, would it also be observed when the change in target location is encoded through a chromatic perturbation? To test this, a green and a red target were presented simultaneously in the two positions used for the location–stop task. The subjects were instructed to point at the green one, and the colours of the two targets could be unexpectedly interchanged at movement onset. Like the ‘location–stop’ group, subjects in the ‘colour–stop’ group were instructed to interrupt their ongoing movement in response to the perturbation. In contrast with the ‘location–stop’ group, no automatic corrective movements were observed in the ‘colour–stop’ group (Fig. 4.16). It was concluded that only intentional corrections can be produced on the basis of a colour cue and that the visuomotor transformations of the hand’s ‘automatic pilot’ may be specific to location processing and spatial vision. This specificity for ‘dorsal attributes’ can be related to the partition of magnocellular and parvocellular inputs between the dorsal and the ventral stream. Another, related explanation is that this specificity is due to the processing time required for ‘ventral attributes’, which is not compatible with the expression of automatic processes. Indeed, irrepressible visuomotor corrections result not only from their automaticity (the processes of visuomotor guidance are inherent to movement execution) but also from their high speed relative to the slow process of voluntary control. It may be hypothesized that normal subjects produce unintentional corrections because the slow intentional inhibition process from the frontal lobe leaves enough time for the fast automatic corrective processes to alter the motor output. It is interesting to consider the strong contrast between the absence of a colour effect on the PAP (Pisella et al. 2000) and the significant colour guesses performed by blindsight patients (Stoerig and Cowey 1992). The main difference between these two experimental situations seems to lie in the type of response investigated. In the case of the Pisella et al. study, movements are all programmed and initiated before the relevant event (target change in colour or location) appears. Therefore the target perturbation is only relevant to movement guidance, and it is found that colour processing does not affect on-line motor control. In the case of colour-blindsight, simple responses have to be
simply initiated (and there is no proper execution control in simple key pressing). This discrepancy suggests that different sensory codings can be involved in different motor responses, and specifically in action preparation versus execution. In the same way, the use of simulated object jumps in conditions where a coloured light indicated which object to grasp from an array of several objects (e.g. Paulignan et al. 1991) should be reinterpreted in the light of the Pisella et al. (2000) results. The abrupt changes observed in the trajectories obtained by Paulignan et al. indicate that the ongoing movement was interrupted and replaced by a secondary movement directed towards the new object to grasp. Indeed, conditions where a real object displacement (and not a colour code) was triggered at movement onset gave rise to much smoother trajectories (Gréa et al. 2000), which are compatible with the activation of the automatic pilot. This is confirmed by the observation that these corrections are completely disrupted following a posterior parietal lesion (Gréa et al. 2002). To summarize, using colour stimuli is more likely to activate the cognitive than the sensorimotor system. The on-line control of action has specific access to metric object properties, but remains unaffected by categorical properties such as colour. The issue of depth apprehension will be addressed in a later section (context processing).
4.7.3 Specific outputs The study of the dorsal–ventral dissociation in the motor context enables us to distinguish specific motor abilities, which are preserved in visual agnosia and blindsight patients, from other types of motor responses. These patients, with impairment of the ventral stream, remain able to direct an action and even to adapt the hand to unidentified or even unseen objects. The preserved, immediate, goal-directed action thus seems to be implemented in the dorsal stream. In contrast, matching, delayed, slow, and indirect motor responses are performed very poorly by these patients. Several types of (spatially) indirect action have been tested. In action-numbsense, the patient was unable to indicate on a drawing where he had been stimulated, whereas he pointed at an above-chance level to these unperceived stimuli in a direct pointing task (Rossetti et al. 1995, 2001; Fig. 4.3). A related type of response is derived from the classical antisaccade paradigm. When subjects are requested to make a reverse movement with respect to what would be a direct action, they usually perform with reduced accuracy. Interestingly, the pattern of errors observed for visually guided antisaccades is similar to that of memory-guided antisaccades (Krappmann 1998). This result is again consistent with the observation that movements shifted in space share their representation with movements shifted in time with respect to direct action. This idea was further confirmed by experiments investigating antisaccades in a visual agnosic patient (DF). Dijkerman et al. (1997) observed that DF performed accurate saccades in natural conditions but was strongly impaired for delayed saccades and antisaccades (Milner, Dijkerman, and Carey 1999b). More recently the same idea was applied to hand pointing. As in the Pisella et al. (2000) study, it was shown that fast movements are under the control of an automatic process. Interestingly, this automatic guidance could participate in direct pointing performance, but counteracted the anti-pointing task derived from the antisaccade paradigm (Day and Lyon 2000). Schneider and Deubel (this volume, Chapter 30) have also shown that the automatic orientation of attention that is observed around the location of the target of a saccade being prepared is not observed in the case of antisaccades. Other experiments have also shown that anti-pointing performance in response to masked stimuli is poorer than direct pointing (Price et al. 2001).
Fig. 4.16 Specific inputs. (a) In response to target jumps, unwilled correction movements occurred even when countermanded (location–stop condition). Corrections were performed to a significant extent as compared with unperturbed trials, despite the opposite instruction. (b) A colour switch between two targets (a ‘colour-cued’ target jump) was not able to elicit automatic corrections. Although the change in target location was physically the same as in the ‘location–stop’ condition, no significant corrections toward the new green target were observed compared with unperturbed trials. In this condition, responses to the perturbation always complied with the stop instruction. (Adapted from Pisella et al. 2000.) Another type of action to consider is pantomime. In the Goodale et al. (1994) study of delayed actions in DF and normals, it was observed that the kinematic structure of pantomimed movements was similar to that of movements delayed by 2 or 30 s. This study further supported the view that the brain mechanisms underlying perceptual representations are quite independent of those activated during action, and stressed the necessity for motor representations to have on-line access to target parameters. The effect of subtle changes in the type of motor output requested from subjects can also strongly affect the type of processing involved. A rather clear example has been provided by Bridgeman (this volume, Chapter 5), who compared communicative pointing with instrumental pointing. Subjects
could be asked either to point at an object or to really act on it by pushing it down. The latter condition provided a stronger contrast with the perceptual tasks used in the same experiments, suggesting that the former type of response results from a less pure activation of the sensorimotor system. One should note that the issue of the reference frames used by the two systems (an egocentric one for the action system and an allocentric one for the perceptual system) has been more or less implicitly addressed in several sections of the present review, and is strongly dependent on the type of task performed (for recent reviews, see Desmurget et al. 1998; Goodale and Haffenden 1998). Given the strong effect of the response on the type of process involved in an action, it seems easier to activate the cognitive mode than to activate the sensorimotor mode of action control in its pure form. Any experimental condition departing from natural, direct, goal-directed action appears to activate the sensorimotor and cognitive systems less selectively, in various proportions. As will be addressed below, this is true not only for spatial but also for temporal factors. One interesting parameter to consider when different tasks are compared is the respective amount of programming versus on-line control required by the task. Clearly, reaching tasks directed at a real object (Day and Lyon 2000; Goodale et al. 1986; Pisella et al. 2000; Prablanc and Martin 1992) generate the maximal activation of on-line control processes.
4.7.4 Delaying the action

'Delaying an action can transform a theory.' In agreement with this quote from Goodale and Servos (1992), the review presented in the previous sections demonstrates that there are numerous examples of a dramatic effect of delay on sensorimotor processes. Time probably represents the most crucial dimension in the control of sensorimotor–cognitive interactions. Two conclusions can be reached on this issue: (1) sensorimotor representations are short-lived; (2) cognitive representations take over when a delay is introduced. In terms of recipes, one should note that the titration of the effect of delay on action seems to depend on the sensory modality. As shown in Fig. 4.12, the crucial delay for switching from a dominant sensorimotor activation to a dominant cognitive control lies between 1 and 2 s in the tactile modality. Other experiments with the same action-numbsense patient showed that this delay was about 4 s for the proprioceptive modality (Rossetti 1998; Rossetti et al. 2001). In the visual modality, most experiments used delays of several seconds (e.g. Bridgeman, this volume, Chapter 5; Goodale et al. 1994; Milner et al. 1999a). A 2-s delay appears to be long enough to strongly affect the motor output. Fewer experiments tested delays below this value, but it seems that a delay of about 500 ms is sufficient to generate noticeable changes in the type of motor output (see Fig. 4.11; see Krappmann 1998; Rossetti 1998; White et al. 1994). There is a large amount of converging evidence, arising from three sensory modalities, that the sensorimotor representation can only be expressed within a short delay following stimulus presentation, which provides a very simple recipe for studying sensorimotor versus cognitive control of action. Another aspect worth mentioning here is that introducing a delay between the stimulus and the response should rapidly disrupt the ability to exert on-line control over the action.
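As a rough aid to experimental design, the delay values quoted in this section can be gathered into a small lookup sketch. The thresholds below are the approximate figures cited in the text, not parameters from any of the original studies, and the dictionary and function names are ours:

    # Approximate delays (s) at which control switches from dominantly
    # sensorimotor to dominantly cognitive, per sensory modality.
    # Values summarize the figures quoted above; illustrative only.
    CRITICAL_DELAY_S = {
        "tactile": (1.0, 2.0),         # Rossetti et al. (1995, 2001)
        "proprioceptive": (4.0, 4.0),  # Rossetti (1998)
        "visual": (0.5, 2.0),          # noticeable from ~0.5 s, strong by ~2 s
    }

    def dominant_mode(modality, delay_s):
        """Guess which representation dominates a response after a given delay."""
        low, high = CRITICAL_DELAY_S[modality]
        if delay_s < low:
            return "sensorimotor"
        if delay_s >= high:
            return "cognitive"
        return "mixed"

For example, dominant_mode("visual", 5.0) returns "cognitive", matching the several-second delays used in most visual experiments.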
4.7.5 Speed constraints

In the same way as normal actions can be delayed, a speed constraint can be imposed on any given task in order to explore short time-range phenomena. For example, the Pisella et al. (2000) and the Day and Lyon (2000) experiments showed that only very fast movements could be considered as
fully driven by an automatic pilot. Different values have been reported in the literature for the minimum movement duration compatible with on-line corrections (see Day and Lyon 2000; Pisella et al. 2000; Rossetti 1998). This value depends on the type of task, on movement amplitude, and on the experimental set-up. For example, the value of about 200 ms reported by Pisella et al., obtained for vertical pointing against gravity (and thus with higher torque values) on a computer screen, is higher than the usual value obtained in the horizontal plane, which is closer to 100 ms (e.g. Desmurget et al. 1996; Gréa et al. 2000; Prablanc and Martin 1992). In terms of recipe, movement times between 200 and 250 ms in the vertical plane (Pisella et al. 2000) and between 125 and 160 ms in the horizontal plane (Day and Lyon 2000) seem to correspond to movements dominated by the parietal automatic pilot for the hand.
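The same movement-time windows can be expressed as a small predicate. The ranges are the ones just quoted (Pisella et al. 2000; Day and Lyon 2000); treating the movement plane as the only relevant parameter is of course an illustrative simplification:

    # Movement-time windows (ms) in which pointing appears to be dominated
    # by the parietal automatic pilot; values quoted above, illustrative only.
    AUTOMATIC_MT_MS = {"vertical": (200, 250), "horizontal": (125, 160)}

    def automatic_pilot_dominated(plane, movement_time_ms):
        low, high = AUTOMATIC_MT_MS[plane]
        return low <= movement_time_ms <= high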
4.7.6 Visual illusions

The two experiments on the effect of static visual illusions on action presented earlier in this chapter (Aglioti et al. 1995; Gentilucci et al. 1996) have given rise to a flurry of research on this topic. Other related experiments, using illusion-like phenomena linked to motion perception, are summarized by Bridgeman (2000). Until now we have referred to only a few experiments on visual illusion and action, but we have counted at least 12 articles published on this topic between 1999 and mid-2000. Most of these studies replicated the initial findings of the first two papers, namely that the effect of visual illusions appears to be stronger on perceptual reports than on action performance. Nevertheless, a number of interesting features emerge from this hot topic, which may help the reader to prepare their own illusion recipe.

First, a variety of visual illusions have been investigated for which there is a relatively weak effect of the illusion on action. In addition to the effect of the Titchener circles (or Ebbinghaus size-contrast illusion) on grasping (Aglioti et al. 1995; Haffenden and Goodale 1998, 2000a) and pointing (Fisher, unpublished), the Müller–Lyer illusion has been studied during both pointing (Gentilucci et al. 1996) and grasping tasks (Otto-de Haart, Carey, and Milne 1999; Westwood, Chapman, and Roy 2000). Grip aperture during prehension was also shown to remain unaffected by the Ponzo illusion (Jackson and Shaw 2000), and to be less affected than perception by the horizontal–vertical illusion (Vishton et al. 1999: Exp. 1, but see below). Other illusions have been used to show that the positioning of the grasp remains less sensitive than perceptual tasks (Ellis, Flanagan, and Lederman 1999). The developmental aspect of these effects was also addressed by investigating the sensitivity of children to illusion (Kovacs 2000). It was found that children were less sensitive to the perceptual illusion than adults. The authors suggested that this may indicate a slower maturation of the ventral stream with respect to the dorsal stream of visual processing, which would be another way of supporting the dissociation between the two visual systems.

Second, several limits on the relative insensitivity of the action system to visual illusion should be considered. In order for a visual illusion to produce no effect on action, one has to assume that the neurological substrate responsible for generating the illusion lies somewhere within the ventral visual system. However, some illusions have been described as affecting very early visual processes, even at the retinal level (see Gentilucci et al. 1996). If an illusion does affect the visual processes taking place between the retina and the primary visual cortex, then the effect of this illusion (or at least of some of its components) would feed both the dorsal and the ventral stream, and one would expect action to be affected by this illusion. In the same vein, one has to address the
issue of the visual subsystem that is involved in the action. For example, it has been shown that the Ponzo illusion affects grip size but not the force programmed to lift a target object (Jackson and Shaw 2000; see also Brenner and Smeets 1996). It is interesting to note that grip size is one of the metric properties of space that has been proposed to be processed within the dorsal stream (see Milner and Goodale 1995; Rossetti 1998), whereas object weight has to be inferred from previous learning of the relationship between objects' appearance and their weight. This association has to involve structures other than the dorsal stream, for example the ventral stream and its connections to memory systems (see Jeannerod et al. 1994). It therefore involves more off-line processing (prior to the action) than on-line motor guidance.

Third, the issue of relative versus absolute coding of object metrics has been raised by Pavani et al. (1999), Vishton et al. (1999), and Franz et al. (2000). They observed that previous studies with the size-contrast illusion (the Titchener circles) presented subjects with two stimuli to compare in the perceptual task, whereas only one object was used in the pointing task (Aglioti et al. 1995). They performed experiments in which only one illusory object was presented at a time. Pavani et al. (1999) and Franz et al. (2000) used the size-contrast illusion and found that both perceptual and motor responses were affected by the illusion. Vishton et al. (1999), using the horizontal–vertical illusion, extended the initial observation made on the size-contrast illusion when two stimuli were presented at once, but showed that the perceptual effect could be suppressed when only one element of the display was presented. They also observed that grip scaling was strongly influenced by the illusion when subjects had to scale their movements to both the vertical and horizontal dimensions of the triangular figure. The more complex visuomotor processing required by this latter task suggests that it may be likened to matching tasks in which the hand has to reproduce an object size between two fingers. In contrast to natural grasping actions, such tasks have been proposed to involve ventral stream processing (Haffenden and Goodale 1998; Milner and Goodale 1995). Other experiments have shown that such tasks are much more influenced by illusory context than natural direct actions (Westwood et al. 2000). Still, the fact remains that a relative judgement made between two stimuli seems to be more sensitive to illusion than the absolute estimate required by a simple action directed at one stimulus.

Fourth, another possible confound has been raised to account for the discrepancies observed between several studies. Haffenden and Goodale (2000a) explored the effect of the gap between the target and the surrounding elements in the Titchener illusion. They first showed that a smaller target–flanker gap produced an apparent effect on grip scaling (see also Pavani et al. 1999). They then investigated the effect of two neutral rectangular flankers that could be presented along the horizontal or the sagittal axis of the object. While a perceptual effect was observed in both conditions, a motor effect appeared only when the flankers lay along the sagittal axis, that is, when they could interfere with the two grasping fingers. The effect of these flankers on action also varied with the target–flanker distance, suggesting that elements of the 2-D stimulus could be treated as potential obstacles by the action system.
Alternatively, the difference between the effects of a visual context aligned with the depth axis and one aligned with the frontal axis, presented in Fig. 4.17, may explain such a result. These findings are likely to explain the variety of the effects reported in the literature. Altogether, there seems to be a reproducible effect of visual illusions on perception, while the action system is less sensitive to illusory context. The variety of results found in the literature suggests, however, that careful attention should be paid to the design of the stimuli, since the perceptual and the action systems may be sensitive to different stimulus properties. For example, the action system appears to be sensitive to motion-induced illusions (Smeets and Brenner 1995) and to depth cues (see below).
4.7.7 Context processing

A tentative explanation for the differential effects of illusions on perception and action is that only the former system is influenced by visual context. Arguments for this interpretation come from experiments performed with illusory set-ups. For example, the Roelofs effect was used by Bridgeman to compare perceptual and motor effects (Bridgeman, this volume, Chapter 5). As for illusions, a pronounced effect was found for perceptual estimates, whereas no significant influence was observed on pointing responses. It has therefore been argued that the sensorimotor system is not influenced by visual context, at least in the case of immediate (vs. delayed) pointing (e.g. Bridgeman 1997, 2000, this volume, Chapter 5; Rossetti 1998). However, several examples can be found of an influence of visual context on action. Several levels of context complexity may be considered here: intrinsic, extrinsic, and inferred contexts.

First, the minimum level of visual context that can be tested during a simple action seems to be the size of a target. When subjects have to point to targets of different sizes, they exhibit a spontaneous modulation of their movement time as a function of both target size and distance. The expression of this relationship was provided by Fitts (1954) and is recalled below.
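In its standard formulation, Fitts's law expresses movement time MT as a logarithmic function of movement amplitude A (the distance to the target) and target width W, with a and b as empirically fitted constants:

    MT = a + b \log_2 (2A / W)

The logarithmic term is the task's index of difficulty: halving the target width, like doubling its distance, adds one unit of difficulty and therefore a constant increment b to movement time.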
If one accepts that the size of a target is the simplest context in the task of pointing to the centre of the target, then the effect of target size on pointing may be considered an effect of intrinsic context. Not only normals but also numerous types of brain-damaged patients comply with Fitts's speed–accuracy trade-off law. This is also true of schizophrenic patients, who are known to be impaired in context processing (Saoud et al. 2000).

Second, the effect of target size applies to sequential movements. The first stroke of a two-stroke sequence is altered even if the size of only the distal target of the movement is manipulated (e.g. Marteniuk et al. 1987). This robust effect even applies to schizophrenic patients (Saoud et al. 2000).

Third, another example is derived from the experiments designed by Bridgeman on the Roelofs effect, in which an illusory displacement of a target, caused by a change in position of a surrounding rectangular frame, affected the phenomenological experience of target position but not pointing responses (Bridgeman 1991; review in: Bridgeman 2000, this volume, Chapter 5). One of the conclusions from this work was that the sensorimotor system is insensitive to visual context. However, this interpretation stands in contrast with the reported finding that visual context influences the perception of the egocentric distance of a visual stimulus (e.g. Coello and Magne 2000). In order to unravel this issue, a recent study addressed the question of whether the dissociation between verbal identification and reaching holds when the illusory target displacement is radial instead of horizontal, as was the case in the original report (Bridgeman 1991). The task for the participants was either to determine verbally whether the second target was presented in the same location as the first one, or to make a non-visually controlled pointing movement towards the second target. The results showed that a similar illusion of target displacement was obtained with verbal reports whether the illusion stemmed from a horizontal or a radial rectangle displacement (thus confirming Bridgeman's findings). However, the striking outcome was that, whereas no effect of the rectangle position was observed in motor performance for the horizontal displacement, a very significant effect was obtained when the displacement was made along the radial axis (Fig. 4.17). This latter effect confirmed the involvement of visual context in distance but not direction perception (Coello and Magne 2000). In addition, the amount of the illusion-induced effect was identical in the perceptual and the motor tasks. This finding has an important implication for theories of visual perception, in the sense that it shows that the distinction between visual perception for identification and visual perception for action is task- and stimulus-dependent.
Fig. 4.17 Context processing in the depth dimension. Participants were shown for 0.4 s a visual target (diameter 8 mm) presented at 23 cm and centred within a radial or a horizontal rectangle. Then both the target and the rectangle vanished for a period of 0.5 s, before reappearing for 0.4 s at the same or a different location. Furthermore, the target was presented alone, or inside an uncentred rectangle (96 × 38 mm) displaced to one side or the other with respect to the target. The task was either to determine verbally whether the 2-D target was in the same location as the first one, or to make a non-visually controlled pointing movement towards it. Black and white symbols depict the results obtained for the two test-targets. Perceptual and motor responses exhibited the same bias only when the rectangular frame was presented in the radial orientation: the pointing distance was overestimated when the perceptual matching of the target was overestimated, and vice versa. (From Coello et al. 2001.)

It should be emphasized, though, that the absence of the visual target at the time of movement completion may also have reduced the influence of on-line motor control on movement guidance. Other experiments have shown that the depth dimension has a specific status with respect to the relations between perception and action. Dissociations can be observed between cognitive and sensorimotor responses to depth stimuli (see Dijkerman and Milner 1998). However, interesting comparisons can be made concerning the ability of DF (the case with bilateral visual agnosia) to perform orientation judgements. She seems to perform better on perceptual judgements of orientation in depth under binocular viewing conditions than when judging the orientation of an object (or slot) in the picture plane (Dijkerman et al. 1996; Goodale et al. 1991). On the basis of DF's performance it seems that there is less difference between perception and action with regard to the orientation of objects in the depth plane than in the frontal plane (see also Carey, Dijkerman, and Milner 1998). Altogether, the effect of context on action seems to depend strongly on the experimental procedure. As shown by simple and sequential pointing, intrinsic properties of the goal of an action do affect motor parameters. More distal contexts, such as those responsible for the generation of static illusions or illusory movement, appear to have less effect on action, unless the depth dimension is considered. As mentioned in the visual illusion recipe section, the most relevant issue here appears to be the
level of processing involved in each type of context. Intrinsic context such as target size or position in depth refers to a primary metric property of objects, whereas more complex contexts, such as a target array or an illusory set-up, imply integration mechanisms over time or space. In the latter case of complex integration of a spatio-temporal context (cf. Fig. 4.17), only delayed actions are strongly affected. One should pay attention to these parameters when designing an experiment to investigate the role of context in action.
4.7.8 Actual versus represented stimulus

A remaining question about the difference between immediate and delayed tasks, such as those presented in Fig. 4.10, is whether the observed effects can be attributed to the duration of the target presentation or to the delay from target onset to the response. In other words, does the availability of the target just prior to the initiation of the action in the immediate condition account for the difference between the two conditions? This question was explored in a simple experiment in which subjects' target finger was either held in place for a given delay prior to the go-signal, or positioned only briefly at the target location and then withdrawn for a similar delay. A 2-s memory delay was sufficient for the cognitive representation to affect the pointing distribution (as for the 5-s delay shown in Fig. 4.11). In contrast, no such effect was observed for a 2-s duration of target presentation, that is, when no memorization was required (see Fig. 4.18). In this case, both types of visual representation are available: (1) the cognitive one, because the 2-s delay enabled the subject to encode the target location with respect to the target array, and (2) the on-line sensorimotor one, because the target had only just disappeared at movement onset. The result (Fig. 4.18) shows that only the sensorimotor representation seems to contribute to the motor output. Priority therefore seems to be given to the sensorimotor mode of processing when it is available (as also shown by the Pisella et al. 2000 results described earlier). This result indicates that the effect of delay cannot be attributed solely to the slowness of the cognitive system; it is also due to the absence of a real target object, which has to be represented because sensorimotor processes are short-lived (see Rossetti 1998).

A similar logic can be used to explore the nature of the representation involved in action in neurological patients. In order to test the ability of optic ataxia patients to process visuomotor information on-line, we performed another experiment with our patient with a bilateral posterior parietal lesion (IG). When IG was asked to delay her pointing or grasping action, her otherwise poor ability to perform accurate actions improved (review in: Milner and Dijkerman 2001). Knowing that the effect of a memory delay differs from that of a long stimulus presentation in normals (see Fig. 4.18), we investigated this possibility with IG. She exhibited better performance in both conditions, suggesting that both the long presentation and the delay enabled her to form a representation of the object that could be used by the action system. Given her lesion, this representation was postulated to be formed via the ventral stream (see also Milner et al. 1999a). An interesting question that follows from this is whether it is possible to generate a conflict between this sustained cognitive representation and short-lived motor representations. Together with Prof. David Milner and Dr Chris Dijkerman, we designed an experiment in which an object was presented for 2 s, then hidden behind a panel for 8 s, then shown again. This procedure improved IG's grip scaling compared with an immediate action condition. A special session was then run in which, on some trials, the small object could be unexpectedly replaced by a large one, or vice versa. The specific question asked here was whether the grip formed by IG would follow the size of the object actually present or that of the internal representation formed after the presentation of the initial object in the same trial.
Fig. 4.18 Long presentation versus memory delay. Mean ellipse orientations (beta, in deg; see Fig. 4.8) as a function of various delays between target onset and go-signal, with the target either remaining present during the delay ('target duration') or not ('memory delay'). When the target remained present (immediate pointing (control), 500-ms, and 2-s delays), the orientation of the ellipses remained at approximately beta = 135 deg, as for immediate pointing movements. In the absence of the target, as soon as there was a 2-s memory delay, beta shifted toward an orientation of 90 deg (illustrated by the interrupted line). This graph shows that the orientation of the ellipse fitting the pointing distribution is contingent upon the presence or absence of the target during the delay.

The results clearly showed that her grip was initially scaled to her internal representation rather than to the actual object she was aiming at (see Fig. 4.19). Control subjects tested in the same conditions exhibited an on-line grip formation that was adapted to the object present in front of them at the time of the grasp (see also Milner and Dijkerman 2001). In addition to the effect found on grip size, maximal grip aperture was reached earlier in the large→small condition than in the large condition for each of the six control subjects, whereas IG exhibited a similar value in the two conditions. On the particular trials where she 'correctly' opened her hand widely on 'small→large' catch trials, the wide grip aperture actually occurred abnormally late in the movement. This strongly suggests that she was unable to process the change in size that had occurred during the delay fast enough to update her ongoing movement.
Fig. 4.19 Represented versus actual object. The figure presents the four steps constituting one non-congruent trial of the delayed grasping task, producing a kind of object-size perturbation. In these non-congruent trials, occurring in only 20% of the trials, the object of size A presented before an 8-s delay is replaced by an object of size B for the initiation of the grasping movement. The figure illustrates the results obtained for controls and for patient IG, who has bilateral optic ataxia. IG produced a movement with a grip size based on the first object visualized. She executed her movement quite rigidly, as programmed during the memory delay, before finally adapting her grasp to the second object. Control subjects were less influenced in their grip size by the first object presented; they rapidly adapted their finger aperture to object B and seemed to behave as in the case of immediate movements toward object B. Control subjects can use on-line information to perform their movement, whereas the patient with a lesion of the dorsal stream seemed to rely on slower and more rigid sensorimotor processes.

These results clearly confirm that sensorimotor and cognitive representations have different time constraints: while sensorimotor representations can be elaborated in real time and are very short-lived, the cognitive representation needs more time to emerge and can be evoked off-line in the organization of action.
4.7.9 Motor and perceptual co-activation

We have referred earlier to experiments in which there appears to be a simultaneous activation of the sensorimotor and the cognitive systems in normal individuals and patients (Bridgeman 1997; Rossetti and Régnier 1995; reviews in Bridgeman 2000; Rossetti 1998).
Fig. 4.20 The specific effect of object feature verbalization. As in Fig. 4.9, subjects were requested to produce verbal estimates during simple pointing to a proprioceptive target. A change in the distribution of the confidence ellipse main axes was observed only when the verbalization concerned the target location (relevant to the action) and not when it concerned the target texture.

In a further experiment derived from the Rossetti and Régnier study, subjects were trained to identify two target features: location and texture (Pisella et al. 1996). During this training, a texture could be presented at any target location, without any systematic association between location and texture. Then, on two different days, experimental sessions were performed in which subjects were requested, on each trial, to provide a verbal guess about the current target being presented. The number of verbal errors made during the pointings was similar in the two conditions, suggesting that the two tasks were equally difficult. When the guess applied to target texture, no specific effect was observed on ellipse orientation (see Fig. 4.20). By contrast, a significant modulation of the endpoint spatial distribution was observed when the guesses applied to target location. This result suggests that, for an interaction to be observed between the cognitive and the motor system, experimental conditions have to activate verbal representations of the particular features that are relevant to the action being performed. Only in such conditions does the representation elaborated in the cognitive system seem to overwrite the sensorimotor one. Another observation worth taking into consideration before trying this recipe comes from our action-blindsight (Rossetti 1998) and action-numbsense (Rossetti et al. 2001) studies. We have seen earlier that the simultaneous motor + verbal task disrupted action performance. We have also had several opportunities to note that, once the subject had been asked to provide a verbal guess at the end of a trial, the following series of trials would show a disruption of performance as well. It is as if the mere fact that the patient knew he might have to describe the stimulus activated cognitive processes that interfered with the residual sensorimotor ability.
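The confidence-ellipse orientation used here and in Figs 4.8 and 4.18 can be made concrete: beta is simply the direction of the principal axis of the covariance of the pointing endpoints. The following minimal sketch (our code, not taken from the original analyses) computes it with numpy:

    import numpy as np

    def ellipse_orientation(endpoints):
        """Orientation (deg, in [0, 180)) of the main axis of the
        confidence ellipse fitted to (n, 2) pointing endpoints."""
        cov = np.cov(np.asarray(endpoints), rowvar=False)  # 2 x 2 covariance
        eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
        major = eigvecs[:, np.argmax(eigvals)]             # main-axis direction
        return np.degrees(np.arctan2(major[1], major[0])) % 180.0

    # Synthetic endpoints scattered along a 135-deg axis recover beta ~ 135.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(200, 2)) * [10.0, 2.0]
    theta = np.radians(135.0)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    print(ellipse_orientation(pts @ rot.T))

A shift of the fitted beta from about 135 deg toward 90 deg, as in Fig. 4.18, thus reflects a change in the spatial structure of the endpoint distribution rather than in its mean.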
4.7.10 Complex recipes

Several of the ingredients described in the previous recipes can be combined into more complex recipes. For example, the effect of illusion and the effect of delay can be coupled. A detailed subject-by-subject analysis of the experiment testing the effect of the Roelofs effect on action showed that only half of the subjects exhibited a motor effect of the visual illusion (Bridgeman 1991). This observation became all the more interesting when it was found that interposing an 8-s delay before the response forced all of the subjects to use spatial information biased by the perceptual illusion (Bridgeman 2000), replicating an earlier finding made on eye movements (Wong and Mack 1981). This result suggested that subjects switch from motor to cognitive modes of sensory processing at differing delays after stimulus offset. In an elegant experiment, Gentilucci et al. (1996) likewise showed that introducing delays between line observation and movement onset proportionally increased the effect of the Müller–Lyer illusion on the pointing response: the influence of the illusion becomes particularly noticeable as the delay between stimulus presentation and movement onset increases (see Fig. 4.21).

Another combination of ingredients from the previous recipes was made between a specific input and a specific output, namely colour and movement interruption. To co-activate the direct sensorimotor representation of the target, responsible for automatic on-line motor control, and the higher-level cognitive representation of the same stimulus, we designed an experimental device producing double perturbations of a visual target at movement onset. These double perturbations involved a change in location, mainly processed in the dorsal stream and known to trigger fast updating of the ongoing action, and a simultaneous change in colour, known to be mainly processed in the ventral stream and to imply a categorization process. Unperturbed trials (80%), simple perturbations in either location or colour, and double perturbations were intermixed in each session (Pisella et al. 1998b, 1999). Subjects had to point to the green targets, and to redirect their movement if the green target had jumped. The change in colour was associated with an instruction to stop the ongoing movement immediately. The double perturbation thus contrasts a combination of a dorsal visual input (location) and a dorsal motor response (automatic visuomotor guidance of a goal-directed movement) with a combination of a ventral visual input (colour) and a 'ventral' motor response (stopping being a conditional motor response to the colour change). On the one hand, faster processing of the location attribute as compared with the colour attribute is expected (Pisella et al. 1998; Rossetti et al. 1997). On the other hand, the 'dorsal' visuomotor guidance inherent in goal-directed execution is swift and automatic, in contrast to the arbitrary association of a stop response with the red colour (Pisella et al. 2000). The double-perturbation condition therefore brings together the association of specific inputs and specific responses that offers the greatest temporal difference, allowing us to dissociate the systematic corrections driven by the dorsal 'automatic pilot' from the intentional motor control of movement execution. A long temporal window of automatic behaviour should therefore be observed. The results of six subjects confirmed these predictions.
For pointing movements with durations shorter than about 200 ms, no effect of the perturbation was observed. For movement times ranging from 200 to about 280 ms, subjects behaved fully automatically in response to double-perturbed trials: they systematically redirected their movement toward the forbidden displaced target and touched it as frequently as when correction was actually instructed (in response to the simple location perturbation). The instructed response to the movement-interruption signal appeared progressively, and only for slower movements (Fig. 4.22). In the end, subjects in this condition produced automatic, disallowed corrections for movement times ranging from 200 to 450 ms (a total of 15% of all trials) and expressed strong frustration at the sensation of being unable to control their action intentionally.
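The temporal structure of these results can be summarized in a toy decision rule. The movement-time boundaries below are the approximate values reported above; the function is a descriptive summary of the observed behaviour on double-perturbed trials, not a process model:

    def double_perturbation_response(movement_time_ms):
        """Observed pattern on double-perturbed trials (target jump plus
        colour change carrying a stop instruction); boundaries from the text."""
        if movement_time_ms < 200:
            return "no effect of the perturbation"
        if movement_time_ms <= 280:
            return "automatic correction toward the displaced target"
        if movement_time_ms <= 450:
            return "automatic correction on a decreasing proportion of trials"
        return "instructed stop"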
Fig. 4.21 Visual illusions and delayed action. Pointing biases induced by the two configurations of the Müller–Lyer illusion. Movement amplitude tended to increase in the open configuration and to decrease in the closed configuration, i.e. in the same direction as the perceptual illusion. Values plotted in this figure were normalized by subtracting the value obtained for the control configuration. The effect of the illusion on pointing was very weak in the full-vision condition. It is noticeable that the effect of the illusion on movement amplitude increased when less information was available to the subject and when a delay was introduced between stimulus presentation and the response. (From Rossetti 1998, based on Gentilucci et al. 1996.)

These two latter cases of sensorimotor and cognitive co-activation took place on different time-scales. In most of the data reviewed here, interactions between sensorimotor and cognitive processing of spatial information have been shown to occur only from the cognitive to the sensorimotor system (Bridgeman 1999; Rossetti 1998). In motor + verbal tasks tested with normals and patients, for example, response production was affected in an all-or-none pattern, the semantic representation of the target overwhelming the sensorimotor representation. More recent experiments have shown that the reverse pattern of interaction can be observed on a longer time-scale, through
Fig. 4.22 Double perturbation experiment. (a) Histograms of the percentage of corrected movements for each type of trial. Corrections were produced as instructed in response to the simple target jump (location perturbation) but also, despite the associated stop instruction, in response to the double perturbation. (b) The automatic corrections escaping subjects' intentional motor control in response to the double perturbation were produced for movement times between 200 and 450 ms. Surprisingly, in a temporal window covering movement durations from 200 to 280 ms, subjects showed the same rate of correction in accordance with the instruction (in the simple location perturbation) and in contradiction with it (in the double perturbation). This result indicates that corrections within this restricted temporal window result from the same automatic processes of on-line visuomotor guidance.
adaptive changes (Bhalla and Proffitt 2000; Rossetti et al. 1998b). This suggests that several time ranges have to be considered in the discussion of interactions between sensorimotor and cognitive processing of spatial information. As reviewed above, the effect of time can be fully explored by comparing immediate and delayed responses. However, some portion of the reaction time inevitably pertains to the production of the motor response, whether it is produced immediately or after a delay. Although
reaction times may be short (about 250 ms in normals), one cannot exclude the possibility that this period of time favours an early effect of delay and/or of early semantic representations on sensorimotor processes. The experiments with specific target perturbations during ongoing aiming actions allowed us to explore possible interactions within a shorter time range following stimulus onset. These series of experiments led us to distinguish between the programming and the control of sensorimotor processes with respect to the direction of interactions between explicit and implicit processes. Programmed sensorimotor processes are more likely to be overwhelmed by cognitive processes. Automatic processes of on-line motor control, however, seem to escape the influence of cognitive processes during a given temporal window following the target perturbation. In short, in order to obtain more involvement of the cognitive system in a motor output, one may combine several of the ingredients listed above. Such complex recipes should typically associate a time constraint (delayed or speeded action) with any one of the spatial parameters (illusion, context, specific stimuli, or specific outputs).
4.8 Concluding remarks

We have examined the link between behavioural evidence for a dissociation of the visual brain into two subsystems, sensorimotor and cognitive, and the neuroanatomical evidence for a complex and interconnected visual-to-motor network. We have found not only that neuropsychological evidence favours two separate visual systems, but also that it indicates that action, under specific circumstances, can be organized by one or the other of these two systems, confirming that there is 'a dual route to visuomotor action' (Milner and Dijkerman 2001). The numerous results presented here on the influence of delay, verbalization, and specific inputs or outputs, in normals and patients, strongly support the existence of two distinct ways of encoding spatial information for action. We have then provided recipes to control the respective contributions of the two visual-to-motor systems to a given motor output. In conclusion, it appears useful to consider a few neglected aspects of this complex organization of action toward visual goals.
4.8.1 A gradual time effect

Our review of the literature demonstrates that an effect of time can be observed over several time ranges. Figure 4.23 attempts to describe these time ranges. Let us first consider the level of a single action. For the shortest reaction times, and for tasks performed without delay, we have seen that implicit processing provides the only representation that can affect the motor outcome. This exclusive influence of short-lived sensorimotor representations may be carried out by the dorsal stream of visual processing. Then voluntary control, which is activated more slowly, can influence the motor response and gradually supplant the pure sensorimotor system. This influence may be supported by the ventral stream, probably through indirect projections to the motor cortex via the prefrontal and premotor cortex. Different results are reached if we consider a broader time-scale, which allows slow adaptive processes to take place through sensorimotor plasticity. Both visuomotor adaptation in neglect patients (Rossetti et al. 1998b, 1999b) and ageing (see Bhalla and Proffitt 2000) have been shown to affect the elaboration of explicit cognitive representations of spatial information.
[Fig. 4.23 schematic. Horizontal axis: time from stimulus (input). Single-action range: on-line, implicit-only processing (dorsal) gives way to voluntary control and explicit processing (ventral?). Plasticity range: adaptation, with transfer between implicit and explicit processing (cerebellum?).]
Fig. 4.23 Time constraints and implicit–explicit processing. Tentative synthesis of the temporal constraints on the relationship between sensorimotor and cognitive processing. As reviewed in the present chapter, sensorimotor processing can be performed faster than cognitive processing, which is particularly useful for on-line motor control. The plausible neural substrate for this speed is the dorsal pathway. Slower actions can be supervised by voluntary control, which implies a transfer from the explicit to the implicit system. For example, verbalization of the movement goal location during the action induces a loss of the residual implicit abilities observed in blindsight and numbsense patients and a change in the reference frame used to control the action. Interestingly, these effects are similar to the effect of a delay between stimulus delivery and action. These interactions might involve the posterior or the anterior portion of the ventral pathway. To evoke a transfer from the implicit to the explicit mode of sensory processing, one has to study a more extended time-scale. This time-scale allows plasticity to develop gradually, and thus implies that several movements are involved. Prism adaptation provides an example of such a transfer from an implicit sensorimotor level to an explicit space representation. (From Pisella and Rossetti 2000.)
4.8.2 An action–perception gradient

Although the dissociation between perception and action found in neurological patients is often used to argue for a clear segregation between two visual systems, it should be noted that the performance exhibited by these patients cannot be taken as identical to normal performance. The motor production obtained in visual agnosia, and especially in blindsight and numbsense patients, is much more variable than that of normals. In our view, this suggests that both visual systems have to cooperate in order to produce normally accurate behaviour. On the one hand, neuropsychological investigations have shown that matching responses (such as indicating an object's size or orientation with the hand) provide similar results to purely perceptual tasks (e.g. Goodale and Milner 1992; Goodale et al. 1991; Jeannerod and Rossetti 1993; Milner and Goodale 1995; Milner et al. 1998; Perenin and Rossetti 1996). On the other hand, an elegant series of experiments investigating slope perception in normals (see Bhalla and Proffitt 2000) has shown a dissociation between an (action-like) matching response and a perceptual–verbal report. The distinction between instrumental and communicative pointing described by Bridgeman (this volume, Chapter 5) also emphasizes such a progression from pure sensorimotor responses to verbal reports, via intermediate processes such as pointing to designate. Taken as a whole, these results therefore
suggest that motor matching responses may lie somewhere between the action and perception sides of vision, thus offering a third term to the dichotomy that is usually defended (Rossetti, Pisella, and Pélisson 2000). If one accepts the idea of a continuum between a pure sensorimotor system and a pure perceptual system, then the results summarized in this review suggest that the main parameter of this gradient is the amount of on-line processing involved in a given task. Thus the sensorimotor end of this continuum would be pure on-line motor control. The only way to observe pure sensorimotor activation appears to be to favour the hand's automatic pilot. A pure automatic pilot produces greater output variability than natural behaviour, because an unconstrained movement is usually slow enough to also allow some participation of the cognitive system in the action. In addition, this automatic pilot is not capable of planning an action. By contrast, a pure 'perceptual control of action' (Milner and Dijkerman 2001) is seen only for very slow actions or in patients with damage to the dorsal system. This indicates that both routes to action have to cooperate under natural circumstances, even if each of them in turn can dominate a given aspect of behaviour. For example, a movement can be initiated toward a target that has been selected by the cognitive system and then be corrected on-line by a sensorimotor system such as the parietal automatic pilot. The example of the Ponzo illusion shows that the force applied to lift an object at the end of the movement is not controlled by the same mechanism as the grip aperture (Brenner and Smeets 1996). Nevertheless, both systems participate in the action: grip force is probably simply specified before the action is initiated, while grip aperture is controlled on-line. As a whole, performing an action cannot be considered as only a sensorimotor process and, once again, the two ends of the perception–action gradient do participate in daily behaviour.
4.8.3 A dorsal–ventral gradient?

Within the purest 'vision for action' system there seems to be an anatomical gradient between the areas responsible for space-based action and those responsible for object-based action. Electrophysiological data in monkeys (Sakata and Taira 1994) and lesion studies in human subjects (Binkofski et al. 1998) seem to agree that, within the intraparietal cortex, the more rostral part is primarily involved in simple actions such as pointing, whereas the more ventral part is involved in grasping. This anatomical organization seems to constitute the first step of a more global gradient between two extremes whose prototypes would be visual pointing (simple space-based action) and object recognition (object-based cognitive processing) (see Pisella and Rossetti 2000; Rossetti et al. 2000). As argued above, the action end of this continuum would be the parietal automatic pilot (PAP; Pisella et al. 2000). At the neuropsychological level, the dichotomy of the visual system is best argued from the double dissociation between optic ataxia and visual agnosia (see Goodale and Milner 1992; Jeannerod and Rossetti 1993; Milner and Goodale 1995; Rossetti 1998). However, the very notion of this 'double dissociation' may be questioned. It should be kept in mind that, unlike visual agnosia, optic ataxia does not impair patients in everyday life. In order to observe a motor deficit one has to require the patients to perform goal-directed actions in peripheral vision, that is, in non-natural conditions. This condition is well known to increase the need for on-line control, because it allows only a crude encoding of target position prior to movement onset. The main problem encountered by these patients in central vision may be with the automatic sensorimotor control performed by the PAP.
In this case, one has to consider that the double dissociation lies between cognitive identification and automatic motor control rather than between cognitive processing and action in general. Along this line, it is interesting to consider the effect of lesions lying in the brain area just between the focus of lesions producing optic ataxia and that of lesions producing visual agnosia, that is, the temporo-parietal junction. The best-documented deficit following such a lesion in humans is hemispatial neglect. On the one hand, neglect has been considered to be dissociated from optic ataxia, in that the visuomotor deficits reported for the latter are not observed in the former (Milner 1997). On the other hand, visual neglect has been shown to involve some deficit on the action side (e.g. Mattingley et al. 1998), which implies that neglect can also be distinguished from visual agnosia. To conclude, we propose that the action–perception gradient can be mapped onto a dorsal–ventral gradient, whose dorsal end would be represented by the PAP.
4.8.4 Implications for visual awareness

One of the key distinctions described between sensorimotor and cognitive processing is the poor level of visual awareness associated with sensorimotor processing. Whereas movement guidance can be based on a partial analysis of the target attributes, identification and consciousness imply that a binding of all object properties is achieved (see Revonsuo and Rossetti 2000). However, it would be oversimplistic to equate action with implicit processing on one side, and identification with consciousness on the other. Not all implicit processing takes place in the posterior parietal cortex, and not all actions rely only on this structure. Unconscious control of more complex sequential actions may involve other structures, such as the basal ganglia (e.g. Rolls 1999). It has also long been known that unconscious processes, whether at a low level (Helmholtz) or at a high semantic level (Freud), can affect not only actions but many other aspects of cognition as well, and that semantic processes are not necessarily conscious. Still, particular aspects of sensory processing for the purpose of action, and especially for movement guidance, remain fully independent of conscious perception. Therefore action can be directed at unconsciously perceived stimuli, even though conscious awareness allows for a perceptual control of action. Specifically, conscious visual perception and intention select the goal for the action, but the realization of the action may escape their control. In the same way as unconscious processes may participate in goal specification, as is clearly shown by the Simon effect (see Hommel 2000), such processes play an even stronger role in action control. Although dissociation has proved a useful tool for understanding the basic functions of the brain, which have been atomized by most scientific approaches, the understanding of complex functions requires a more global account of how our mind works: an account that implies a synthetic rather than an analytic approach (see Rossetti and Revonsuo 2000b). We propose that the temporal dimension is one of the keys to understanding complex interconnected networks such as the visual brain. Because of the links between the perception–action debate and issues of conscious awareness and intentionality, we suggest that temporal factors may be relevant to these issues as well. As proposed by Milner and Dijkerman (2001), it may be that the role of consciousness is primarily to delay action in order to gain behavioural efficiency. Further, if an animal becomes able to slow down, delay, or inhibit immediate actions, it may also become able to reach for a better (hidden or internal) goal. Restricting the use of automatic processes such as the PAP to the regulation of action can improve the execution of intentional actions without interfering with decision processes.
Acknowledgements The authors wish to thank Yann Coello, Chris Dijkerman, Glyn Humphreys, Robert MacIntosh, David Milner, Denis Pélisson, Gilles Rode, Caroline Tilikete, Alain Vighetto, and an anonymous referee for comments and stimulating discussions on the issues presented here. This work was supported by INSERM, CNRS, Programme Cognitique (COG118), the Center for Consciousness Research (University of Arizona), and by the McDonnell Pew Foundation.
Note

1. . . . Contrasting with naïve conceptions of perception as a pure bottom-up process, von Helmholtz proposed the idea that perception results from unconscious inductive inferences. Although physiological studies of the visual system have long focused on how visual images are constructed through hierarchically organized stages of processing, the same idea of a dialogue between bottom-up and top-down processes is now being applied to the understanding of vision. This two-way description of vision, and of perception in general, is also widely acknowledged by psychologists and philosophers, so much so that the idea that 'there is no such thing as immaculate perception' has been defended (Kosslyn and Sussman 1995). The most cited experimental evidence for the implication of descending influences in perception is the case of ambiguous figures, for which perception can alternate between two possible interpretations of the visual input, even though the memorized image can be subjected to another interpretation. Visual illusions are also often considered a clear example of the interpretation (and contamination) of the retinal information involved in perception. (Rossetti 1999, p. 141.)
References

Aglioti, S., DeSouza, J.F.X., and Goodale, M.A. (1995). Size-contrast illusions deceive the eye but not the hand. Current Biology, 5(6), 679–685.
Bar, M. (2000). Conscious and nonconscious processing of visual identity. In Y. Rossetti and A. Revonsuo (Eds.), Beyond dissociation: Interaction between dissociated implicit and explicit processing, pp. 153–174. Amsterdam: Benjamins.
Bhalla, M. and Proffitt, D.R. (2000). Geographical slant perception: Dissociation and coordination between explicit awareness and visually guided actions. In Y. Rossetti and A. Revonsuo (Eds.), Beyond dissociation: Interaction between dissociated implicit and explicit processing, pp. 99–128. Amsterdam: Benjamins.
Binkofski, F., Dohle, C., Posse, S., Stephan, K.M., Hefter, H., Seitz, R.J., and Freund, H.-J. (1998). Human anterior intraparietal area subserves prehension: A combined lesion and functional MRI activation study. Neurology, 50, 1253–1259.
Brenner, E. and Smeets, J.B.J. (1996). Size illusion influences how we lift but not how we grasp an object. Experimental Brain Research, 111, 473–476.
Bridgeman, B. (1991). Complementary cognitive and visuomotor image processing. In G. Obrecht and L.W. Stark (Eds.), Presbyopia research: From molecular biology to visual adaptation, pp. 189–198. New York: Plenum Press.
Bridgeman, B. (1992). Conscious vs unconscious processes: The case of vision. Theory & Psychology, 2(1), 73–88.
Bridgeman, B. (2000). Interactions between vision for perception and vision for behavior. In Y. Rossetti and A. Revonsuo (Eds.), Beyond dissociation: Interaction between dissociated implicit and explicit processing, pp. 17–39. Amsterdam: Benjamins.
Bridgeman, B., Hendry, D., and Stark, L. (1975). Failure to detect displacement of the visual world during saccadic eye movements. Vision Research, 15, 719–722.
Bridgeman, B., Lewis, S., Heit, G., and Nagle, M. (1979). Relation between cognitive and motor-oriented systems of visual perception. Journal of Experimental Psychology: Human Perception and Performance, 5, 692–700.
Bridgeman, B., Kirch, M., and Sperling, A. (1981). Segregation of cognitive and motor aspects of visual function using induced motion. Perception and Psychophysics, 29, 336–342.
Bridgeman, B., Peery, S., and Anand, S. (1997). Interaction of cognitive and sensorimotor maps of visual space. Perception and Psychophysics, 59(3), 456–469.
Bullier, J., Girard, P., and Salin, P.-A. (1994). The role of area 17 in the transfer of information to extrastriate visual cortex. In A. Peters and K.S. Rockland (Eds.), Cerebral cortex, Vol. 10, pp. 301–330. New York: Plenum Press.
Bullier, J., Schall, J.D., and Morel, A. (1996). Functional streams in occipito-frontal connections in the monkey. Behavioural Brain Research, 76, 89–97.
Cajal, S.R. (1909). Histologie du système nerveux de l'homme et des vertébrés. Paris: Maloine.
Campbell, F.W. and Wurtz, R.H. (1978). Saccadic omission: Why we do not see a grey-out during a saccadic eye movement. Vision Research, 18, 1297–1303.
Carey, D.P., Dijkerman, H.C., and Milner, A.D. (1998). Perception and action in depth. Consciousness and Cognition, 7(3), 438–453.
Castiello, U. and Jeannerod, M. (1991). Measuring time to awareness. Neuroreport, 2, 797–800.
Castiello, U., Paulignan, Y., and Jeannerod, M. (1991). Temporal dissociation of motor responses and subjective awareness: A study in normal subjects. Brain, 114, 2639–2655.
Coello, Y. and Magne, P. (2000). Determination of target position in a structured environment: Selection of information for action. European Journal of Cognitive Psychology, 12, 489–519.
Coello, Y., Magne, P., and Plenacoste, P. (2000). The contribution of retinal signal to the specification of target distance in a visuo-manual task. Current Psychology Letters, 3, 75–89.
Coello, Y., Richaud, S., Magne, P., and Rossetti, Y. (2001). Vision for location discrimination and vision for action: Anisotropy in the induced Roelofs effect. Unpublished manuscript.
Colby, C.L., Gattas, R., Olson, C.R., and Gross, C.G. (1988). Topographic organisation of cortical afferents to extrastriate visual area PO in the macaque: A dual tracer study. Journal of Comparative Neurology, 269, 392–413.
Colent, C., Pisella, L., Bernieri, C., Rode, G., and Rossetti, Y. (2000). Cognitive bias induced by visuomotor adaptation to prisms: A simulation of unilateral neglect in normals? Neuroreport, 11(9), 1899–1902.
Day, B.L. and Lyon, I.N. (2000). Voluntary modification of automatic arm movements evoked by motion of a visual target. Experimental Brain Research, 130(2), 159–168.
Dehaene, S., Naccache, L., Le Clec'H, G., Koechlin, E., Mueller, M., Dehaene-Lambertz, G., van de Moortele, P.-F., and Le Bihan, D. (1998). Imaging unconscious priming. Nature, 395, 597–600.
Desmurget, M. and Prablanc, C. (1997). Postural control of three-dimensional prehension movements. Journal of Neurophysiology, 77, 452–464.
Desmurget, M., Prablanc, C., Rossetti, Y., Arzi, M., Paulignan, Y., Urquizar, C., and Mignot, J.C. (1995). Postural and synergic control for three-dimensional movements of grasping. Journal of Neurophysiology, 74, 905–910.
Desmurget, M., Prablanc, C., Arzi, M., Rossetti, Y., Paulignan, Y., and Urquizar, C. (1996).
Integrated control of hand transport and orientation during prehension movements. Experimental Brain Research, 110, 265–278. Desmurget, M., Epstein, C.M., Turner, R.S., Prablanc, C., Alexander, G.E., and Grafton, S.T. (1999). Role of the posterior parietal cortex in updating reaching movements to a visual target. Nature Neuroscience, 2(6), 563–567. Dijkerman, H.C. and Milner, A.D. (1998). The perception and prehension of objects oriented in the depth plane. II. Dissociated orientation functions in normal subjects. Experimental Brain Research, 118, 408–414. Dijkerman, H.C., Milner, A.D., and Carey, D.P. (1996). The perception and prehension of objects oriented in the depth plane. I. Effect of visual form agnosia. Experimental Brain Research, 112, 442–451. Dijkerman, H.C., Milner, A.D., and Carey, D.P. (1997). Impaired delayed and anti-saccades in a visual form agnosic. Experimental Brain Research, Sup. 117, 566 (abstract). Driver, J. and Mattingley, J.B. (1998). Parietal neglect and visual awareness. Nature Neuroscience 1(1), 17–22. Eimer, M. and Schlaghecken, F. (1998). Effects of masked stimuli on motor activation: Behavioral and electrophysiological evidence. Journal of Experimental Psychology: Human Perception and Performance, 24(6), 1737–1747. Ellis, R.R., Flanagan, J.R., and Lederman, S.J. (1999). The in2uence of visual illusions on grasp position. Experimental Brain Research, 125(2), 109–114.
Faugier-Grimaud, S., Frenois, C., and Stein, D.G. (1978). Effects of posterior parietal lesions on visually guided movements in monkeys. Neuropsychologia, 16, 151–168.
Faugier-Grimaud, S., Frenois, C., and Peronnet, F. (1985). Effects of posterior parietal lesions on visually guided movements in monkeys. Experimental Brain Research, 59, 125–128.
Fitts, P.M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47, 381–391.
Flanagan, J.R. and Beltzner, M.A. (2000). Independence of perceptual and sensorimotor predictions in the size–weight illusion. Nature Neuroscience, 3(7), 737–741.
Franz, V.H., Gegenfurtner, K.R., Bülthoff, H.H., and Fahle, M. (2000). Grasping visual illusions: No evidence for a dissociation between perception and action. Psychological Science, 11, 20–25.
Gentilucci, M., Chieffi, S., Daprati, E., Saetti, M.C., and Toni, I. (1996). Visual illusion and action. Neuropsychologia, 34(6), 369–376.
Girard, P. (1995). Anatomic and physiologic basis of residual vision after damage to the primary visual area. Revue Neurologique (Paris), 151(8–9), 457–465.
Goodale, M.A. (1983). Neural mechanisms of visual orientation in rodents: Targets vs. places. In A. Hein and M. Jeannerod (Eds.), Spatially oriented behavior, pp. 35–62. New York: Springer-Verlag.
Goodale, M.A. and Haffenden, A. (1998). Frames of reference for perception and action in the human visual system. Neuroscience and Biobehavioral Reviews, 22, 161–172.
Goodale, M.A. and Milner, A.D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15(1), 20–25.
Goodale, M.A. and Servos, P. (1992). Now you see it, now you don't: How delaying an action can transform a theory. Behavioral and Brain Sciences, 15(2), 335–336.
Goodale, M.A., Pélisson, D., and Prablanc, C. (1986). Large adjustments in visually guided reaching do not depend on vision of the hand or perception of target displacement. Nature, 320(6064), 748–750.
Goodale, M.A., Milner, A.D., Jakobson, L.S., and Carey, D.P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349, 154–156.
Goodale, M.A., Jakobson, L.S., and Keillor, J.M. (1994). Differences in the visual control of pantomimed and natural grasping movements. Neuropsychologia, 32, 1159–1178.
Gréa, H., Desmurget, M., and Prablanc, C. (2000). Postural invariance in three-dimensional reaching and grasping movements. Experimental Brain Research, 134(2), 156–162.
Gréa, H., Pisella, L., Rossetti, Y., Prablanc, C., Desmurget, M., Tilikete, C., Grafton, S., and Vighetto, A. (2002). A lesion of the posterior parietal cortex disrupts on-line adjustments during aiming movements. Neuropsychologia, in press.
Haffenden, A. and Goodale, M.A. (1998). The effect of pictorial illusion on prehension and perception. Journal of Cognitive Neuroscience, 10, 122–136.
Haffenden, A. and Goodale, M.A. (2000a). Independent effects of pictorial displays on perception and action. Vision Research, 40(10–12), 1597–1607.
Haffenden, A. and Goodale, M.A. (2000b). Perceptual associations and visuomotor programming. Journal of Cognitive Neuroscience, 12(6), 950–964.
Hartje, W. and Ettlinger, G. (1973). Reaching in light and dark after unilateral posterior parietal ablations in the monkey. Cortex, 9, 346–354.
Heywood, C.A., Shields, C., and Cowey, A. (1988). The involvement of the temporal lobes in colour discrimination. Experimental Brain Research, 71(2), 437–441.
Hommel, B. (2000). Intentional control of automatic stimulus-response translation. In Y. Rossetti and A. Revonsuo (Eds.), Beyond dissociation: Interaction between dissociated implicit and explicit processing, pp. 221–242. Amsterdam: Benjamins.
Humphrey, N.K. and Weiskrantz, L. (1967). Vision in monkeys after removal of striate cortex. Nature, 215, 595–597.
Humphreys et al. (2000). Attention and Performance XVIII, in press.
Hyvärinen, J. and Poranen, A. (1974). Function of the parietal associative area 7 as revealed from cellular discharges in the alert monkey. Brain, 97, 673–692.
Jackson, S.R. (2000). Perception, awareness and action: Insights from blindsight. In Y. Rossetti and A. Revonsuo (Eds.), Beyond dissociation: Interaction between dissociated implicit and explicit processing, pp. 73–98. Amsterdam: Benjamins.
Jackson, S.R. and Shaw, A. (2000). The Ponzo illusion affects grip-force but not grip-aperture scaling during prehension movements. Journal of Experimental Psychology: Human Perception and Performance, 26(1), 418–423.
Jakobson, L.S., Archibald, Y.M., Carey, D.P., and Goodale, M.A. (1991). A kinematic analysis of reaching and grasping movements in a patient recovering from optic ataxia. Neuropsychologia, 29, 803–809.
Jeannerod, M. (1986). The formation of finger grip during prehension. A cortically mediated visuomotor pattern. Behavioural Brain Research, 19, 99–116.
Jeannerod, M. (1988). The neural and behavioural organization of goal-directed movements. Oxford: Oxford University Press.
Jeannerod, M. (1994). The representing brain: Neural correlates of motor intention and imagery. Behavioral and Brain Sciences, 17(2), 187–245.
Jeannerod, M. and Rossetti, Y. (1993). Visuomotor coordination as a dissociable function: Experimental and clinical evidence. In C. Kennard (Ed.), Visual perceptual defects, Baillière's clinical neurology, international practice and research, pp. 439–460. Baillière Tindall/Saunders.
Jeannerod, M., Decety, J., and Michel, F. (1994). Impairment of grasping movements following a bilateral posterior parietal lesion. Neuropsychologia, 32(4), 369–380.
Jeannerod, M., Arbib, M.A., Rizzolatti, G., and Sakata, H. (1995). Grasping objects: The cortical mechanisms of visuomotor transformation. Trends in Neurosciences, 18(7), 314–320.
Jeeves, M.A. and Silver, P.H. (1988). The formation of finger grip during prehension in an acallosal patient. Neuropsychologia, 26, 153–159.
Komilis, E., Pélisson, D., and Prablanc, C. (1993). Error processing in pointing at randomly feedback-induced double-step stimuli. Journal of Motor Behavior, 25(4), 299–308.
Kosslyn, S.M. and Sussman, A.L. (1995). Roles of imagery in perception: Or, there is no such thing as immaculate perception? In M.S. Gazzaniga (Ed.), The cognitive neurosciences, pp. 1035–1042. Cambridge, MA: MIT Press.
Kovacs, I. (2000). Human development of perceptual organization. Vision Research, 40(10–12), 1301–1310.
Krappmann, P. (1998). Accuracy of visually directed and memory-guided antisaccades in man. Vision Research, 38, 2979–2985.
Marteniuk, R.G., MacKenzie, C.L., Jeannerod, M., Athenes, S., and Dugas, C. (1987). Constraints on human arm movement trajectories. Canadian Journal of Psychology, 41(3), 365–378.
Matin, E., Clymer, A.B., and Matin, L. (1972). Metacontrast and saccadic suppression. Science, 178, 179–182.
Mattingley, J.B., Husain, M., Rorden, C., Kennard, C., and Driver, J. (1998). Motor role of human inferior parietal lobe revealed in unilateral neglect patients. Nature, 392, 179–182.
Milner, A.D. (1998). Streams and consciousness: Visual awareness and the brain. Trends in Cognitive Sciences, 2(1), 25–30.
Milner, A.D. and Dijkerman, H.C. (2001). Direct and indirect visual routes to action. In B. de Gelder, E.H.F. de Haan, and C.A. Heywood (Eds.), Varieties of unconscious processing: New findings and new comparisons. Oxford: Oxford University Press, in press.
Milner, A.D. and Goodale, M.A. (1995). The visual brain in action (Oxford Psychology Series 27). Oxford: Oxford University Press.
Milner, A.D., Harvey, M., and Pritchard, C.L. (1998). Visual size processing in spatial neglect. Experimental Brain Research, 123, 192–200.
Milner, A.D., Paulignan, Y., Dijkerman, H.C., Michel, F., and Jeannerod, M. (1999a). A paradoxical improvement of misreaching in optic ataxia: New evidence for two separate neural systems for visual localization. Proceedings of the Royal Society of London B, 266, 2225–2229.
Milner, A.D., Dijkerman, H.C., and Carey, D.P. (1999b). Visuospatial processing in a pure case of visual-form agnosia. In N. Burgess, K. Jeffery, and J. O'Keefe (Eds.), The hippocampal and parietal foundations of spatial cognition, pp. 443–466. Oxford: Oxford University Press.
Mohler, C.W. and Wurtz, R.H. (1977). Role of striate cortex and superior colliculus in visual guidance of saccadic eye movements in monkeys. Journal of Neurophysiology, 40, 74–94.
Mon-Williams, M. and Bull, R. (2000). The Judd illusion: Evidence for two visual streams or two experimental conditions? Experimental Brain Research, 130(2), 273–276.
Morel, A. and Bullier, J. (1990). Anatomical segregation of two cortical visual pathways in the macaque monkey. Visual Neuroscience, 4, 555–578.
Mountcastle, V.B., Lynch, J.C., Georgopoulos, A., Sakata, H., and Acuna, C. (1975). Posterior parietal association cortex of the monkey: Command functions for operations within the extrapersonal space. Journal of Neurophysiology, 38, 871–908.
Neumann, O. and Klotz, W. (1994). Motor responses to nonreportable, masked stimuli: Where is the limit of direct parameter specification? In C. Umiltà and M. Moscovitch (Eds.), Attention and performance XV: Conscious and nonconscious information processing, pp. 123–150. Cambridge, MA: MIT Press.
Nowak, L. and Bullier, J. (1997). The timing of information transfer in the visual system. In J. Kaas, K. Rockland, and A. Peters (Eds.), Extrastriate cortex in primates, pp. 205–241. New York: Plenum Press.
Otto-de Haart, E.G., Carey, D.P., and Milne, A.B. (1999). More thoughts on perceiving and grasping the Müller–Lyer illusion. Neuropsychologia, 37(13), 1437–1444.
Paillard, J. (1987). Cognitive versus sensorimotor encoding of spatial information. In P. Ellen and C. Thinus-Blanc (Eds.), Cognitive processes and spatial orientation in animal and man, pp. 43–77. Dordrecht: Nijhoff.
Paillard, J. (1991). Motor and representational framing of space. In J. Paillard (Ed.), Brain and space, pp. 163–181. Oxford: Oxford University Press.
Paillard, J., Michel, F., and Stelmach, G. (1983). Localization without content. A tactile analogue of 'blindsight'. Archives of Neurology, 40, 548–551.
Paulignan, Y., MacKenzie, C.L., Marteniuk, R.G., and Jeannerod, M. (1991). Selective perturbation of visual input during prehension movements. 1. The effect of changing object position. Experimental Brain Research, 83, 502–512.
Pavani, F., Boscagli, I., Benvenuti, F., Rabuffetti, M., and Farnè, A. (1999). Are perception and action affected differently by the Titchener circles illusion? Experimental Brain Research, 127(1), 95–101.
Pélisson, D., Prablanc, C., Goodale, M.A., and Jeannerod, M. (1986). Visual control of reaching movements without vision of the limb. II. Evidence of fast unconscious processes correcting the trajectory of the hand to the final position of a double-step stimulus. Experimental Brain Research, 62, 303–311.
Perenin, M.-T. (1997). Optic ataxia and unilateral neglect: Clinical evidence for dissociable spatial functions in posterior parietal cortex. In P. Thier and H.O. Karnath (Eds.), Parietal lobe contribution to orientation in 3D space, pp. 289–308. Berlin: Springer-Verlag.
Perenin, M.-T. and Jeannerod, M. (1978). Visual function within the hemianopic field following early cerebral hemidecortication in man. I. Spatial localisation. Neuropsychologia, 16, 1–13.
Perenin, M.-T. and Rossetti, Y. (1996). Grasping in an hemianopic field. Another instance of dissociation between perception and action. Neuroreport, 7(3), 793–797.
Perenin, M.-T. and Vighetto, A. (1988). Optic ataxia: A specific disruption in visuomotor mechanisms. Brain, 111(3), 643–674.
Pisella, L. and Rossetti, Y. (2000). Interaction between conscious identification and non-conscious sensorimotor processing: Temporal constraints. In Y. Rossetti and A. Revonsuo (Eds.), Beyond dissociation: Interaction between dissociated implicit and explicit processing, pp. 129–151. Amsterdam: Benjamins.
Pisella, L., Attali, M., Frange, H., Régnier, C., Gaunet, F., and Rossetti, Y. (1996). Représentations en action: une même représentation spatiale pour la mémorisation et la verbalisation? In Actes du 6e colloque de l'Association pour la Recherche Cognitive, pp. 37–43. Villeneuve d'Ascq.
Pisella, L., Arzi, M., and Rossetti, Y. (1998a). The timing of color and location processing in the motor context. Experimental Brain Research, 121, 270–276.
Pisella, L., Rossetti, Y., and Arzi, M. (1998b). Dorsal vs. ventral parameters of fast pointing: Effects of stimulus attribute and of response type. European Journal of Neuroscience, 10(sup. 10), 192 (abstract).
Pisella, L., Tilikete, C., Rode, G., Boisson, D., Vighetto, A., and Rossetti, Y. (1999). Automatic corrections prevail in spite of an instructed stopping response. In M. Grealy and J.A. Thomson (Eds.), Studies in perception and action, pp. 275–279. Mahwah, NJ: Erlbaum.
Pisella, L., Gréa, H., Tilikete, C., Vighetto, A., Desmurget, M., Rode, G., Boisson, D., and Rossetti, Y. (2000). An automatic pilot for the hand in the human posterior parietal cortex: Toward a reinterpretation of optic ataxia. Nature Neuroscience, 3(7), 729–736.
Place, U.T. (2000). Consciousness and the zombie within: A functional analysis of the blindsight evidence. In Y. Rossetti and A. Revonsuo (Eds.), Beyond dissociation: Interaction between dissociated implicit and explicit processing, pp. 295–330. Amsterdam: Benjamins.
Polyak, S. (1957). The vertebrate visual system. Chicago: University of Chicago Press.
Pöppel, E., Held, R., and Frost, D. (1973). Residual visual function after brain wounds involving the central visual pathways in man. Nature, 243, 295–296.
Prablanc, C. and Martin, O. (1992). Automatic control during hand reaching at undetected two-dimensional target displacements. Journal of Neurophysiology, 67, 455–469.
Price, M.C. (2001). Now you see it, now you don't: Preventing consciousness with visual masking. In P.G. Grossenbacher (Ed.), Finding consciousness in the brain: A neurocognitive approach (Advances in Consciousness Research, Vol. 8), pp. 25–60. Amsterdam: John Benjamins.
Price, M.C. et al. (2001). Manuscript submitted for publication.
Ptito, A., Lepore, F., Ptito, M., and Lassonde, M. (1991). Target detection and movement discrimination in the blind field of hemispherectomized patients. Brain, 114, 497–512.
Revonsuo, A. and Rossetti, Y. (2000). Dissociation and interaction: Windows to the hidden mechanisms of consciousness. In Y. Rossetti and A. Revonsuo (Eds.), Beyond dissociation: Interaction between dissociated implicit and explicit processing, pp. 351–366. Amsterdam: Benjamins.
Riddoch, M.J., Humphreys, G.W., and Edwards, M.G. (2000). Visual affordance and object selection. In S. Monsell and J. Driver (Eds.), Control of cognitive processes (Attention and Performance XVIII). Cambridge, MA: MIT Press.
Rode, G., Rossetti, Y., Li, L., and Boisson, D. (1999). The effect of prism adaptation on neglect for visual imagery. Behavioural Neurology, 11, 251–258.
Rolls, E.T. (1999). A theory of consciousness and its application to understanding emotion and pleasure. In The brain and emotion, pp. 244–265. Oxford: Oxford University Press.
Rossetti, Y. (1998). Implicit short-lived motor representation of space in brain-damaged and healthy subjects. Consciousness and Cognition, 7, 520–558.
Rossetti, Y. (1999). In search of immaculate perception: Evidence from motor perception of space. In S. Hameroff, A. Kaszniak, and D. Chalmers (Eds.), Towards a science of consciousness, pp. 141–148. Cambridge, MA: MIT Press.
Rossetti, Y. (2000). Implicit perception in action: Short-lived motor representations of space. In P.G. Grossenbacher (Ed.), Finding consciousness in the brain: A neurocognitive approach, pp. 131–179. Amsterdam: Benjamins.
Rossetti, Y. and Procyk, E. (1997). What memory is for action: The gap between percepts and concepts. Behavioral and Brain Sciences, 20(1), 34–36.
Rossetti, Y. and Régnier, C. (1995). Representations in action: Pointing to a target with various representations. In B.G. Bardy, R.J. Bootsma, and Y. Guiard (Eds.), Studies in perception and action III, pp. 233–236. Mahwah, NJ: Erlbaum.
Rossetti, Y. and Revonsuo, A. (Eds.) (2000a). Beyond dissociation: Interaction between dissociated implicit and explicit processing. Amsterdam: Benjamins.
Rossetti, Y. and Revonsuo, A. (2000b). Beyond dissociations: Recomposing the mind-brain after all? In Y. Rossetti and A. Revonsuo (Eds.), Beyond dissociation: Interaction between dissociated implicit and explicit processing, pp. 1–16. Amsterdam: Benjamins.
Rossetti, Y., Lacquaniti, F., Carrozzo, M., and Borghese, A. (1994). Errors of pointing toward memorized visual targets indicate a change in reference frame with memory delay. Unpublished manuscript.
Rossetti, Y., Rode, G., and Boisson, D. (1995). Implicit processing of somaesthetic information: A dissociation between where and how? Neuroreport, 6, 506–510.
Rossetti, Y., Gaunet, F., and Thinus-Blanc, C. (1996). Early visual experience affects memorization and spatial representation of proprioceptive targets. Neuroreport, 7(6), 1219–1223.
Rossetti, Y., Pisella, L., and Arzi, M. (1997). Stimulus location is processed faster than stimulus colour. Perception, 26(suppl.), 110 (abstract).
Rossetti, Y., Pisella, L., Rode, G., Perenin, M.-T., Régnier, C., Arzi, M., and Boisson, D. (1998a). The fast-brain in action versus the slow-brain in identification. European Brain and Behaviour Society Meeting, Cardiff, UK.
Rossetti, Y., Rode, G., Pisella, L., Farnè, A., Li, L., Boisson, D., and Perenin, M.-T. (1998b). Prism adaptation to a rightward optical deviation rehabilitates left hemispatial neglect. Nature, 395, 166–169.
Rossetti, Y., Rode, G., Pisella, L., and Boisson, D. (1999a). Plasticité sensori-motrice et récupération fonctionnelle: Les effets thérapeutiques de l'adaptation prismatique sur la négligence spatiale unilatérale. Médecine/Sciences, 15, 239–245.
Rossetti, Y., Rode, G., Pisella, L., Farnè, A., Ling, L., and Boisson, D. (1999b). Sensorimotor plasticity and cognition: Prism adaptation can affect various levels of space representation. In M. Grealy and J.A. Thomson (Eds.), Studies in perception and action, pp. 265–269. Mahwah, NJ: Erlbaum.
Rossetti, Y., Pisella, L., and Pélisson, D. (2000a). Eye blindness and hand sight: Temporal aspects of visuomotor processing. Visual Cognition, 7, 785–808.
Rossetti, Y., Rode, G., and Boisson, D. (2000b). Numbsense and blindsight. In B. de Gelder, E. de Haan, and C. Heywood (Eds.), Varieties of unconscious processing. Oxford: Oxford University Press, in press.
Rossetti, Y., Rode, G., and Boisson, D. (2001). Numbsense: A case study and implications. In B. de Gelder, E. de Haan, and C. Heywood (Eds.), Varieties of unconscious processing. Oxford: Oxford University Press, in press.
Rushworth, M.F.S., Nixon, P.D., and Passingham, R.E. (1997). Parietal cortex and movement. I. Movement selection and reaching. Experimental Brain Research, 117, 292–310.
Sakata, H. and Taira, M. (1994). Parietal control of hand action. Current Opinion in Neurobiology, 4, 847–856.
Sakata, H., Taira, M., Kusunoki, M., Murata, A., and Tanaka, Y. (1997). The parietal association cortex in depth perception and visual control of hand action. Trends in Neurosciences, 20(8), 350–356.
Saoud, M., Coello, Y., Dumas, P., Franck, N., d'Amato, T., Dalery, J., and Rossetti, Y. (2000). Visual pointing and speed/accuracy trade-off in schizophrenia. Cognitive Neuropsychiatry, 5(2), 123–134.
Schall, J.D., Morel, A., King, D.J., and Bullier, J. (1995). Topography of visual cortex connections with frontal eye field in macaque: Convergence and segregation of processing streams. Journal of Neuroscience, 15, 4464–4487.
Schmolesky, M.T., Wang, Y., Hanes, D.P., Thompson, K.G., Leutgeb, S., Schall, J.D., and Leventhal, A.G. (1998). Signal timing across the macaque visual system. Journal of Neurophysiology, 79, 3272–3278.
Schneider, G.E. (1969). Two visual systems. Science, 163, 895–902.
Schwartz, A.B. (1994). Distributed motor processing in cerebral cortex. Current Opinion in Neurobiology, 4, 840–846.
Smeets, J.B.J. and Brenner, E. (1995). Perception and action are based on the same visual information. Journal of Experimental Psychology: Human Perception and Performance, 21, 19–31.
Sprague, J.M. and Meikle, T.H. (1965). The role of the superior colliculus in visually guided behavior. Experimental Neurology, 11, 115–146.
Stoerig, P. and Cowey, A. (1992). Wavelength discrimination in blindsight. Brain, 115, 425–444.
Taira, M., Mine, S., Georgopoulos, A.P., Murata, A., and Sakata, H. (1990). Parietal cortex neurons of the monkey related to the visual guidance of hand movements. Experimental Brain Research, 83, 29–36.
Tanné, J., Boussaoud, D., Boyer-Zeller, N., and Rouiller, E. (1995). Direct visual pathways for reaching movements in the macaque monkey. Neuroreport, 7, 267–272.
Taylor, J.L. and McCloskey, D.I. (1990). Triggering of preprogrammed movements as reactions to masked stimuli. Journal of Neurophysiology, 63, 439–446.
Taylor, J.L. and McCloskey, D.I. (1996). Selection of motor responses on the basis of unperceived stimuli. Experimental Brain Research, 110(1), 62–66.
Trevarthen, C.B. (1968). Two mechanisms of vision in primates. Psychologische Forschung, 31, 299–337.
Ungerleider, L.G. (1995). Functional brain imaging studies of cortical mechanisms for memory. Science, 270(5237), 769–775.
Ungerleider, L. and Desimone, R. (1986). Cortical projections of visual area MT in the macaque. Journal of Comparative Neurology, 248, 190–222.
Ungerleider, L. and Mishkin, M. (1982). Two cortical visual systems. In D.J. Ingle, M.A. Goodale, and R.J.W. Mansfield (Eds.), Analysis of visual behavior, pp. 549–586. Cambridge, MA: MIT Press.
Van Hoesen, G.W. (1982). The parahippocampal gyrus: New observations regarding its cortical connections in the monkey. Trends in Neurosciences, 345–350.
Vindras, P., Desmurget, M., Prablanc, C., and Viviani, P. (1998). Pointing errors reflect biases in the perception of the initial hand position. Journal of Neurophysiology, 79(6), 3290–3294.
Vishton, P.M., Rea, J.G., Cutting, J.E., and Nuñez, L.N. (1999). Comparing effects of the horizontal–vertical illusion on grip scaling and judgment: Relative versus absolute, not perception versus action. Journal of Experimental Psychology: Human Perception and Performance, 25(6), 1659–1672.
Weiskrantz, L. (1986). Blindsight: A case study and implications. Oxford: Oxford University Press.
Weiskrantz, L., Warrington, E.K., Sanders, M.D., and Marshall, J. (1974). Visual capacity in the hemianopic field following a restricted occipital ablation. Brain, 97, 709–728.
Westwood, D.A., Chapman, C.D., and Roy, E.A. (2000). Pantomimed actions may be controlled by the ventral visual stream. Experimental Brain Research, 130(4), 545–548.
White, J.M., Sparks, D.L., and Stanford, T.R. (1994). Saccades to remembered target locations: An analysis of systematic and variable errors. Vision Research, 34(1), 79–92.
Wong, E. and Mack, A. (1981). Saccadic programming and perceived location. Acta Psychologica, 48, 123–131.
5 Attention and visually guided behavior in distinct systems
Bruce Bridgeman
Abstract. Recent research from several laboratories has revealed two distinct representations of visual space: a cognitive or 'what' system specialized for perception, and a sensorimotor or 'how' system specialized for visually guided behavior. To know how these two pathways normally operate and cooperate, they must be studied in normal humans. This has become possible with the development of psychophysical methods that isolate the two pathways and measure spatial information separately in each. The pathways are distinguished by the response measure, a symbolic response probing the cognitive system and an isomorphic motor reaction probing the sensorimotor system. The two systems represent visual space in different ways, the cognitive system relying on context even when that strategy leads to errors of localization, while the sensorimotor system possesses a quantitative calibration of visual space that is insensitive to context. Only the contents of the cognitive system are accessible to awareness, operationally defined as the ability to describe visual information. When conflicts arise between cognitive and sensorimotor information, it is the cognitive information that is available for making judgments and decisions. In this context, only the cognitive system can direct attention to particular objects or regions in the visual field, and only that system can initiate behaviors based on current goals. The sensorimotor system has calibrated egocentrically based spatial information, but cannot initiate actions in the service of behavioral goals. Attention serves as a pathway for the cognitive system to motivate actions, which are then carried out under the guidance of sensorimotor information.
5.1 Introduction—cognitive and sensorimotor visual systems
It seems obvious that to interact effectively with an object, we must perceive its location and properties accurately. We have the impression that vision is a unified sense, with all of its richness and variety tied to a single, coherent whole. Perceived positions of objects and surfaces, color, motion, and control of action are smoothly integrated. This intuition is deceptive, however: several lines of evidence have now demonstrated that humans can achieve accurate motor behavior despite experiencing inadequate or erroneous perceptual information from the same environment at the same time. Under some conditions, perception is not required to visually guide an action. The following reviews the accumulation of several decades of empirical work to test this idea. The review is designed to complement the review by Rossetti and Pisella (this volume, Chapter 4), and therefore some important lines of research are given less attention here. The earliest experiments on separation of cognitive and sensorimotor systems were done in animals and in human neurological patients. Hamsters with a lesioned superior colliculus could perform a simple pattern recognition task, but could not run a maze. Other hamsters, with visual cortex lesions, could run the maze but could not do the pattern recognition task (Schneider 1967).
Monkeys with lesions of the striate visual cortex could not perform pattern recognition in their scotomic fields, but could perform many visually guided behaviors (Trevarthen 1968). The result implied that in primates, as in hamsters, pattern recognition and visually guided behavior could be affected separately by selective lesions. Subsequent work extended some of these observations to humans. Although human patients with scotomata from damage to the visual cortex fail to report the presence of objects in their 'blind' areas, they are able to point at or direct their eyes to these objects with little error (Pöppel, Held, and Frost 1973; Sanders, Warrington, Marshall, and Weiskrantz 1974; Weiskrantz 1996). Weiskrantz has termed the phenomenon 'blindsight' because the patients were perceptually blind in the affected field but retained some ability to guide actions by sight, and even to pick up some other kinds of limited visual information without awareness. Ungerleider and Mishkin (1982) modified Schneider's dichotomy in the context of monkey neurophysiology into cortical what versus where systems, assigning the what to the inferotemporal cortex and the where to the posterior parietal cortex. Bridgeman (1991) revised the dichotomy again, noting that meaningful what questions can be asked of both pathways—one merely receives different answers from them under some circumstances. Paillard (1987) described a similar distinction, introducing the terms cognitive and sensorimotor, which are used here and in Rossetti and Pisella (this volume, Chapter 4). According to these views, visually guided behavior such as grasping or reaching is handled by a sensorimotor pathway that takes information from early vision and processes it in a pathway separate from the one that underlies the rich spatial sense of perception. Perception is defined here as sensory information that is actually experienced or, more operationally, information that can be described and remembered by the observer. According to this definition, if a visual stimulus is masked in such a way that an observer denies seeing it, the stimulus is not considered to be perceived, even if it can affect later perceptual judgments or actions. Broader definitions, asserting that any information input to an organism that can affect behavior represents perception, lead to the conclusion that perception also occurs in insects, one-celled protozoa, and even thermostats. At this point the definition would become so broad as to add nothing to the discussion of human neurological organization. Further, the experiments described below show that under some conditions normal humans can simultaneously hold two contradictory spatial values for the same stimulus, one perceived and the other not perceived, without becoming aware of the conflict and without resolving it. According to Milner and Goodale (1995), a ventral channel mediates perception (what), while a dorsal channel subserves visually guided behavior (how). This dual arrangement allows spatially directed behavior to be rapid and efficient, implemented by a dedicated processor operating solely on the here-and-now goal of action. The cognitive pathway, in contrast, specializes in recognizing and remembering the identities of objects and patterns and their spatial interrelationships, based on comparisons with prior knowledge.
The dorsal/ventral or parietal/temporal summary of the neuroanatomy of the two pathways is oversimplified, however, since some cortical regions that should be identified with the 'ventral' system are anatomically dorsal to the striate cortex, though the pathways remain anatomically distinct. The language is still useful to discuss the two systems, with the provisos that the anatomy is more complex than the terms imply and that there are information links between the systems at several levels (Rossetti and Pisella, this volume, Chapter 4). Since the sensorimotor and cognitive pathways normally lead to motor actions and perceptual experiences that are consistent with one another, tests of dissociability require conditions that disturb
this congruence as a result of either experimental intervention in normal subjects or certain types of brain injury in neurological patients. For example, Goodale, Milner, Jakobson, and Carey (1991) described a patient who was unable to identify the orientation of a slot perceptually, but could correctly place objects in it. Milner and Goodale (1995) review many other instances of behavior in the absence of perception, as well as perception in the absence of a behavioral capability (i.e. double dissociation). For example, some patients show visual apraxia, an inability to reach for and grasp objects appropriately despite being able to identify them. This deficit is not the result of general motor damage, since grasping that is not guided by vision remains normal. In our interpretation, information in the cognitive pathway is unavailable to control accurate grasping. Conversely, Rossetti and Pisella (this volume) report patients who can grasp objects appropriately but are unable to describe them. However, neither the possession of two anatomically disparate visual streams nor evidence of perception–action dissociation in brain-damaged patients guarantees that such a dissociation applies to the intact brain, for a system that is unified in normals might become fragmented after brain damage. For example, the injured brain might erect a 'firewall' to preserve at least a portion of its usual function. Thus, rigorous evidence for perception–action dissociation in normal humans can be obtained only by studying normal subjects.
5.1.1 Cognitive and sensorimotor visual systems
Several methods have been used to tease apart cognitive and sensorimotor systems in normal humans. Early experiments on the separation of systems showed that normal subjects were unable to perceive jumps of targets that take place during saccadic eye movements (a cognitive-pathway function). But the subjects could still point accurately to the new locations of the same targets (a sensorimotor-pathway function), even if their pointing movements were controlled open-loop (Bridgeman, Lewis, Heit, and Nagle 1979; Goodale, Pélisson, and Prablanc 1986). In these conditions accurate information about the new location of the target was entering the nervous system, but was not available to perception. Since each pathway could be probed without affecting the representation in the other, one can conclude that the two pathways must be storing spatial information independently. Bridgeman and Stark (1979) refuted the possibility that this result was due to differing response criteria by showing that the dissociation between perception and action occurred even with a criterion-free forced-choice perceptual measure. A more rigorous way to separate cognitive and sensorimotor systems is by double dissociation, introducing a signal only into the sensorimotor pathway in one condition and only into the cognitive pathway in another (Bridgeman, Kirch, and Sperling 1981). A fixed target was projected in front of a subject, with a frame surrounding it. When the frame was displaced left or right, subjects saw an illusion of stroboscopic induced motion—the target appeared to jump in the opposite direction. After target and frame were extinguished, the subjects pointed to the last target position. They always pointed to the same location, regardless of the direction of the stroboscopic induced motion. The illusion did not affect pointing, showing that the illusory displacement signal was present only in the cognitive system. Another condition of the same experiment inserted displacement information selectively into the sensorimotor system by nulling the cognitive signal. Each subject adjusted the real target jumps until the target appeared stationary, with a real displacement in phase with the background jump equaling the induced displacement out of phase with the background. Thus, the cognitive pathway
specified a stable target. Nevertheless, subjects pointed in different directions depending on whether the target disappeared in the left or the right position, showing that the difference in real target positions was still represented in the sensorimotor pathway. This is a double dissociation because the apparent target displacement in the first condition affected only the cognitive measure, while the real displacement in the second condition affected only the sensorimotor measure.
5.1.2 Experiments and ambiguities
Apparent dissociations might appear if a moving stimulus is sampled at different times for different functions, even though a unified visual representation underlies each function. A target evaluated at a longer latency, for example, will be sampled when it has moved further along its path. Recently, methods have been developed, using static illusions, that can test dissociations of cognitive and sensorimotor function without the possible confounding effects of motion. One such method is based on the Ebbinghaus illusion, also called the Titchener circles illusion. A circle appears to be larger if it is surrounded by smaller circles than if it is surrounded by larger circles. The Ebbinghaus illusion has been applied to cognitive/sensorimotor dissociation by making the center circle into a three-dimensional poker-chip-like object and asking subjects either to judge the size of the circle or to grasp it (Aglioti, DeSouza, and Goodale 1995). The grasp was adjusted closer to the real size of the circle than to its illusory size. Subjects were able to see their hands in this experiment, however, so it is possible that subjects adjusted their grasp not to the non-illusory true size of the circle, but to the visible error between the grasp and the edge of the circle. The adjustments did not occur until just before the movement was completed, nearly 2 s after it started. In a subsequent experiment that avoids the feedback confound, Haffenden and Goodale (1998) measured the illusion by asking subjects either to indicate the apparent size of a circle or to pick it up, in both cases without vision of hand or target. Both measures used distance between thumb and forefinger as the dependent variable, so that output mode was controlled, and only the source of the information varied. The illusion appeared for both estimations but was much smaller for the grasp, indicating that the sensorimotor system was relatively insensitive to the illusion. The interpretation of these results has been called into question by Franz, Gegenfurtner, Bülthoff, and Fahle (2000), who failed to replicate the smaller grasp-adjustment effects in the motor condition. Goodale has responded with new experiments showing that it is primarily the physical distance between the test circle and the inducing circles that affects grasp aperture, while perceived size of the test circle is affected primarily by the size contrast between the test and inducing circles. In previous experiments, inducing circle size and distance from the test circle had been confounded. A different method for contrasting grasp and perception, using the Müller–Lyer illusion, showed that while the illusion is significantly smaller when measured with grasp than with perception, there is some illusion under both conditions (Daprati and Gentilucci 1997). Again, relatively slow grasp movements may be responsible, and vision of both hand and stimulus was allowed. The difference dissipated when the observers were forced to delay their behavior, indicating a short-lived sensorimotor representation consistent with other results described below. In summary, there is behavioral evidence in normal subjects for a distinction between processing in two visual pathways, but we still know very little about processing in the sensorimotor pathway. In addition, there is a contrast in the parameters examined, some methods addressing the properties of objects and others their locations. But the saccadic suppression and induced motion methods are
vulnerable to the ambiguities of sampling a moving target. If information used for perception is sampled from a unified visual representation at a different time than information used for action, one could explain some differences between perceptual and motor measures without resorting to a two-visual-systems hypothesis. The illusion methods use static stimuli, but show a quantitative rather than a qualitative distinction between cognitive and sensorimotor processing, and thus are vulnerable to scaling and distortion artifacts. A new method overcomes these limitations, producing large and consistent contrasts between cognitive and sensorimotor systems, differentiated by response measure. The dissociation is based on another perceptual illusion, the Roelofs effect: if a rectangular frame is presented off-center, so that one of its edges is directly in front of the subject, that edge will appear to be offset in the direction opposite the rest of the frame. A rectangle presented on the left side of the visual field, for example, with its right edge in the center, will appear less eccentric than it is, and the right edge will appear somewhat to the right of the subject's center (Roelofs 1935). We have extended and generalized the Roelofs effect to apply it to the study of the two-visual-systems theory. First, the frame need not have one edge centered in front of the subject; illusions occur whenever the frame is presented asymmetrically in the visual field. Second, if a target is presented within the offset rectangle, its location tends to be misperceived in the direction opposite the offset of the frame. Misperception of frame position induces illusions of target position; this is an induced Roelofs effect, but will be called simply a Roelofs effect here. In our experiments, the motor task is isomorphic with stimulus position. This means that there is a continuous, 1:1 relationship between target position and hand position when the subject touches the target. If the target deviates 5 deg. to the right, the hand does also, and no remapping or symbolic operation intervenes between stimulus and response. Roelofs effects can be observed reliably if subjects describe the target's position verbally, a task that addresses the cognitive system. A jab at the target, however, made just after it disappears from view, is not affected by the frame's position. This task addresses the sensorimotor system. Motor behavior for many subjects remains accurate despite the perceptual mislocalization (Bridgeman 1992; Bridgeman, Peery, and Anand 1997). Here a question arose because the Roelofs result showed a dissociation for only about half the subjects, while the earlier studies based on saccadic suppression and on induced motion showed dissociations for all subjects. The earlier work did not require a direct position judgment of the perceptual system, however, but only an indication of whether a target had moved or not. The perceptual task was a simple detection, not a position discrimination, and the relatively undemanding nature of the task may have enabled subjects to stay in a more direct motor mode for the motor task. Indeed, in the Roelofs experiments the subjects felt that the motor trials were less difficult than the cognitive trials, because no decision had to be made. Pointing was perceived to be easier, possibly because a representation inaccessible to consciousness was doing the work. Since the experiments described below follow up on earlier studies (Bridgeman et al.
1997), we were able to take advantage of the results of those studies to improve our experimental design. In the earlier studies we presented targets in five different positions. With both cognitive and sensorimotor measures, though, the responses to the five positions fell close to a straight line; nearly all of the variance in responses as a function of target position was accounted for by a linear regression. Thus the positions were redundant, and in the current experiments we did not need to present five target positions: two target positions would give us the same information, and allow us to increase the number of trials per condition.
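The redundancy argument can be made concrete in a few lines of code. The sketch below, using hypothetical response values rather than data from the studies, fits a line to mean responses across five target positions and reports the proportion of variance the line captures; when R² is near 1, the intermediate positions add no information beyond the two endpoints.

```python
# Minimal sketch of the redundancy check: fit mean responses to target
# positions with a line and compute R^2. All numbers are hypothetical.
import numpy as np

target_pos = np.array([-7.0, -3.5, 0.0, 3.5, 7.0])   # degrees from midline
mean_resp = np.array([-6.6, -3.2, 0.3, 3.6, 6.9])    # hypothetical mean responses (deg)

slope, intercept = np.polyfit(target_pos, mean_resp, 1)
predicted = slope * target_pos + intercept

# Proportion of variance accounted for by the linear fit
ss_res = np.sum((mean_resp - predicted) ** 2)
ss_tot = np.sum((mean_resp - mean_resp.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```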
Though the motor task is formally isomorphic, it can also be interpreted as a communicatory act. In effect the observer might be informing the experimenter by pointing where he thinks the target is located, so that the task might be closely linked to cognitive representations. An alternative is to require an instrumental act, in which a subject must do something to the world rather than simply indicate a position to another person. Behavior with a purely instrumental goal might be different from behavior with a communicatory goal, even if both the stimuli and the motor movements themselves are identical. Thus in our next experiment subjects jabbed a three-dimensional target object, a long vertical bar, pushing it backward and making a clicking noise. Their intention was not to communicate anything, but only to hit the bar. With this improvement in our technique we achieved a cleaner separation of cognitive and motor systems. For a quick jab at a three-dimensional target, rather than a pointing motion, almost all subjects showed independence from Roelofs effects in immediate action, along with the robust Roelofs effects that we have observed previously in verbal estimation of position.
5.2 Preliminary experiment: dissociating cognitive and sensorimotor pathways
Using these improved techniques, we began the job of characterizing the psychophysics of the sensorimotor system. A preliminary experiment (Bridgeman, Gemmer, Forsman, and Huemer 2000) is necessary to interpret the results of the main experiment. Because many of the methods and procedures are common to the two experiments, they will be described in detail.
5.2.1 Method
Observers sat with heads stabilized before a white hemicylindrical screen that provided a homogeneous visual field 180° wide × 50° high. A lever box located in front of the screen presented five vertical white levers. The center lever, marked with a black stripe, functioned as the target. Each lever was hinged at its base and spring-loaded. A long baffle hid the microswitch assembly without revealing the position of the lever array. In the motor condition, the task was to jab the black target rapidly with the right forefinger. The remaining levers served to record the locations of inaccurate responses. A jab between the locations of two levers would trip both of them, as the distance between the edges of the levers was about 7 mm, less than the width of a finger. A rectangular frame 38° wide was projected via a galvanic mirror under computer control, either centered on the subject's midline, 6° left, or 6° right of center. Inside the frame, the lever array occupied one of two positions, 3.5° left of center or 3.5° right of center. On each trial the frame and target were positioned in darkness during the intertrial interval. Then a computer-controlled shutter opened for 1 s. Reflected light from the projected frame made the screen and the levers visible as well. As soon as the shutter closed, the observer could jab the target or verbally indicate its position in complete darkness. Responses were recorded by the computer on an absolute scale (lever 1, 2, 3, 4, or 5).
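As an illustration of how such a lever array might be scored, the sketch below maps the pattern of tripped microswitches to a response position. The midpoint rule for a jab that trips two adjacent levers is our reading of the description above, not a procedure stated in the chapter.

```python
# Sketch: mapping tripped microswitches to a response position on the
# 1-5 lever scale. A jab between two adjacent levers trips both switches;
# we score such a response as the midpoint (an assumption).
def response_position(tripped):
    """tripped: sorted list of lever indices (1-5) closed by the jab."""
    if not tripped:
        return None                                # no lever contacted
    if len(tripped) == 1:
        return float(tripped[0])                   # clean hit on one lever
    if len(tripped) == 2 and tripped[1] - tripped[0] == 1:
        return (tripped[0] + tripped[1]) / 2.0     # landed between two levers
    raise ValueError(f"unexpected switch pattern: {tripped!r}")

print(response_position([3]))     # 3.0 -> the target lever
print(response_position([3, 4]))  # 3.5 -> between levers 3 and 4
```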
5.2.1.1 Cognitive measure
For the cognitive system the observer verbally estimated the position of the target spot on the center lever. The choices were 'far left', 'left', 'center', 'right', or 'far right', so that the response was a
5-alternative forced choice. In the present series of experiments the cognitive measure serves as a control to assure that a cognitive illusion is present, differentiating the cognitive and sensorimotor systems. Instructions in the verbal condition emphasized egocentric calibration. Quoting from the instructions that were read to each observer, 'In this condition you will be telling the experimenter where you think the target is in relation to straight ahead.' Further, 'If the target looks like it's directly in front of you, you will indicate this by saying "center".' Thus center was defined in terms of the subject's body rather than the apparatus or the frame. Each subject received at least 20 trials of practice with no frame present, so that only egocentric information could be used in the judgment.
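For comparison with the lever responses, the five verbal categories can be coded onto the same 1–5 absolute scale. The mapping below is one plausible way to collate the data, not a coding scheme given in the chapter.

```python
# Sketch: coding the five verbal categories onto the lever scale (1-5)
# so cognitive and motor responses share units. The mapping itself is
# our illustration, not a procedure stated in the text.
VERBAL_SCALE = {
    "far left": 1,
    "left": 2,
    "center": 3,
    "right": 4,
    "far right": 5,
}

def code_verbal(response):
    """Return the scale value for one verbal response string."""
    return VERBAL_SCALE[response.strip().lower()]

print(code_verbal("Center"))  # 3
```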
5.2.1.2 Sensorimotor measure
The observer rested the right forefinger on a foam pad mounted on the centerline of the apparatus just in front of the chin rest, then jabbed the target with the forefinger as soon as the target disappeared. Thus both cognitive and sensorimotor measures were open-loop, without error feedback. Before the experimental trials began, observers practiced jabbing the target—some were reluctant to respond vigorously at first for fear of damaging the apparatus. Subjects then received at least 10 practice trials in the jab condition.
5.2.1.3 Trial execution
A computer program randomly selected target and frame positions, with the exception that an identical set of positions could not occur on two successive trials (a sketch of this constraint appears at the end of this section). In each trial one of the two target positions and one of the three frame positions was presented, exposed for 1 s, and extinguished. Since the projected frame provided all of the illumination, target and frame exposure were simultaneous. A computer-generated tone told the subject to respond. For no-delay trials the tone sounded as the shutter extinguished the frame, while on other trials the tone began after a delay. The delay trials, while intermixed with no-delay trials, were aimed at a different experimental question, and will not be considered further here. Two target positions × three frame positions × two response modes × three delays resulted in 36 trial types. Each trial type was repeated 10 times for each subject, resulting in a data base of 360 trials/subject. There was a brief rest and a chance to light adapt after each block of 60 trials. Data were collated on-line and analyzed statistically off-line. Two-way ANOVAs were run for each subject and each response mode. Factors were frame position and target position. Summary statistics were analyzed between subjects. Nine University of California undergraduates participated in the experiment, all right-handed with normal or corrected-to-normal visual acuity. Four were male and five female.
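The no-repeat constraint on trial sequences is straightforward to implement. The sketch below draws target and frame positions at random while forbidding an immediate repeat of the identical pair; the position values follow the text, but the rejection-sampling strategy is our own illustration (response modes and delays are omitted for brevity).

```python
# Sketch of the trial-sequencing constraint described above: an identical
# set of positions may not occur on two successive trials.
import random

TARGETS = (-3.5, 3.5)       # target eccentricities, degrees
FRAMES = (-6.0, 0.0, 6.0)   # frame offsets, degrees

def make_sequence(n_trials, seed=0):
    rng = random.Random(seed)
    seq, prev = [], None
    while len(seq) < n_trials:
        trial = (rng.choice(TARGETS), rng.choice(FRAMES))
        if trial != prev:            # forbid an immediate repeat of the pair
            seq.append(trial)
            prev = trial
    return seq

block = make_sequence(60)            # one block of 60 trials
print(block[:3])
```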
5.2.2 Results
5.2.2.1 Cognitive measure
The induced Roelofs effect, measured as a main effect of frame position, was significant under all conditions. Observers tended to judge the target to be further to the left than its actual position when the frame was on the right, and vice versa. Six of seven individual subjects showed a significant Roelofs effect, and the magnitude of the Roelofs effect averaged across subjects was 2.23° (S.E. 52 min arc).
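For concreteness, the sketch below shows one plausible way to score the Roelofs effect from condition means as the main effect of frame position; the numbers are hypothetical, not data from the experiment.

```python
# Sketch: scoring the induced Roelofs effect from condition means.
# mean_deg maps frame offset (deg) to mean judged target position (deg),
# averaged over target positions; the values are hypothetical.
mean_deg = {-6.0: +1.1, 0.0: 0.0, +6.0: -1.1}

# Judgments shift opposite the frame, so half the (frame-left minus
# frame-right) difference gives the effect magnitude per direction.
roelofs = (mean_deg[-6.0] - mean_deg[+6.0]) / 2.0
print(f"Roelofs effect: {roelofs:.2f} deg")  # 1.10 deg
```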
Fig. 5.1 Jabbing at a target under three stimulus conditions. Position of the background frame does not affect behavior. Drawn from data of Bridgeman et al. (2000).
5.2.2.2 Sensorimotor measure
The results can best be summarized with the generalization that subjects hardly ever missed the target, regardless of target position or frame position (Fig. 5.1). Seven of eight subjects showed no significant Roelofs effect. Averaged across subjects, the magnitude of the Roelofs effect was 20 min arc (S.E. 22 min arc).
5.2.2.3 Comparing the two measures
Overall, there was a significant difference between cognitive and motor measures, as expected from the robustness of Roelofs effects with the cognitive measure and the absence of Roelofs effects with the motor measure. The sizes of the Roelofs effects under various conditions can be compared by measuring the difference between average response with the target on the right and with the target on the left in Fig. 5.1. The cognitive measure shows a large and consistent deviation, replicating Bridgeman et al. (1997), while the sensorimotor measure (illustrated) shows no deviation.
5.2.3 Discussion
This experiment showed that the sensorimotor pathway can code veridical information about target position, unaffected by visual context, even when perception shows an illusion of position. The rules are different for the two systems. Cognition is conscious and must use context, even when that leads to errors of localization. The sensorimotor system, in contrast, is insensitive to context, and its spatial values are held unconsciously. Conflicting spatial values can exist in the two systems simultaneously. These results contrast with results obtained previously with the Roelofs method (Bridgeman et al. 1997). In the earlier experiments, using a projected target and frame, only half of the subjects pointed
to the targets in the motor measure without the influence of a Roelofs effect, while in the present experiments almost all of them did. The difference between the current study and the earlier one is that, with the addition of the mechanical target, subjects are more prone to execute an instrumental rather than a communicative action. Dissociation occurred here in the frontal plane, where grasp directions differed only on the x-axis. Rossetti and Pisella (this volume, Chapter 4) have found a critical limitation of this phenomenon, for grasp in the z-axis (changing distance from the observer) is affected by context. Depth cues are very different from the more retinally based x-axis information, and Rossetti and Pisella's result may be due to a context effect in the convergence movements that help to localize targets in depth. Recordings of eye movements during a depth grasping experiment would be necessary to test this hypothesis. This experiment by itself does not prove that the normal brain possesses a true sensorimotor map of visual space, though. A possible mechanism of the sensorimotor store is that subjects might perform the motor action by fixating the target visually when it is visible, then pointing where they are looking when the target is extinguished. This would mean a zero-dimensional storage of spatial information, limited to the location of a single point, held in gaze position rather than in an internal neurological register. Further, since oculomotor fixation is a good measure of spatially selective attention, fixating the target also facilitates attention to it. If this interpretation is correct, subjects should be unable to perform the motor task if they are prevented from ever fixating the target. In the next experiment, extending the Roelofs effect paradigm, we seek to control for possible attention and fixation effects by preventing observers from fixating the target. This is the motivation of the main experiment, testing whether it is necessary for spatial values in the sensorimotor system to be held in an internal neurological store.
5.3 Main experiment: gaze position and the motor pathway
In a condition where subjects are not allowed to fixate the target in a Roelofs-effect design, one can form two hypotheses. If the sensorimotor pathway normally stores target position only as a gaze angle, then it cannot use spatial information from gaze position and will be forced to call upon the cognitive pathway for spatial location information. If the pathway includes a true map of visual space, however, context-free spatial information would be available even from a target that has never been fixated. We monitor eye movements to be sure that subjects never fixate the target. Further, we prevent covert orienting to the target by requiring subjects to perform a continuous oculomotor task throughout the exposure period. In this way we break the normally tight relationship between fixation and spatial allocation of attention.
5.3.1 Method
For this experiment we need fixation points that define eye movements, but give the subject no information about target or frame positions. A pair of fixation points is added to the display, in positions statistically uncorrelated with target or frame positions, to elicit horizontal saccades.
5.3.1.1 Apparatus
In order to present the target, frame, and fixation points simultaneously, and also to improve the accuracy of our motor recordings, we moved to an electronic apparatus with all stimuli displayed on
Fig. 5.2 Aiming at a target in the electronic apparatus. The display screen appears to the observer to be located at the plane of the touch pad. Contact with the pad offers 800-pixel resolution, compared to the five pixels of the preliminary experiment.
a CRT screen. The screen is mounted at the top of the apparatus, with its face down, and is viewed through a front-surface mirror mounted at a 45° angle in front of the eyes, so that the display appears to be in the frontal plane directly in front of the subject. A touch pad mounted vertically in the apparent plane of the display records jab responses made with a stylus, at an 800-pixel horizontal resolution (Fig. 5.2). The frame's width is 24° and its height is 12°. The saccade targets are 2° diameter circles, 23° apart, displayed 2.5° above the frame. Because of the smaller available stimulus aperture, frame positions are at 4° left of center, center, and 4° right of center. Target positions are 2° left and 2° right of center. Two target positions × 3 frame positions × 3 fixation point positions yielded 18 trial types. Gaze position was monitored continuously by a Bouis photoelectric infrared eye tracker aligned to the left eye. The head was stabilized with a bite bar attached to the frame of the tracker. With this apparatus, eye position can be measured in two dimensions at a 400 Hz sampling rate in complete darkness.
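Because the stimuli are specified in degrees of visual angle while the touch pad reports responses in pixels, analysis requires converting between the two units. A minimal sketch of such a conversion follows; the viewing distance and pixel pitch used here are hypothetical stand-ins, since the chapter does not report them:

```python
import math

def deg_to_screen_px(angle_deg: float, view_dist_cm: float, px_per_cm: float) -> float:
    """Horizontal offset (in touch-pad pixels) of a point at a given visual angle.

    Uses the exact tangent relation; at the eccentricities used here (<= 12 deg)
    it differs little from the small-angle approximation.
    """
    return math.tan(math.radians(angle_deg)) * view_dist_cm * px_per_cm

# Hypothetical viewing geometry (not reported in the chapter): 40 cm viewing
# distance and 800 px spanning a 32-cm-wide pad, i.e. 25 px/cm.
for label, deg in [("target left", -2.0), ("target right", 2.0),
                   ("frame left", -4.0), ("frame right", 4.0)]:
    print(f"{label:12s} {deg:+5.1f} deg -> {deg_to_screen_px(deg, 40.0, 25.0):+7.1f} px")
```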
5.3.1.2 Procedure
Except for the change in apparatus, the procedure and design are similar to those in Experiment 1, with the following additions. Before each trial block, the eye tracker was calibrated by having subjects fixate each corner of the frame in its centered position. At the start of each trial, subjects were instructed to look up, above the frame's edge. When the eye monitor indicated upward gaze, the experimenter triggered the computer to display the trial, while the subject fixated the two fixation points in alternation, switching as quickly as possible as long as the points were visible (Fig. 5.3). Continuous saccades were required to prevent surreptitious attention shifts to the target position. Target, frame, and fixation points appeared and disappeared simultaneously. For analysis, the mean response averaged across subjects for each target, frame, and fixation point position was entered into a factorial ANOVA. This format trades some power for the ability to compare cognitive and sensorimotor data directly, with equal power in each measure, despite the different number of observers in each condition. Cognitive and sensorimotor conditions were analyzed separately, and then combined into a single analysis. Two target positions × three frame positions × three fixation point positions resulted in 18 trial types. Each trial type was repeated 10 times for each subject, so that each cell in the analysis is based on 10 observations/subject × the number of subjects in the respective condition.
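As an aside for readers reconstructing the design, the 18 trial types follow directly from crossing the three factors. A short sketch is given below; the randomization of trial order is our assumption, as the chapter does not state how trials were ordered:

```python
import itertools
import random

# Factor levels in degrees from screen center; fixation-point pairs are
# labeled by their horizontal bias, as in Fig. 5.3.
targets = [-2, 2]
frames = [-4, 0, 4]
fixation_pairs = ["left", "center", "right"]

trial_types = list(itertools.product(targets, frames, fixation_pairs))
assert len(trial_types) == 18  # 2 x 3 x 3 trial types, as stated in the text

trials = trial_types * 10      # 10 repetitions of each type per subject
random.shuffle(trials)         # presentation order randomized (assumed)
print(len(trials), "trials per subject; first trial:", trials[0])
```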
Fig. 5.3 Stimuli and saccadic eye movement scanpath in the gaze experiment. Eye movements alternate as many times as possible between the left and the right fixation points during the 1-s exposure period. In this example, the saccade fixation points are biased to the right. In other trials, they were centered or biased to the left.
Seven University of California undergraduates participated in the cognitive condition, and six in the motor condition, all right-handed with normal or corrected-to-normal visual acuity. Each subject was run in only one condition, cognitive or motor.
5.3.2 Results
In general, significance levels were lower in this experiment than in the preliminary experiment because of greater variability, though mean effect sizes were comparable. Preventing direct visual fixation reduced the quality of the spatial information available.
5.3.2.1 Cognitive measure
For this analysis the three fixation point positions were considered as repetitions of each target/frame condition. In an ANOVA with target and frame as the factors, the cognitive observers showed a significant effect of target position, F(1, 12) = 30.88, p = 0.0001, and a marginally significant effect of frame, F(2, 12) = 3.74, p = 0.0547. A Fisher's PLSD test for the frame at a significance level of 5% showed that the difference between position estimates at frame positions of 4° left and 4° right of center was significant at p = 0.018, mean difference = 1.40°, critical difference = 1.12°. The interaction was not significant at p < 0.05 (Fig. 5.4).

5.3.2.2 Sensorimotor measure
The motor observers, in contrast, showed no Roelofs effect, frame F(2, 12) < 1, p = 0.96, but had a statistically significant target effect, F(1, 12) = 404.78, p < 0.0001. There was no significant interaction. Thus the motor behavior remained independent of frame position (Fig. 5.5).

5.3.2.3 Comparing the measures
When cognitive and sensorimotor data were combined in a single ANOVA with target, frame, and measure as the factors, the measure factor was significant, F(1, 24) = 5.46, p = 0.028. The only significant
interaction at p < 0.05 was frame × measure, F(2, 24) = 3.55, p = 0.045, showing that the frame effect was larger in the cognitive condition than in the sensorimotor condition.

Fig. 5.4 Verbal estimates of target position without direct fixation on the target. Perceived position is biased by the position of the background frame, even though the instructions did not mention the frame. Error bars are +/− 1 S.E.; where not shown, they were smaller than the corresponding symbol.
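For readers who wish to reproduce this style of analysis, the sketch below runs a comparable two-factor ANOVA with statsmodels. The data file and column names are hypothetical placeholders; the chapter's raw data are not reproduced here:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Expected layout: one row per subject-mean response in each cell, with columns
# 'subject', 'target' (-2 or +2 deg), 'frame' (-4, 0, or +4 deg), and
# 'response' (judged or jabbed position in deg). The file name is hypothetical.
df = pd.read_csv("roelofs_condition_means.csv")

# Two-factor ANOVA: main effects of target and frame plus their interaction.
model = smf.ols("response ~ C(target) * C(frame)", data=df).fit()
print(anova_lm(model))  # F and p values corresponding to those reported above
```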
5.3.3 Discussion
The single most important finding is that preventing direct fixation on the target does not cause a Roelofs effect in motor activity. Since the observers in the motor condition showed no Roelofs effect, while those in the cognitive condition did, we can conclude that the sensorimotor representation was controlling the jab for the motor observers. The result indicates that the sensorimotor representation is at least two-dimensional, a true map rather than a simple matching of gaze and jab positions. This experiment shows that oculomotor fixation and spatially selective attention are not responsible for accurate pointing behavior in an illusory visual context.
5.4 Decision and the motor pathway
At this point we are certain that the sensorimotor pathway can represent spatial information without the disturbing influence of a biased visual context that invariably affects perception. Further, the information used for motor control is held in a true neural map of visual space. What, then, is the function of the cognitive system? Surely it has functions beyond representing a more error-prone version of the sensorimotor map. One unique function of the cognitive system is its ability to make decisions, to determine which of several alternative behaviors to execute. This is a nonlinearity in vision, because a decision once made is an all-or-none affair, no longer subject to the subtleties of coding and perception that may have led to the decision. Since the nonlinearity of a decision seems to be a hallmark of the cognitive system, forcing observers to make decisions about which of several possible targets to hit might
bring control by the cognitive system into a sensorimotor task, and bring the Roelofs effect along with it. An experiment by Bridgeman and Huemer (1998), using the same apparatus as in the preliminary experiment but putting targets on the second and fourth bars rather than only the third bar, tested this hypothesis. Subjects were cued at the end of the stimulus display period in each trial whether they would jab the left or the right target bar. The results showed that observers could jab either the left or the right target bar in accordance with a post-stimulus cue, without influence of the Roelofs effect. The hypothesis was not confirmed: no Roelofs effect was found (Fig. 5.6). Observers were able to make a cognitive decision about which target to attend to, and then to use spatial information in the context-free sensorimotor pathway to guide their actions. The jab was not as veridical as that in Fig. 5.1, and the subjects sometimes hesitated before responding, but there was no statistically significant effect of the surrounding frame. The combination of decision and jab was apparently more difficult than a simple jab, resulting in larger errors.

Fig. 5.5 Motor responses to targets without direct fixation on the target. Perceived position is not biased by the position of the background frame. Error bars +/− 1 S.E.
5.5 Conclusions
The results of these experiments can be interpreted in terms of two visual pathways. One pathway is based on egocentric coordinates to govern motor behavior, while another uses information from visual context to represent spatial relationships in perception. These experiments also lend support to the claim that the price in performance the cognitive pathway must pay in order to take advantage of visual context information is a susceptibility to illusions of spatial context. We have shown that direct fixation driven by attentional selection is not the mechanism responsible for accurate visually guided pointing in a context that creates illusory perceptions in the cognitive system; this result, however, shows only that fixation is not responsible. Other aspects of attention may be responsible for the continued accuracy of motor behavior in these experiments.
Fig. 5.6 Aiming at a target under three stimulus conditions. Motor responses to two targets are adjusted to be superimposed in the graph. There is no significant Roelofs effect. Redrawn from data of Bridgeman and Huemer (1998).

The visual mechanism by which motor behavior is governed has been shown to be extremely robust, both by these and by previous studies. Indeed, the reappearance of a Roelofs effect for motor responses after a delay (Bridgeman et al. 1997) shows that the cognitive system can provide information to the motor system when necessary, and this so far appears to be the only form of real-time communication between the two systems. Rossetti and Pisella (this volume; Rossetti and Régnier 1995) review other extensive evidence for the ephemerality of the motor representation under delay. To date there is no evidence that the cognitive pathway has access to spatial location information in the motor pathway, except for longer-term adaptation effects. This observation supports the inference that spatial information can flow in only one direction, from cognitive to motor, for immediate control of behavior. In normal visual conditions (motor actions directed at still-visible targets), spatial information remains segregated in the two pathways. However, sensory influences that operate at the stage of early vision, before the two pathways divide, will, of course, affect both pathways. It is clear that vision begins from a single anatomical base, and ends by affecting either perception or motor behavior. Many of the controversies in this field can be interpreted as arguments about how late the split of these functions occurs. Some prefer to assume a unified system until just before the output stages, while the evidence reviewed here shows separate and distinct representations of visual space for cognitive and motor systems. The cognitive system must take cognizance of spatial context, even when that strategy leads to errors in localization. Extending this principle to the temporal domain, Rossetti and Pisella (this volume) have shown that jabbing a lighted target is also affected by context if the jab is delayed, forcing the observer to rely on the cognitive pathway to control the jab. An array of target lights was arranged in a horizontal arc in front of the observer: errors for delayed jabbing at a given target tended to spread along the arc, in the direction of the other alternatives jabbed in other trials. In another condition, the same observers jabbed targets aligned in a row extending outward from the observer. Now the errors
were spread in depth, again in the direction of the rest of the array but orthogonal to the error distribution in the horizontal-arc condition. This difference in error distributions was found even for the point where the two rows intersected, so that the observer was jabbing the same point under each condition. Thus information about context taken from other trials influenced aiming at the target point, leading to systematic errors in jabbing. If the jab was immediate, however, observers could use their direct sensorimotor pathway to hit the target with less error, and independent of the context established in other trials. Spatial and temporal context thus follow similar rules, affecting only the spatial coding in the cognitive pathway. This experiment is also significant from a methodological standpoint because it relies on error analysis, rather than spatial illusions, to differentiate cognitive and sensorimotor control. Both these experiments and the Roelofs effect experiments reviewed above require a subject to act on only one target in an otherwise clear or nearly clear field. The real world, however, is always filled with a myriad of possible objects to grasp, swat, push, or prod, and the sensorimotor pathway lacks the motivation or plan to decide which of the objects to engage. In hindsight, a sensorimotor system that could not receive specific instructions on which of many alternatives is relevant would be of little value. Any other result would have spelled real trouble for the two-visual-systems hypothesis. To return to the questions in the introduction, these results show that attention to a target, in the sense of verbalization of its location, does not always enhance performance. In fact, such performance is subject to illusions that lead to mistakes in localization, while an unconscious pathway can control motor action without suffering from the illusions. Accurate motor localization can occur despite simultaneous mislocalization represented in attended, and remembered, pathways. Vision is not all of a piece, and the location of what we perceive is not necessarily the location of action. The decision to engage the world must come from cognitive sources, but the calibration of the engagement itself can come from separate, unperceived pathways. What seemed unified is segregated in the brain into separate processing streams that follow different rules.
References
Aglioti, S., DeSouza, J.F., and Goodale, M.A. (1995). Size-contrast illusions deceive the eye but not the hand. Current Biology, 5, 679–685.
Bridgeman, B. (1991). Complementary cognitive and motor image processing. In G. Obrecht and L.W. Stark (Eds.), Presbyopia research: From molecular biology to visual adaptation. New York: Plenum Press.
Bridgeman, B. (1992). Conscious vs. unconscious processes: The case of vision. Theory and Psychology, 2, 73–88.
Bridgeman, B. and Huemer, V. (1998). A spatially oriented decision does not induce consciousness in a motor task. Consciousness and Cognition, 7, 454–464.
Bridgeman, B. and Stark, L. (1979). Omnidirectional increase in threshold for image shifts during saccadic eye movements. Perception and Psychophysics, 25, 241–243.
Bridgeman, B., Lewis, S., Heit, G., and Nagle, M. (1979). Relation between cognitive and motor-oriented systems of visual position perception. Journal of Experimental Psychology: Human Perception and Performance, 5, 692–700.
Bridgeman, B., Kirch, M., and Sperling, A. (1981). Segregation of cognitive and motor aspects of visual function using induced motion. Perception and Psychophysics, 29, 336–342.
Bridgeman, B., Peery, S., and Anand, S. (1997). Interaction of cognitive and sensorimotor maps of visual space. Perception and Psychophysics, 59, 456–469.
Bridgeman, B., Gemmer, A., Forsman, T., and Huemer, V. (2000). Processing spatial information in the sensorimotor branch of the visual system. Vision Research, 40, 3539–3552.
Daprati, E. and Gentilucci, M. (1997). Grasping an illusion. Neuropsychologia, 35, 1577–1582.
Franz, V., Gegenfurtner, K., Bülthoff, H.H., and Fahle, M. (2000). Grasping visual illusions: No evidence for a dissociation between perception and action. Psychological Science, 11, 20–25.
Goodale, M.A., Pélisson, D., and Prablanc, C. (1986). Large adjustments in visually guided reaching do not depend on vision of the hand or perception of target displacement. Nature, 320, 748–750.
Goodale, M.A., Milner, A.D., Jakobson, L.S., and Carey, D.P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349, 154–156.
Haffenden, A. and Goodale, M. (1998). The effect of pictorial illusion on prehension and perception. Journal of Cognitive Neuroscience, 10, 122–136.
Milner, D. and Goodale, M. (1995). The visual brain in action. Oxford: Oxford University Press.
Paillard, J. (1987). Cognitive versus sensorimotor encoding of spatial information. In P. Ellen and C. Thinus-Blanc (Eds.), Cognitive processes and spatial orientation in animal and man. Dordrecht, The Netherlands: Martinus Nijhoff.
Pöppel, E., Held, R., and Frost, D. (1973). Residual visual function after brain wounds involving the central visual pathways in man. Nature, 243, 295–296.
Roelofs, C. (1935). Optische Localisation. Archiv für Augenheilkunde, 109, 395–415.
Rossetti, Y. and Régnier, C. (1995). Representations in action: Pointing to a target with various representations. In B.G. Bardy, R.J. Bootsma, and Y. Guiard (Eds.), Studies in perception and action III, pp. 233–236. Mahwah, NJ: Lawrence Erlbaum.
Rossetti, Y., Rode, G., and Boisson, D. (1995). Implicit processing of somaesthetic information: A dissociation between where and how. Neuroreport, 6, 506–510.
Sanders, M.D., Warrington, E.K., Marshall, J., and Weiskrantz, L. (1974). 'Blindsight': Vision in a field defect. Lancet, 20, 707–708.
Schneider, G.E. (1967). Contrasting visuomotor functions of tectum and cortex in the golden hamster. Psychologische Forschung, 31, 52–62.
Trevarthen, C. (1968). Two mechanisms of vision in primates. Psychologische Forschung, 31, 299–348.
Ungerleider, L. and Mishkin, M. (1982). Two cortical visual systems. In D. Ingle, M. Goodale, and R.J.W. Mansfield (Eds.), Analysis of visual behavior. Cambridge, MA: MIT Press.
Weiskrantz, L. (1996). Blindsight revisited. Current Opinion in Neurobiology, 6, 215–220.
6 How the brain represents the body: insights from neurophysiology and psychology
Michael S.A. Graziano and Matthew M. Botvinick
Abstract. To reach for the computer mouse, to sit upright in a chair or hold a journal in order to read it, indeed, to do most of the actions that we commonly perform, we rely on a representation of the spatial configuration of the body. How and where in the brain is the body represented, and what are the psychological properties of this body schema? In this article we review first the neurophysiology and then the psychology of the body representation. One finding that has emerged from both approaches is that the body representation is not merely a registration of proprioceptive inputs about joint angle. Instead, the brain contains a sophisticated model of the body that is continually updated on the basis of multimodal input including vision, somesthesis, and motor feedback. Neurophysiological studies in the monkey brain show that parietal area 5 is a critical node for processing the body's configuration. Neurons in area 5 combine signals from different modalities in order to represent limb position and movement. Psychological studies show that the body schema is used to cross-reference between different senses, as a basis for spatial cognition and for movement planning.
I thrust my arms wildly above and around me in all directions. I felt nothing . . . I cautiously moved forward, with my arms extended, and my eyes straining from their sockets . . .
Poe, The Pit and the Pendulum

Sometimes, too, just as Eve was created from a rib of Adam, so a woman would come into existence while I was sleeping, conceived from some strain in the position of my limbs.
Proust, Swann's Way

'Easy!' I said. 'Be calm! Take it easy! I wouldn't punch that leg like that.' 'And why not!' he asked, irritably, belligerently. 'Because it's your leg,' I answered. 'Don't you know your own leg?'
Sacks, The Man who Mistook his Wife for a Hat
6.1 Introduction
Without an internal representation of the body, a mental model of the relative positions of the head and limbs, we would be unable to perform the most vital or trivial actions; unable to move toward and around the objects that surround the body; unable to process the locations of those objects in relation to the body; disoriented and without any sense of physical self. In the first quote above, the protagonist of the story tries to understand the layout of his environment by touch and by use of his body representation. A touch on his hand will do him no good unless he knows the position of his outstretched arm. In the second quote, the position of the limbs implies the presence and shape of a
nearby object. The third quote is about a brain-damaged patient whose internal body representation no longer includes his own leg. The internal representation of the body has been studied on two different disciplinary fronts: a neurophysiological one and a psychological one. Until recently, these two approaches were surprisingly isolated from each other. The psychological approach emphasized the multisensory nature of the body representation. This work demonstrated that vision, touch, and proprioception are combined and cross-referenced in a sophisticated mental schema of the body. Neurophysiology, in contrast, emphasized proprioception, one component of body representation, and focused especially on the use of proprioception in the control of movement. Only recently have these two fields become more integrated and begun to converge on similar themes. The psychological studies have turned more toward exploring the spatial coordinate systems that organize the representation of the body and the control of movement. In neurophysiology, recent experiments have focused on how vision, proprioception, and touch are integrated by single neurons in the parietal lobe and premotor cortex. The purpose of the present article is to review both areas of research side by side, providing an overview of each and describing some of the relations between the two. The first half of the article reviews neurophysiological studies on body representation, mainly in the monkey brain. These experiments examine a set of interconnected somatosensory and motor brain areas, emphasizing area 5 in the superior parietal lobe. The second half of the article reviews the psychological data, emphasizing how the body representation coordinates information within and across perceptual modalities. Both parts of the review share certain underlying themes: the representation of the body is multimodal, and it has a close relationship to the representation of space around the body and the control of movement through that space.
6.2 The neuronal basis of the body representation
6.2.1 Proprioceptive pathways from the periphery to area 5
Proprioception begins in receptors primarily in the joints and muscles (for review, see Burgess, Wei, Clark, and Simon 1982; Iggo and Andres 1982). Information about muscle stretch and joint angle is then transmitted through the dorsal column nuclei of the medulla and the ventrobasal complex of the thalamus, to two principal regions of the cerebral cortex, area SI on the postcentral gyrus and area SII in the lateral sulcus (for a review of these pathways, see Mountcastle 1984). Of these two cortical areas, SI is far better studied and understood, perhaps partly because it is on the top of the brain and easier to reach with a recording electrode. In area SI information from deep, proprioceptive receptors arrives mainly in the subregions termed 3a and 2, while information from the cutaneous receptors arrives mainly in subregions 3b and 1 (Kaas, Nelson, Sur, Lin, and Merzenich 1979; Mountcastle 1984). These subdivisions of SI contain even smaller partitions, cortical columns a few millimeters wide, that receive information from only one receptor type. Information-processing columns, now known to exist all over the cortex, were first discovered in SI (Kaas et al. 1979; Mountcastle 1997). SI projects to a wide range of cortical sites including area 5 in the superior parietal lobe. All four subregions of SI project to area 5, but the strongest projection is from subregion 2 (Pandya and Kuypers 1969; Pearson and Powell 1985; Vogt and Pandya 1978). That is, area 5 receives input mainly from the deep, proprioceptive receptors. Most of the work on body representation in the
monkey brain has concentrated on area 5. This cortical area is not uniform. As shown in Fig. 6.1, it has several subdivisions, including area MIP on the medial bank of the intraparietal sulcus, V6A on the anterior bank of the parieto-occipital sulcus, a newly suggested area PRR that may overlap V6A and MIP, and a region on the gyral surface termed here Anterior 5 or 5A. Until recently, however, most of the work on area 5 did not distinguish between these different regions. The following sections review the effects of lesions to area 5, its physiology, and its possible role in body representation and the control of movement.

Fig. 6.1 Side view of the macaque monkey brain showing some of the cortical areas involved in representing the physical configuration of the body. The intraparietal sulcus is shown opened up to expose the buried cortex (stippled area). MIP = medial intraparietal area, AIP = anterior intraparietal area, 5A = anterior area 5, M1 = primary motor cortex, SI = primary somatosensory cortex, SII = second somatosensory cortex.
6.2.2 Lesions to area 5
In 1884, Ferrier and Yeo reported the effects of parietal lesions in monkeys (see Gross 1998 for a review of the history). They argued that visual cortex must be located in the parietal lobe and not the occipital lobe, because monkeys with parietal lesions were unable to reach accurately for pieces of food. Some time after, Balint (1909) observed similar behavior in humans with damage to parietal cortex. Balint and others (e.g. Holmes 1918) realized that the deficit was probably one of sensory-motor integration, spatial attention, or body representation, not vision. Since then, the parietal lobe syndrome in all its spatial, motor, and attentional manifestations has been studied extensively in humans and monkeys (Andersen 1987; Critchley 1953; De Renzi 1982; Holmes 1918; Kolb and Whishaw 1990; Newcombe and Ratcliff 1989). One of the many deficits often seen in human patients is a disturbance of body representation. For example, some patients will neglect one side of the body, failing to shave or dress on that side; other patients will notice the limbs contralateral to the damaged parietal lobe readily enough, but will mistakenly think that the
limbs are not attached to them and belong to someone else. Whether these deficits are associated with one or another subregion of the parietal lobe is difficult to tell in humans, because of the size and uncertain borders of the lesions. Some of these deficits in body representation are discussed further in the second half of this chapter. Lesions to the superior parietal lobe in monkeys cause deficits in almost all aspects of somesthesis, not just body representation but touch as well (Ettlinger and Wegener 1958; Moffet and Ettlinger 1970; Ridley and Ettlinger 1975; Ruch, Fulton, and German 1938). Murray and Mishkin (1984) argued that many of these deficits were the result of accidental damage to SI itself, not to area 5. Their results indicated that lesions carefully restricted to area 5 had minimal effects on texture, roughness, and shape discrimination. In contrast, lesions to areas SI and SII had a devastating impact on all of these behaviors. They proposed that area 5 processes the spatial component of somesthesis, such as the position of the arm, while area SII processes the perceptual, object-recognition, and memory component of somesthesis that comes with feeling an object with the hand. This dichotomy of the somatosensory system was proposed as a parallel to the visual system. In 1982, Ungerleider and Mishkin proposed that the cortical visual system was divided into two components of which the 'dorsal stream' subserves spatial vision and the 'ventral stream' subserves object recognition and memory. In the view of Murray and Mishkin (1984), area 5 is the dorsal stream, or spatial module, of the somatosensory system. Several recent studies by Passingham and colleagues (Rushworth, Nixon, and Passingham 1997a,b) confirmed that area 5 is necessary for the accurate spatial guidance of the arm, especially in the dark when only somatosensory cues are available. In the next several sections, we describe the properties of single neurons in area 5 and discuss how they might encode the spatial configuration of the body and help to control movement.
6.2.3 Early single neuron studies of area 5
In 1971, Duffy and Burchfiel studied the activity of neurons in area 5 of awake monkeys. They found that most neurons responded to proprioceptive signals—to joint angle and muscle stretch. These cells had highly complex properties. Some combined a tactile receptive field on the skin with a response to joint rotation. Some responded to rotation of more than one joint, and many had bilateral responses. This convergence of different joints and different somatosensory submodalities onto individual neurons was never seen in SI. Typically, neurons in SI have small receptive fields on the contralateral side of the body, and respond to stimulation of one class of peripheral somatosensory receptor only. The differences between area 5 and SI led Duffy and Burchfiel to suggest that area 5 represents a higher stage in somatosensory processing, and especially in the processing of body representation. Sakata et al. (1973) confirmed and extended these findings in area 5. These authors made two important original observations, both of which have been largely neglected. First, they found a subset of area 5 neurons that responded to a touch on the hand, but only if the joints of the arm were placed in certain positions. They argued that such neurons would be able to encode the spatial location of a felt object. Second, they found neurons 'which responded to certain visual as well as to somesthetic stimuli.' This visual input has never been systematically studied, although it was noted by other investigators (Colby and Duhamel 1991; MacKay and Crammond 1987; Mountcastle et al. 1975). Mountcastle and colleagues (1975) provided the first coherent view of the functions of area 5 and its role in behavior. This landmark paper described the properties of neurons in both area 5, in the
superior parietal lobe, and the adjacent area 7 in the inferior parietal lobe. In addition to confirming the findings of previous studies, this study made the novel discovery of motor functions in the parietal lobe. In area 5, some of the neurons that responded during passive movement of the arm showed a greater response when the monkey moved its arm of its own volition. Some neurons showed no somatosensory activity at all, responding only during the monkey's goal-directed reaches. Another class of neurons responded when the monkey grasped or manipulated objects with its fingers. In area 7, neurons responded in association with eye movement. Some responded during active fixation, others during saccadic or smooth pursuit eye movements. In the words of the authors, 'These regions receive afferent signals descriptive of the position and movement of the body in space, and contain a command apparatus for operation of the limbs, hands and eyes within immediate extrapersonal space.' The sensory–motor command hypothesis of Mountcastle et al. was controversial at first, especially as applied to area 7 (e.g. Robinson, Goldberg, and Stanton 1978), but in the past 20 years has gradually gained a qualified acceptance. Areas 7 and 5 have now been parceled into more than 10 functionally different areas (for review, see Andersen, Snyder, Bradley, and Xing 1997; Colby and Duhamel 1991). Some of these areas are involved in eye movement and fixation, such as LIP and V6A (Andersen, Bracewell, Barash, Gnadt, and Fogassi 1990; Galletti, Battaglini, and Fattori 1995; Nakamura, Chung, Graziano, and Gross 1999). Some parietal areas are more involved in arm movement, such as areas MIP, PRR, and 7m (Colby and Duhamel 1991; Ferraina et al. 1997; Snyder, Batista, and Andersen 1997). Parietal area AIP may be involved in grasping objects with the hand (Sakata and Taira 1994). Even in the human literature, the parietal lobe has come to be viewed as a sensory–motor structure rather than as a purely visual, proprioceptive, or spatial structure (Goodale et al. 1994; Rossetti and Pisella, this volume, Chapter 4; Gallese et al., this volume, Chapter 17). As described in the next section, the single neuron experiments in area 5 that followed Mountcastle focused almost exclusively on the effort to distinguish sensory from motor; perception of limb position from the command to move.
6.2.4 Body representation and movement control in area 5
Area 5 projects to and receives projections from primary motor cortex, premotor cortex, and supplementary motor cortex, among other areas (Johnson et al. 1996; Jones, Coulter, and Hendry 1978; Jones and Powell 1970; Strick and Kim 1978); that is, it is closely connected to the motor system. To what extent is it a sensory structure or a motor structure? As described above, Mountcastle, Lynch, Georgopoulos, Sakata, and Acuna (1975) found neurons that responded best, sometimes only, during active rather than passive movements of the arm. But do these responses represent motor commands, as Mountcastle et al. proposed, or are they somatosensory signals, perhaps enhanced when the monkey is paying attention to its arm? Seal, Gross, and Bioulac (1982) examined this issue in monkeys that were trained to flex or extend the elbow joint. These experimenters cut the sensory nerves from the arm, and found that about 38% of the neurons in area 5 still responded just before and during arm movements. These neurons therefore responded independently of any somatosensory stimulation; their activity was internally generated. More recent studies have confirmed that neurons in area 5, both on the surface and in the intraparietal sulcus, are active during reaching movements (Batista, Buneo, Snyder, and Andersen 1999; Lacquaniti et al. 1995; Snyder et al. 1997). Neurons in a proposed region of the intraparietal sulcus, area PRR, may be especially active in association with reaching (Snyder et al. 1997). These neurons
respond in anticipation of the arm movement. In a delayed reaching task, the neurons respond during the delay period after the monkey is instructed where to reach but before the 'go' signal. One speculation is that the activity of these neurons represents a motor plan. However, these experiments do not distinguish between activity that ultimately causes a movement and activity that represents an arm position predicted on the basis of motor feedback. Kalaska, Caminiti, and Georgopoulos (1983) found that when monkeys are planning to make an arm movement, area 5 neurons begin to respond on average 60 ms after the neurons in primary motor cortex. Thus at least some of the motor-related activity in area 5 could be the result of efference copy. Though motor in origin, these signals could serve a sensory function, helping to encode body posture. Lesions to motor cortex do not abolish the motor-related activity in area 5 (Bioulac, Burbaud, and Varoqueaux 1995); but it is difficult to rule out the possibility that another motor area is sending this efference-copy signal. The critical experiment to determine whether the activity in area 5 encodes body representation or controls movement has not yet been found, and may never be. Such a distinction between sensory and motor now appears to be too simple. Area 5 may contribute to both roles. A more meaningful question might be: How far along the sensory–motor transformation does area 5 lie? For example, how do the response properties in area 5 compare to those in primary motor cortex? Several groups have studied exactly this question. Georgopoulos and Massey (1985) found that the neuronal selectivity for the direction of hand movement was greater in primary motor cortex than in area 5, while selectivity to the static position of the hand in space was greater in area 5 than in primary motor cortex. Kalaska and colleagues (Kalaska and Hyde 1985; Kalaska, Cohen, Prud'homme, and Hyde 1990) trained monkeys to move a handle along specific trajectories while external force loads were applied to the handle. In this way, the location and direction of hand movement were dissociated from the muscular forces that the monkey used. The results showed that selectivity to the position and trajectory of the hand through space was greater in area 5 than in primary motor cortex, while selectivity to the muscular forces applied by the arm was greater in primary motor cortex than in area 5. In summary, the differences between area 5 and primary motor cortex are relative, not absolute. Area 5 neurons may play relatively more of a representational role, keeping track of the positions and movements of limbs, while primary motor cortex may play relatively more of a dynamic role, initiating and guiding the movements. However, these two functions overlap extensively. Not only do some area 5 neurons have motor properties, but most primary motor neurons have sensory properties, responding to tactile stimuli and joint rotation (Gentilucci et al. 1988). Indeed, primary motor cortex receives direct projections from almost every stage of the somatosensory system, including SI and even the somatosensory thalamus (for review, see Mountcastle 1984). The somatosensory–motor system, therefore, is organized as a set of highly interconnected nodes that collectively participate in the sensory guidance of movement. The evidence so far suggests that among these many nodes, area 5 is relatively more specialized for encoding the spatial configuration of the body.
6.2.5 Visual representation of arm position in area 5
The studies reviewed so far focused on the role of proprioception and motor control in body representation. However, other sources of information are also important in body representation. According to psychophysical studies discussed in the second half of this article, vision is sometimes
the dominant sense of arm position. Does area 5 use visual input to help encode the position of the arm? Graziano, Cooke, and Taylor (2000) examined the visual representation of arm position in monkey area 5 by manipulating two variables: (1) the position of the monkey's arm while it was out of view, under an opaque plate; (2) the position of a visible false arm, placed on top of the plate (see Fig. 6.2). The false arm was a monkey arm prepared by a taxidermist and arranged in a realistic fashion, positioned to appear as if it were extending from the shoulder of the experimental monkey. The monkey fixated on a central spot during these tests. About 25% of the neurons tested in area 5 were significantly affected by the visual position of the false arm. The proportion was significantly higher
Fig. 6.2 Diagram of apparatus for testing whether neurons are sensitive to the felt or seen position of the arm. The monkey's real arm was held in an adjustable arm holder covered from view while a realistic fake arm was in view. The real arm and the visible fake arm were placed on the left or right, resulting in four experimental conditions. The monkey was trained to fixate on a central light-emitting diode.
in MIP (35%) than in 5A (18%), suggesting that there might be a hierarchy of areas in which the visual sense of arm position is more fully developed in MIP (see Fig. 6.1 for the locations of MIP and 5A). Data from a typical example neuron are shown in Fig. 6.3. The tonic firing rate of the neuron was significantly higher when the real arm was on the left. The firing rate was also significantly higher when the fake arm was on the left. That is, this neuron integrated the felt position of the real arm and the seen position of the false arm. This result suggests that area 5 neurons encode the position of the arm in a supramodal fashion, using both somesthesis and vision. Similar tests using objects other than a fake arm, such as a white rectangle of the same approximate size as the arm, or a piece of fruit to which the monkey appeared to attend, did not affect the activity of the neurons in the same fashion. In the same study, Graziano et al. (2000) found that neurons in SI were not sensitive to the seen position of the false arm. That is, in the ascending somatosensory pathway from the periphery to SI to area 5 and beyond, the first stage at which visual information about arm position is integrated with somatosensory information appears to be area 5. This finding is consistent with the view that area 5 is a central node in representing the configuration of the body. It receives all necessary signals, including proprioception, motor feedback, and vision, and combines these signals to encode the relative positions of body parts. Area 5 projects to many cortical regions, including premotor and motor cortex, where information about body configuration would be useful for planning movements. In the following section we describe the properties of neurons in premotor cortex that integrate the body representation with the representation of space surrounding the body. These neurons encode the locations of objects in space relative to the body, perhaps for the purpose of guiding movements.

Fig. 6.3 Convergence of visual and somatosensory signals about arm position on an area 5 neuron. The neuron fired at a higher tonic rate when the monkey felt its arm to be on the left. It also fired at a higher tonic rate when the monkey saw the fake arm to be on the left. Each point is an average of 10 trials. Error bars are standard error. See Fig. 6.2 for methods.
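The logic of the fake-arm test can be summarized as a 2 × 2 comparison: tonic firing rate as a function of the real arm's (felt) side and the fake arm's (seen) side. A sketch of that tabulation follows; the firing rates are made-up values standing in for the recorded data, chosen only to echo the pattern in Fig. 6.3:

```python
import numpy as np

# Rows: real (felt) arm left/right; columns: fake (seen) arm left/right.
# Values are hypothetical mean tonic rates (spikes/s), one per condition,
# higher whenever either arm is on the left, as for the neuron in Fig. 6.3.
rates = np.array([[22.0, 17.0],   # real arm left
                  [14.0,  9.0]])  # real arm right

felt_effect = rates[0].mean() - rates[1].mean()        # somatosensory effect
seen_effect = rates[:, 0].mean() - rates[:, 1].mean()  # visual (fake-arm) effect
print(f"felt-position effect: {felt_effect:+.1f} spikes/s")
print(f"seen-position effect: {seen_effect:+.1f} spikes/s")
```

A neuron showing reliable nonzero effects on both axes is one that integrates the felt and the seen arm position, which is the signature reported above.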
6.2.6 Premotor cortex: a convergence of body representation, visual space, and movement control
Neurons in the caudal premotor cortex of monkeys, just posterior to the bend in the arcuate sulcus, process and encode the locations of visual, tactile, auditory, and remembered stimuli and may help to guide movements of the head and arms (Gentilucci et al. 1988; Graziano, Hu, and Gross 1997a,b; Graziano, Reiss, and Gross 1999). About 40% of the neurons are bimodal, responding both to tactile and to visual stimuli (Gentilucci et al. 1988; Graziano et al. 1997a; Rizzolatti et al. 1981). The tactile receptive fields are arranged to form a somatotopic map. The visual receptive fields are usually adjacent to the tactile ones and extend outward from the skin about 20 cm (see Fig. 6.4). The area therefore contains a somatotopically organized map of the visual space that immediately surrounds the body. For most bimodal cells, the visual receptive field is anchored to the site of the tactile receptive field on the body. When the monkey's eyes move, the visual response may change magnitude, but the location of the visual receptive field does not change (Fogassi et al. 1992, 1996; Gentilucci et al. 1983; Graziano, Yap, and Gross 1994; Graziano et al. 1997a). If the tactile receptive field is on the head, then rotating the head will cause the visual receptive field to move in tandem with the head (Graziano et al. 1997a,b). If the tactile receptive field is on the arm, moving the arm to different positions will cause the visual receptive field to move in the same direction as the arm (Graziano et al. 1994, 1997a). The arm-related neurons are influenced by the sight of a fake arm as well as by the felt position of the real arm (Graziano 1999). In a recent experiment mapping the precentral gyrus, these multimodal neurons were found in a relatively restricted zone in the caudal part of premotor cortex (Graziano and Gandhi 2000). Other studies on more rostral and ventral regions in premotor cortex have found neuronal properties that may be somewhat different; but because of differences in experimental technique, the studies are difficult to compare (Mushiake, Tanatsugu, and Tanji 1997). The bimodal neurons in caudal premotor cortex bind together body representation with the visual space surrounding the body and the tactile space on the body. They encode the locations of objects in body-part centered coordinates. One possibility is that these neurons form a mechanism for guiding movements of the limbs and head away from nearby objects, for flinching and avoiding.
Fig. 6.4 Receptive fields of two bimodal, visual-tactile neurons in the polysensory zone in premotor cortex. (A) The tactile receptive field (shaded) is on the snout, mostly contralateral to the recording electrode (indicated by the arrowhead) but extending partially onto the ipsilateral side of the face. The visual receptive field (boxed) is contralateral and confined to a region of space within about 10 cm of the tactile receptive field. (B) The tactile receptive field for this neuron is on the hand and forearm contralateral to the recording electrode (indicated by the black dot) and the visual receptive field (outlined) surrounds the tactile receptive field. (Adapted from Graziano and Gross 1998.)
That is, these multimodal receptive fields may form a type of protective shell around the body, alerting the brain to any potentially noxious object impinging on near space. As described in the following sections, similar interactions among vision, touch, body representation, and the control of movement can be seen in behavioral studies on humans.
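The notion of body-part-centered coordinates can be made concrete as a coordinate transformation: a stimulus location measured in a world- or head-based frame is re-expressed relative to the current limb position, so that the same receptive field travels with the limb. The sketch below is our illustrative simplification, not a model drawn from the cited studies; the spherical receptive field and its 20-cm radius merely echo the outward extent mentioned above:

```python
import numpy as np

def to_limb_centered(stim_xyz: np.ndarray, limb_xyz: np.ndarray) -> np.ndarray:
    """Re-express a stimulus location relative to the current limb position.

    A pure translation is the simplest case; a fuller model would also rotate
    into a limb-aligned frame as the arm's orientation changes.
    """
    return stim_xyz - limb_xyz

def in_receptive_field(stim_xyz: np.ndarray, limb_xyz: np.ndarray,
                       radius_cm: float = 20.0) -> bool:
    """Treat the visual RF as a sphere of ~20 cm around the limb (simplified)."""
    return float(np.linalg.norm(to_limb_centered(stim_xyz, limb_xyz))) <= radius_cm

# With gaze held fixed, the same stimulus enters or leaves the RF as the arm moves:
stim = np.array([30.0, 0.0, 10.0])                              # cm, head-based frame
print(in_receptive_field(stim, np.array([25.0, 0.0, 5.0])))     # arm nearby: True
print(in_receptive_field(stim, np.array([-25.0, 0.0, 5.0])))    # arm moved away: False
```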
6.3 The psychology of the body representation
6.3.1 The body schema
In this section, we turn from neurophysiological to psychological work on body representation. One of the most important general findings of psychological experiments is that the body representation involves more than the mere registration of peripheral inputs. Rather, it involves the interpretation of these inputs in the context of a rich internal model of the body's structure. In what follows, we refer to this internal model as the 'body schema'. While this term has been used in many different ways in past work (for discussion, see Gallagher 1986), we use the term broadly to mean an implicit knowledge structure that encodes the body's form, the constraints on how the body's parts can be configured, and the consequences of this configuration on touch, vision, and movement. The body schema plays a central role in interrelating concurrent perceptual inputs, allowing for the reconstruction of missing information, enabling the detection and resolution of conflicts, and ensuring an integrated, globally consistent, multimodal representation of the body's configuration. The body schema may even be used to interpret the seen body configuration of others (Shiffrar and Pinto, this volume, Chapter 19). In the following sections we discuss different types and combinations of information that are coordinated through the body schema.
6.3.2 Converting proprioceptive inputs into a representation of body position
The body schema is not simply a representation of joint angles, but a complex integration of vision, proprioception, touch, and motor feedback. The relative weights applied to these various sources of information probably depend on the quality of information from each source (Stark and Bridgeman 1983). In this section we discuss one source of information about body configuration: proprioception. In the subsequent sections, we will discuss the interactions between proprioception, touch, vision, and motor feedback. Proprioception derives originally from the local forces acting on muscle spindles, joint receptors, and tendon receptors. However, behavioral data, in line with everyday experience, suggest that this raw sensory information is ultimately combined with knowledge of the body's segmental structure in order to produce a representation of the body's current spatial configuration. One indication of this transformation from simple joint information to a more complex body representation is that human subjects are more accurate at judging the spatial orientation of limb segments than they are at estimating the angles of individual joints (Soechting 1982). Another indication is the phantom limb phenomenon. Here, an amputated arm or leg continues to be experienced as present, and as occupying its former location in space. This phenomenon is thought to involve continued input to areas of cortex formerly responsible for representing the position of the missing limb (for a review, see Ramachandran and Hirstein 1998). The fact that this input is translated into a detailed limb representation indicates that peripheral information is interpreted with reference to a centrally maintained model of the body's form.
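The conversion described above, from joint angles to the spatial layout of the limbs, is at its simplest a forward-kinematics computation over the body's segmental structure. A minimal two-joint planar sketch follows; the segment lengths are illustrative values, not measurements from the studies cited:

```python
import math

def planar_arm_endpoint(shoulder_deg: float, elbow_deg: float,
                        upper_len_cm: float = 30.0, fore_len_cm: float = 25.0):
    """Hand location of a two-joint planar arm, given its joint angles.

    shoulder_deg: upper-arm angle from the x-axis; elbow_deg: forearm rotation
    relative to the upper arm. Segment lengths are illustrative.
    """
    s = math.radians(shoulder_deg)
    e = math.radians(shoulder_deg + elbow_deg)  # forearm's orientation in space
    elbow_x = upper_len_cm * math.cos(s)
    elbow_y = upper_len_cm * math.sin(s)
    return (elbow_x + fore_len_cm * math.cos(e),
            elbow_y + fore_len_cm * math.sin(e))

# The forearm's spatial orientation depends on shoulder and elbow angles
# jointly -- the quantity subjects judge well -- whereas the raw elbow angle
# alone does not fix the hand's location in space.
print(planar_arm_endpoint(45.0, -30.0))
```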
In order to transform peripheral signals into a representation of posture, the brain must integrate information from different, potentially distant, parts of the body. For example, the judgement of forearm orientation requires information concerning shoulder and elbow positions, at the least. This integration is reflected in the behavior of neurons in cortical area 5, which as noted above combine information about multiple joints. The same type of integration can be seen in human behavioral studies, such as those performed by Lackner (Lackner 1988; Lackner and Taublieb 1983). Lackner took advantage of an illusion produced by vibration. If the experimenter applies vibration at about 100 Hz to a muscle or its tendon, and prevents movement resulting from the associated reflex contraction, the subject experiences the illusion of movement around the joints crossed by that muscle. For example, vibration of the biceps produces an illusion of elbow extension, and vibration of the triceps one of elbow flexion. Lackner (1988) found that under appropriate circumstances vibration of a single muscle group could produce the perception of rather global shifts in posture. For example, for subjects seated on the floor with the right hand under the right buttock, biceps vibration produced not only the illusion of arm extension, but also of a tilting of the body toward the left, as would occur if the arm really did extend and push against the floor (see also Roll, Roll, and Velay 1991). Similarly, while biceps vibration in the standing position ordinarily produces an illusion of movement only in the vibrated arm, if the subject grasps the left wrist with the right hand, vibration of the right biceps produces the illusion of movement of both arms (Craske, Kenny, and Keith 1984). These effects suggest that proprioceptive information deriving from multiple joints, tendons, and muscles is integrated into an internally consistent model of body position that takes into account the constraints imposed by the body's structure.
6.3.3 Representing the size and shape of body parts
In order to convert joint and muscle information into a representation of body position, the brain's model of the body must include information about the size and shape of the body's parts. The brain must also be able to update this model as body shape changes over the course of development. An experiment by Craske, Kenny, and Keith (1984) demonstrated how the perception of the body's dimensions can be recalibrated to maintain consistency with other sources of information. In this study, subjects seated in the dark extended both arms and used their right index finger to touch a position on their left arm. A mechanical device was used to produce a mismatch between the position of the right hand and the location stimulated on the left arm, such that the touch actually occurred 12.5 cm closer to the shoulder. After a period of exposure to this mismatch, subjects reported feeling that their left arm was longer than their right. That is, the representation of arm length had been updated to resolve a conflict between proprioception and touch. Another demonstration of the same phenomenon was provided by Lackner (1988). In this experiment, each subject was asked to grasp his or her own nose, with eyes closed. Vibration was then applied to the biceps of the grasping arm. The resulting illusion of arm extension was accompanied by a feeling that the nose had become elongated. 'Oh my gosh!' one subject exclaimed, 'My nose is a foot long! I feel like Pinocchio.' Lackner's illusion, like the other phenomena we have described above, highlights two important aspects of the body schema. First, the body schema contains geometrical information about the hinged and segmented structure of the body, such as that extension of the arm means an increase in the distance of the hand from the face, and that the tip of the nose is connected with the rest of the face. Second, the brain will update and even distort this model of the body in order to resolve conflicts of information.
6.3.4 Body position and the interpretation of touch
One important role for body representation is to support the perception of objects in the environment surrounding the body. For example, in order to perceive the shape, location, and orientation of an object being felt with the hand, it is necessary to have an accurate representation of the hand's posture and location. This ability to integrate the sense of touch with the body representation might be related to the neurons in area 5 that have combined tactile and proprioceptive receptive fields, as discussed above. Psychological experiments in humans have also explored the relationship between touch and body representation. For example, Driver and Grossenbacher (1996) had subjects perform a tactile discrimination task with one hand while attempting to ignore concurrent stimulation to the other hand. They found that subjects performed better when the two hands were held farther apart, indicating that attentional mechanisms were working within a spatial representation of touch that incorporated hand position. Closely related findings come from work on patients with hemispatial neglect after parietal lobe injury. Driver, Mattingley, Rorden, and Davis (1997) studied tactile extinction, the failure to detect the more contralesional of two simultaneous touches on the body. They found that if both hands were touched, the patients showed greater tactile extinction when the two hands were held close together than when they were held far apart. Aglioti, Smania, and Peru (1999) found that for some neglect patients, if the two hands were crossed, such that the left hand was to the right of the trunk and the right hand to the left of the trunk, the tactile neglect switched hands, to remain on the contralesional side of the trunk. In a related study, Moscovitch and Behrmann (1994) asked patients to hold out one hand with the palm up. When both sides of the wrist were touched, the patients neglected the touch on the contralesional side. The patients were then asked to turn the hand so that the palm faced down. Under this condition, the patients neglected a touch on the opposite side of the wrist, that is, still toward the contralateral side of the body. The integration of touch with body position information is especially important for stereognosis, the use of touch to judge the size and shape of objects. Illusions in stereognosis can occur when body position is misperceived. One such illusion, described originally more than two thousand years ago by Aristotle (described in Benedetti 1985), is caused by holding a small ball between the crossed third and fourth fingers. This situation produces the perception that the ball has doubled, and that two separate objects are contacting the fingertips. Benedetti (1985, 1988), in somewhat more recent studies of this phenomenon, showed that the doubling occurs because the tactile input is interpreted as if the fingers were uncrossed. This 'tactile diplopia' that occurs when objects are explored with the hand in a highly unfamiliar posture disappears if the subject is given extended experience with that hand position (Benedetti 1991). This result may be related to the reorganization in somatosensory cortex that occurs after practice with a particular task (Jenkins et al. 1990) and after changes in hand structure, for example after surgically induced syndactyly (Allard, Clark, Jenkins, and Merzenich 1991).
6.3.5 Coordinating seen and felt body position

Just as the body schema links touch and proprioceptive information, it also links proprioception with vision and oculomotor function. One demonstration of this connection is that subjects can fixate the position of their fingertip in the dark, and moreover can track its motion with smooth-pursuit eye movements (Jordan 1970).
Furthermore, illusions of arm movement can produce illusions of visual motion; if a diode is affixed to the fingertip, and an illusion of arm flexion is induced by muscle vibration, the light appears to move in the direction of the perceived arm movement (DiZio, Lathan, and Lackner 1993). Conversely, visual inputs can influence proprioception, as demonstrated by the phenomenon of visual capture. Here, viewing one's hand through a prism results in a distortion of proprioceptive perception such that the hand is felt to lie in the location where it is seen (Hay, Pick, and Ikeda 1965; Welch 1986). Another version of visual capture occurs in patients with phantom limbs. If these patients view the intact arm in a mirror, such that its reflected image appears in the location formerly occupied by the missing limb, then movement of the intact arm can induce the perception of identical movements in the phantom (Ramachandran and Hirstein 1998). After prolonged exposure to a visual–proprioceptive mismatch, the mechanisms serving to coordinate the two modalities can themselves be altered. This adaptation is shown by many experiments with prism-induced visual displacements (Redding and Wallace 1997; Welch 1986). When pointing to targets viewed through a laterally displacing prism, subjects initially misreach in the direction of the visual displacement. However, after continued exposure to the prism, reaching becomes more accurate. After such adaptation, if the prism is removed and the subject is asked to reach for targets viewed normally, misreaching tends to occur in the direction opposite to the previous visual displacement. If a prism-adapted subject is asked to close his or her eyes and position the adapted hand so that it feels straight ahead of the nose, the subject will misplace the hand off the body midline, suggesting that adaptation to the initial visual–proprioceptive mismatch has led to a recalibration of the felt position of the arm (Harris 1965).
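The logic of this recalibration can be made concrete with a toy simulation. The sketch below (Python) treats prism adaptation as a simple error-correction rule that pulls the felt position of the hand toward its seen, prism-displaced position on every reach; the displacement size, learning rate, and number of trials are illustrative assumptions, not values taken from the studies cited above.

```python
# Toy model of prism adaptation as gradual recalibration of felt hand position.
# All quantities (10 deg displacement, learning rate 0.2, 20 reaches) are
# hypothetical, chosen only to illustrate the logic described in the text.

PRISM_SHIFT = 10.0   # rightward visual displacement of the hand (deg)
LEARNING_RATE = 0.2  # fraction of the seen-felt mismatch corrected per reach

def simulate(n_exposure_trials: int = 20) -> None:
    offset = 0.0  # current recalibration of felt hand position (deg)
    # Exposure phase: vision of the hand is displaced by the prism, and the
    # felt position is pulled toward the seen position on every reach.
    for _ in range(n_exposure_trials):
        remaining_mismatch = PRISM_SHIFT - offset
        offset += LEARNING_RATE * remaining_mismatch
    print(f"recalibration after exposure: {offset:.1f} deg")
    # Post-test: prism removed, targets viewed normally. The persisting
    # recalibration now produces misreaching opposite to the original
    # displacement, i.e. the negative aftereffect described above.
    print(f"pointing aftereffect: {-offset:.1f} deg")

simulate()
```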
6.3.6 Interrelating multiple perceptual modalities

A number of studies indicate that the body schema can coordinate even rather complex relationships among sensory modalities, including three-way interactions among touch, vision, and proprioception. For example, directing the eyes toward a particular part of the body, even in the dark, enhances the tactile sensitivity of that part (Tipper et al. 1998). Another demonstration of such three-way coordination was provided by Driver and Spence (1998). They found that a brief touch on the hand enhanced the subsequent processing of visual stimuli near the hand. The touch seemed to draw visual attention to the region of space near the hand. This enhanced visual region was anchored to the hand and moved to different locations in space when the hand was moved, even when the hands were crossed. Likewise, a flash of light presented near the hand enhanced the subsequent processing of tactile stimuli applied to the hand. In both versions of the experiment, crossmodal attention between vision and touch was operating on representations that had already taken body configuration into account. Crossmodal attention has also been studied in patients with brain injury. Di Pellegrino, Ladavas, and Farne (1997) reported on a patient with right fronto-temporal damage and symptoms of tactile extinction. This subject was asked to detect a tactile stimulus applied to the contralesional hand. When a visual stimulus was simultaneously presented near the ipsilesional hand, the subject no longer reported the tactile stimulus. That is, the tactile stimulus had been extinguished by the competing visual stimulus. The critical region of visual space, in which the competing stimulus was most effective, surrounded the ipsilesional hand and moved if the hand was moved.
That is, touch, vision, and proprioception were bound together in a framework provided by the body schema. A similar close association between vision, touch, and proprioception occurs in the bimodal visuotactile neurons in premotor cortex, discussed above, an area that may have been damaged in the patient considered in the di Pellegrino study. An experiment by Botvinick and Cohen (1998) showed that perceptual information can sometimes be distorted in order to maintain consistency in the three-way relationship among vision, touch, and proprioception. This study reported a novel illusion, produced by introducing a spatial mismatch between seen and felt touch. The effect was elicited in the following manner (see Fig. 6.5): a rubber replica of a human hand was placed on a table in front of the subject. The subject's own hand was positioned out of view, behind a screen. As the subject watched the rubber hand, the experimenter stroked it with a brush and, at the same time, brushed the subject's own hand in an identical manner. Subjects reported a spatial fusion between the seen and felt touch, as if they were feeling the touch of the brush in the location where they saw the rubber hand touched. They often described this illusion by saying that it felt as if the rubber hand had become 'their' hand (see Fig. 6.6). The rubber hand illusion provides another example of the body schema mediating the resolution of a conflict; the perception of felt touch was brought into spatial alignment with the visually perceived touch, much as a ventriloquist's voice is aligned with his dummy's moving mouth. Botvinick and Cohen (1998) reasoned that this realignment of visual and tactile representations should involve a distortion of proprioceptive information, causing the subject's arm to be represented in a position that would place the hand in the position of the rubber hand. They predicted that if proprioceptive information was indeed being distorted in this way, then prolonged exposure to the illusion should give rise to effects on reaching similar to those observed in prism adaptation experiments. Indeed, the subjects did show a reaching bias consistent with a recalibration of proprioception. Furthermore, the magnitude of the reaching bias correlated positively with the reported duration of the illusion.
Fig. 6.5 Arrangement used in eliciting the rubber hand illusion. The subject’s hand, out of view, was stroked with a paint brush while a rubber hand, in view, was synchronously stroked.
Fig. 6.6 Questionnaire results from Botvinick and Cohen (1998). Subjects were asked to rate the accuracy of each statement on a seven-point scale ranging from ‘disagree strongly’ (---), through ‘neither agree nor disagree’ (0), to ‘agree strongly’ (+ + +). Each point is the mean rating of 10 subjects. Error bars show response range.
6.3.7 The body in action: representing the relation of target and effector

As reviewed above, neurophysiological findings in the monkey indicate that the brain does not draw a sharp boundary between its representation of the body and its representation of movement. Behavioral studies in humans point to the same conclusion. Movements appear to be planned in spatial coordinate frames that are referenced to the different parts of the body. In particular, the act of reaching toward a target is closely related to the sense of arm position. When we reach toward a target, we normally have proprioceptive feedback from the arm and a continuous view of both the hand and the target. A number of experiments have investigated the effect of removing one or another of these sources of information. For example, vision of the hand throughout the reach improves accuracy (Desmurget et al. 1995; Prablanc et al. 1979). The importance of the view of the hand is especially clear in patients who have lost proprioceptive sense in their arms due to nerve degeneration. These patients have no sense of arm position other than vision. If these patients reach toward a target without sight of their hands, they make large errors in both direction and extent (Ghez et al. 1995). Vision of the hand can influence not only the accuracy of pointing, but also the path used to reach the target. Wolpert, Ghahramani, and Jordan (1995) showed that if the hand trajectory as seen by the subject is distorted by increasing its curvature, subjects adapt by reaching along paths curved in the opposite direction, apparently seeking to produce reaches that follow a straight line in visual space. Vision of the hand can affect reaching performance even if the hand is viewed only prior to reach initiation. A glimpse of the hand in its resting position prior to movement has been shown in a number of studies to improve reach accuracy (Desmurget et al. 1997; Prablanc et al. 1979).
Furthermore, if the subject wears a displacing prism and is given a brief initial view of the hand, the subsequent reach is misdirected, indicating again that the position of the hand, in this case mislocalized by the subject, is incorporated into the motor program (Rossetti, Desmurget, and Prablanc 1995). Collectively, these studies show that the location of the hand is continuously monitored and used during reaching. One hypothesis is that the brain computes the current location of the target relative to the hand, and then uses this hand-centered spatial information to guide the movement of the hand (Graziano and Gross 1998). Several studies have found that the errors in reaching to a visual or remembered target tend to be along the line between the starting position of the hand and the target (Chieffi et al. 1999; Gordon et al. 1994; Soechting and Flanders 1989). This result suggests that the visuomotor system does indeed compute the distance and direction of the target from the hand, with greater error in the computation of distance. McIntyre et al. (1998) found that the pattern of errors during reaching supported both an eye-centered and a hand-centered reference frame. They point out that their findings are consistent with a final transformation into a hand-centered frame. Tipper, Lortie, and Baylis (1992) found evidence that attention to visual stimuli during a reaching task may be linked to the position of the hand. In their study, subjects reached for a target while avoiding a distracting stimulus. The reaction times were longer when the distracter lay roughly between the hand and the target. The critical region of visual space, in which the distracter had maximum effect, was anchored to the hand and moved if the hand was placed in different starting locations. In several conceptually related experiments (Anzola, Bertoloni, Buchtel, and Rizzolatti 1977; Wallace 1971), subjects pressed a button in response to a flashed light. If the flash of light was in the space near the hand, the subjects responded more quickly. For example, when subjects were asked to respond to a light on the right side of visual space by pressing a button with the left hand, and to a light on the left side using the right hand, they were faster when the hands were crossed than when they were uncrossed. In summary, there is accumulating behavioral evidence that during movement of the arm and hand, stimuli are encoded in a spatial reference frame that is anchored to the hand, at least at some point during movement planning. This result matches the findings in monkey premotor cortex. As described above, some of the multimodal neurons in premotor cortex that are related to arm movement also have visual receptive fields that are anchored to the arm and hand, apparently encoding the locations of nearby objects in arm- and hand-centered coordinates. The behavioral results suggest that humans may also have hand-centered visual receptive fields. This type of spatial coding, in body-part-centered coordinates, would bind together the representation of the body and of the visual space around the body with the control of movement.
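The hand-centered hypothesis can be illustrated by decomposing each reach endpoint error into a component along the hand–target line (extent) and a component perpendicular to it (direction): error distributions elongated along the line implicate a separate, noisier computation of distance. The following sketch shows that decomposition; the function name and the sample trial are hypothetical, not taken from the studies above.

```python
import numpy as np

def decompose_reach_error(hand, target, endpoint):
    """Split a 2D reach endpoint error into an extent component (signed error
    along the hand-target line) and a direction component (signed error
    perpendicular to that line). A hand-centered account with noisier
    distance coding predicts larger extent than direction errors."""
    hand, target, endpoint = (np.asarray(p, dtype=float)
                              for p in (hand, target, endpoint))
    unit = (target - hand) / np.linalg.norm(target - hand)  # movement direction
    error = endpoint - target                               # endpoint error vector
    extent_err = float(error @ unit)                        # along the line
    direction_err = float(unit[0] * error[1] - unit[1] * error[0])  # perpendicular
    return extent_err, direction_err

# Hypothetical trial: hand at the origin, target 20 cm away at 30 degrees,
# endpoint undershooting the target by 10% along the movement line.
target = (20 * np.cos(np.radians(30)), 20 * np.sin(np.radians(30)))
endpoint = tuple(0.9 * np.asarray(target))
print(decompose_reach_error((0, 0), target, endpoint))
# -> (-2.0, 0.0): a pure extent (distance) error with no direction error
```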
6.3.8 Development of the body schema: the roles of nature and nurture

As the body changes size and shape over the life-span, the internal model of the body must change accordingly. A wealth of experimental and clinical data, some of which we have already reviewed, shows just how plastic the body schema can be, even in adulthood. A classic example of this adaptation is the phenomenon of prism adaptation, discussed above. The body schema can also be modified by injuries that deform the limbs. In cases where the deformed limb is amputated, the patient often reports experiencing a phantom limb with the same deformity (Ramachandran and Hirstein 1998). While experience seems certain to play an important role in the construction and continual modification of the body schema, at least some elements of the body schema may not depend on experience, or at least may be present very early in life (Bahrick and Watson 1985).
Rochat and colleagues (Morgan and Rochat 1997; Rochat 1998) investigated the development of the body schema in infancy, and found that infants as young as three months were able to match proprioceptive events to visual ones. Other researchers found evidence of visual–proprioceptive matching even shortly after birth (Meltzoff and Moore 1983). One of the most compelling arguments for an innate body schema is the phenomenon of the aplastic phantom. Here, a person congenitally lacking a limb experiences a full-fledged phantom in place of the missing limb (Saadah and Melzack 1994). While findings such as these are intriguing, the relative roles of nature and nurture in establishing and calibrating the body schema remain poorly understood, and present an important area for future research. In investigating how the body schema emerges during infancy, some developmental psychologists have also asked how the body comes to be distinguished from other objects as belonging to the self. Rochat (1998) suggested that the detection of correlations among visual, proprioceptive, tactile, and motor signals provides the basis for the identification of the body as self. It is interesting that focal brain lesions, especially to the right parietal lobe, can cause the denial of ownership of intact body parts (Kolb and Whishaw 1990); perhaps the loss of the sense of self in these patients is due to a disruption of Rochat's crossmodal mappings.
6.4 Conclusions

We have described neurophysiological work in monkeys and psychological work in normal and brain-damaged humans on the internal representation of the body. Between these different approaches, an enormous amount is now known about the body representation. The critical brain areas have been identified; they include area 5 in the superior parietal lobe, and possibly other areas such as premotor and motor cortex. A main finding of both the neurophysiological and the psychological approach is that the body representation is not merely a simple code for joint angles. Instead, proprioceptive information is combined with visual, tactile, and motor-feedback signals. All of this information is interpreted in the context of an internal model of the geometry of the body. The body schema appears to be a device for cross-referencing between sensory modalities, and for guiding movement of the limbs through space. Just as the body schema itself lies at the crossroads of multiple sensory modalities and in the communication among multiple cortical regions, its study traverses multiple disciplines. Progress in understanding the body schema will come from the continued, rich interrelations between psychophysics, neuropsychology, and neuroscience.
Acknowledgement

We thank J. Cohen for his help.
References

Aglioti, S., Smania, N., and Peru, A. (1999). Frames of reference for mapping tactile stimuli in brain-damaged patients. Journal of Cognitive Neuroscience, 11, 67–79.
Allard, T., Clark, S.A., Jenkins, W.M., and Merzenich, M.M. (1991). Reorganization of somatosensory area 3b representations in adult owl monkeys after digital syndactyly. Journal of Neurophysiology, 66, 1048–1058. Andersen, R.A. (1987). Inferior parietal lobule function in spatial perception and visuomotor integration. In F. Plum and V.B. Mountcastle (Eds.), Handbook of physiology, Vol. 5, pp. 483–518. Bethesda, MD: American Physiological Society. Andersen, R.A., Bracewell, R.M., Barash, S., Gnadt, J.W., and Fogassi, L. (1990). Eye-position effects on visual, memory, and saccade-related activity in areas LIP and 7a of macaque. Journal of Neuroscience, 10, 1176–1196. Andersen, R.A., Snyder, L.H., Bradley, D.C., and Xing, J. (1997). Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience, 20, 303–330. Anzola, G.P., Bertoloni, G., Buchtel, H.A., and Rizzolatti, G. (1977). Spatial compatibility and anatomical factors in simple and choice-reaction time. Neuropsychologia, 15, 295–301. Bahrick, L.E. and Watson, J.S. (1985). Detection of intermodal proprioceptive–visual contingency as a potential basis of self-perception in infancy. Developmental Psychology, 21, 963–973. Balint, R. (1909). Seelenlähmung des 'Schauens', optische Ataxie und räumliche Störung der Aufmerksamkeit. Monatsschrift für Psychiatrie und Neurologie, 25, 51–81. Batista, A.P., Buneo, C.A., Snyder, L.H., and Andersen, R.A. (1999). Reach plans in eye-centered coordinates. Science, 285, 257–260. Benedetti, F. (1985). Processing of tactile spatial information with crossed fingers. Journal of Experimental Psychology: Human Perception and Performance, 11, 517–525. Benedetti, F. (1988). Exploration of a rod with crossed fingers. Perception and Psychophysics, 44, 281–284. Benedetti, F. (1991). Reorganization of tactile perception following the simulated amputation of one finger. Perception, 20, 687–692. Bioulac, B., Burbaud, P., and Varoqueaux, D. (1995). Activity of area 5 neurons in monkeys during arm movements: Effects of dentate nucleus lesion and motor cortex ablation. Neuroscience Letters, 192, 189–192. Botvinick, M. and Cohen, J.D. (1998). Rubber hands 'feel' touch that eyes see. Nature, 391, 756. Burgess, P.R., Wei, J.Y., Clark, F.J., and Simon, J. (1982). Signalling of kinesthetic information by peripheral sensory receptors. Annual Review of Neuroscience, 5, 171–187. Chieffi, S., Allport, D.A., and Woodin, M. (1999). Hand-centred coding of target location in visuo-spatial working memory. Neuropsychologia, 37, 495–502. Colby, C.L. and Duhamel, J.R. (1991). Heterogeneity of extrastriate visual areas and multiple parietal areas in the macaque monkey. Neuropsychologia, 29, 517–538. Craske, B., Kenny, F.T., and Keith, D. (1984). Modifying an underlying component of perceived arm length: Adaptation of tactile location induced by spatial discordance. Journal of Experimental Psychology: Human Perception and Performance, 10, 307–317. Critchley, M. (1953). The parietal lobes. New York: Hafner. De Renzi, E. (1982). Disorders of space exploration and cognition. New York: Wiley. Desmurget, M., Rossetti, Y., Jordan, M., Meckler, C., and Prablanc, C. (1997). Viewing hand prior to movement improves accuracy of pointing performed toward unseen contralateral hand. Experimental Brain Research, 115, 180–186. Desmurget, M., Rossetti, Y., Prablanc, C., Stelmach, G.E., and Jeannerod, M. (1995). Representation of hand position prior to movement and motor variability. Canadian Journal of Physiology and Pharmacology, 73, 262–272.
di Pellegrino, G., Ladavas, E., and Farne, A. (1997). Seeing where your hands are. Nature, 388, 730. DiZio, P., Lathan, C.E., and Lackner, J.R. (1993). The role of brachial muscle spindle signals in assignment of visual direction. Journal of Neurophysiology, 70, 1578–1584. Driver, J. and Grossenbacher, P.G. (1996). Multimodal spatial constraints on tactile selective attention. In T. Inui and J.L. McClelland (Eds.), Attention and performance XVI: Information integration in perception and action, pp. 209–235. Cambridge, MA: MIT Press. Driver, J. and Spence, C. (1998). Crossmodal attention. Current Opinion in Neurobiology, 8, 245–253. Driver, J., Mattingley, J.B., Rorden, C., and Davis, G. (1997). Extinction as a paradigm measure of attentional bias and restricted capacity following brain injury. In P. Thier and H.-O. Karnath (Eds.), Parietal lobe contributions to orientation in 3D space, pp. 401–429. Berlin: Springer-Verlag.
Duffy, F.H. and Burchfiel, J.L. (1971). Somatosensory system: Organizational hierarchy from single units in monkey area 5. Science, 172, 273–275. Ettlinger, G. and Wegener, J. (1958). Somaesthetic alternation, discrimination and orientation after frontal and parietal lesions in monkeys. Quarterly Journal of Experimental Psychology, 10, 177–186. Ferraina, S., Johnson, P.B., Garasto, M.R., Battaglia-Mayer, A., Ercolani, L., Bianchi, L., Ferraresi, P., Lacquaniti, F., and Caminiti, R. (1997). Combination of hand and gaze signals during reaching: Activity in parietal area 7m of the monkey. Journal of Neurophysiology, 77, 1034–1038. Ferrier, D. and Yeo, G.F. (1884). A record of the experiments on the effects of lesions of different regions of the cerebral hemispheres. Philosophical Transactions of the Royal Society, London, 175, 479–564. Fogassi, L., Gallese, V., di Pellegrino, G., Fadiga, L., Gentilucci, M., Luppino, G., Pedotti, A., and Rizzolatti, G. (1992). Space coding by premotor cortex. Experimental Brain Research, 89, 686–690. Fogassi, L., Gallese, V., Fadiga, L., Luppino, G., Matelli, M., and Rizzolatti, G. (1996). Coding of peripersonal space in inferior premotor cortex (area F4). Journal of Neurophysiology, 76, 141–157. Gallagher, S. (1986). Body image and body schema: A conceptual clarification. Journal of Mind and Behavior, 7, 541–554. Gallese et al., this volume, Chapter 17. Galletti, C., Battaglini, P.P., and Fattori, P. (1995). Eye position influence on the parieto-occipital area PO (V6) of the macaque monkey. European Journal of Neuroscience, 7, 2486–2501. Gentilucci, M., Scandolara, C., Pigarev, I.N., and Rizzolatti, G. (1983). Visual responses in the postarcuate cortex (area 6) of the monkey that are independent of eye position. Experimental Brain Research, 50, 464–468. Gentilucci, M., Fogassi, L., Luppino, G., Matelli, M., Camarda, R., and Rizzolatti, G. (1988). Functional organization of inferior area 6 in the macaque monkey. I. Somatotopy and the control of proximal movements. Experimental Brain Research, 71, 475–490. Georgopoulos, A.P. and Massey, J.T. (1985). Static versus dynamic effects in motor cortex and area 5: Comparison during movement time. Behavioral Brain Research, 18, 159–166. Ghez, C., Gordon, J., Ghilardi, M.F., and Sainburg, R. (1995). Contributions of vision and proprioception to accuracy in limb movements. In M.S. Gazzaniga (Ed.), The cognitive neurosciences, pp. 549–564. Cambridge, MA: MIT Press. Goodale, M.A., Meenan, J.P., Bülthoff, H., Nicolle, D.A., Murphy, K.J., and Racicot, C.I. (1994). Separate neural pathways for the visual analysis of object shape in perception and prehension. Current Biology, 4, 604–610. Gordon, J., Ghilardi, M.F., and Ghez, C. (1994). Accuracy of planar reaching movements. I. Independence of direction and extent variability. Experimental Brain Research, 99, 97–111. Graziano, M.S.A. (1999). Where is my arm? The relative role of vision and proprioception in the neuronal representation of limb position. Proceedings of the National Academy of Sciences USA, 96, 10418–10421. Graziano, M.S.A., Cooke, D.F., and Taylor, C.S.R. (2000). Coding the location of the arm by sight. Science, 290, 1782–1786. Graziano, M.S.A. and Gandhi, S. (2000). Location of the polysensory zone in the precentral gyrus of monkeys. Experimental Brain Research, 135, 259–266. Graziano, M.S.A. and Gross, C.G. (1998). Spatial maps for the control of movement. Current Opinion in Neurobiology, 8, 195–201. Graziano, M.S.A., Yap, G.S., and Gross, C.G. (1994). Coding of visual space by premotor neurons. Science, 266, 1054–1057.
Graziano, M.S.A., Hu, X.T., and Gross, C.G. (1997a). Visuo-spatial properties of ventral premotor cortex. Journal of Neurophysiology, 77, 2268–2292. Graziano, M.S.A., Hu, X.T., and Gross, C.G. (1997b). Coding the locations of objects in the dark. Science, 277, 239–241. Graziano, M.S.A., Reiss, L.A.J., and Gross, C.G. (1999). A neuronal representation of the location of nearby sounds. Nature, 397, 428–430. Gross, C.G. (1998). Brain, vision, memory: Tales in the history of neuroscience. Cambridge, MA: MIT Press. Harris, C.S. (1965). Perceptual adaptation to inverted, reversed, and displaced vision. Psychological Review, 72, 419–444. Hay, J.C., Pick, H.L., and Ikeda, K. (1965). Visual capture produced by prism spectacles. Psychonomic Science, 2, 215–216.
Holmes, G. (1918). Disturbances of visual orientation. British Journal of Ophthalmology, 2, 449–516. Iggo, A. and Andres, K.H. (1982). Morphology of cutaneous receptors. Annual Review of Neuroscience, 5, 1–31. Jenkins, W.M., Merzenich, M.M., Ochs, M.T., Allard, T., and Guic-Robles, E. (1990). Functional reorganization of primary somatosensory cortex in adult owl monkeys after behaviorally controlled tactile stimulation. Journal of Neurophysiology, 63, 82–104. Johnson, P.B., Ferraina, S., Bianchi, L., and Caminiti, R. (1996). Cortical networks for visual reaching: Physiological and anatomical organization of the frontal and parietal lobe arm regions. Cerebral Cortex, 6, 102–119. Jones, E.G. and Powell, T.P.S. (1970). An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain, 93, 793–820. Jones, E.G., Coulter, J.D., and Hendry, S.H.C. (1978). Intracortical connectivity of architectonic fields in the somatic sensory, motor, and parietal cortex of monkeys. Journal of Comparative Neurology, 181, 291–348. Jordan, S. (1970). Ocular pursuit movement as a function of visual and proprioceptive stimulation. Vision Research, 10, 775–780. Kaas, J.H., Nelson, R., Sur, M., Lin, C.-S., and Merzenich, M.M. (1979). Multiple representations of the body within the primary somatosensory cortex of primates. Science, 204, 521–523. Kalaska, J.F. and Hyde, M.L. (1985). Area 4 and 5: Differences between the load-dependent discharge variability of cells during active postural fixation. Experimental Brain Research, 59, 197–202. Kalaska, J.F., Caminiti, R., and Georgopoulos, A.P. (1983). Cortical mechanisms related to the direction of two-dimensional arm movements: Relations in parietal area 5 and comparison with motor cortex. Experimental Brain Research, 51, 247–260. Kalaska, J.F., Cohen, D.A.D., Prud'homme, M., and Hyde, M.L. (1990). Parietal area 5 neuronal activity encodes movement kinematics, not movement dynamics. Experimental Brain Research, 80, 351–364. Kolb, B. and Whishaw, I.Q. (1990). Fundamentals of human neuropsychology (3rd edn). New York: Freeman. Lackner, J.R. (1988). Some proprioceptive influences on the perceptual representation of body shape and orientation. Brain, 111, 281–297. Lackner, J.R. and Taublieb, A.B. (1983). Reciprocal interactions between the position sense representations of the two forearms. Journal of Neuroscience, 3, 2280–2285. Lacquaniti, F., Guigon, E., Bianchi, L., Ferraina, S., and Caminiti, R. (1995). Representing spatial information for limb movement: Role of area 5 in the monkey. Cerebral Cortex, 5, 391–409. MacKay, W.A. and Crammond, D.J. (1987). Neuronal correlates in posterior parietal lobe of the expectation of events. Behavioral Brain Research, 24, 167–179. McIntyre, J., Stratta, F., and Lacquaniti, F. (1998). Short-term memory for reaching to visual targets: Psychophysical evidence for body-centered reference frames. Journal of Neuroscience, 18, 8423–8435. Meltzoff, A.N. and Moore, M.K. (1983). Newborn infants imitate adult facial gestures. Child Development, 54, 702–709. Moffet, A.M. and Ettlinger, G. (1970). Tactile discrimination performance in monkey: The effect of unilateral posterior parietal ablations. Cortex, 6, 47–67. Morgan, R. and Rochat, P. (1997). Intermodal calibration of the body in early infancy. Ecological Psychology, 9, 1–23. Moscovitch, M. and Behrmann, M. (1994). Coding of spatial information in the somatosensory system: Evidence from patients with neglect following parietal lobe damage. Journal of Cognitive Neuroscience, 6, 151–155.
Mountcastle, V.B. (1984). Central nervous mechanisms in mechanoreceptive sensibility. In I. Darian-Smith (Ed.), Handbook of physiology, Section I, Vol. 3: Sensory processes, pp. 789–878. Bethesda, MD: American Physiological Society. Mountcastle, V.B. (1997). The columnar organization of the neocortex. Brain, 120, 701–722. Mountcastle, V.B., Lynch, J.C., Georgopoulos, A., Sakata, H., and Acuna, C. (1975). Posterior parietal association cortex of the monkey: Command functions for operations within extrapersonal space. Journal of Neurophysiology, 38, 871–908. Murray, E.A. and Mishkin, M. (1984). Relative contributions of SII and area 5 to tactile discrimination in monkeys. Behavioral Brain Research, 11, 67–83. Mushiake, H., Tanatsugu, Y., and Tanji, J. (1997). Neuronal activity in the ventral part of premotor cortex during target-reach movement is modulated by direction of gaze. Journal of Neurophysiology, 78, 567–571.
Nakamura, K., Chung, H.H., Graziano, M.S.A., and Gross, C.G. (1999). A dynamic representation of eye position in the parieto-occipital sulcus. Journal of Neurophysiology, 81, 2374–2385. Newcombe, F. and Ratcliff, G. (1989). Disorders of visuo-spatial analysis. In F. Boller and J. Grafman (Eds.), Handbook of neuropsychology, pp. 333–356. New York: Elsevier. Pandya, D.N. and Kuypers, H.G.J.M. (1969). Cortico-cortical connections in the rhesus monkey. Brain Research, 13, 13–36. Pearson, R.C.A. and Powell, T.P.S. (1985). The projections of the primary somatosensory cortex upon area 5 in the monkey. Brain Research Reviews, 9, 89–107. Prablanc, C., Echallier, J.E., Jeannerod, M., and Komilis, E. (1979). Optimal response of eye and hand motor systems in pointing at a visual target: II. Static and dynamic visual cues in the control of hand movement. Biological Cybernetics, 35, 183–187. Prablanc, C., Pélisson, D., and Goodale, M.A. (1986). Visual control of reaching movements without vision of the limb. I. Role of retinal feedback of target position in guiding the hand. Experimental Brain Research, 62, 293–302. Ramachandran, V.S. and Hirstein, W. (1998). The perception of phantom limbs. Brain, 121, 1603–1630. Redding, G.M. and Wallace, B. (1997). Adaptive spatial alignment. Mahwah, NJ: Erlbaum. Ridley, R.M. and Ettlinger, G. (1975). Tactile and visuo-spatial discrimination performance in the monkey: The effects of total and partial posterior parietal removals. Neuropsychologia, 13, 191–206. Rizzolatti, G., Scandolara, C., Matelli, M., and Gentilucci, M. (1981). Afferent properties of periarcuate neurons in macaque monkeys. II. Visual responses. Behavioral Brain Research, 2, 147–163. Robinson, D.L., Goldberg, M.E., and Stanton, G.B. (1978). Parietal association cortex in the primate: Sensory mechanisms and behavioral modulations. Journal of Neurophysiology, 41, 910–932. Rochat, P. (1998). Self-perception and action in infancy. Experimental Brain Research, 123, 102–109. Roll, J.P., Roll, R., and Velay, J.-L. (1991). Proprioception as a link between body space and extra-personal space. In J. Paillard (Ed.), Brain and space, pp. 112–132. Oxford: Oxford University Press. Rossetti, Y., Desmurget, M., and Prablanc, C. (1995). Vectorial coding of movement: Vision, proprioception or both? Journal of Neurophysiology, 74, 457–463. Rossetti and Pisella, this volume, Chapter 4. Ruch, T.C., Fulton, J.F., and German, W.J. (1938). Sensory discrimination in monkey, chimpanzee and man after lesions of the parietal lobe. Archives of Neurology and Psychiatry, 39, 919–938. Rushworth, M.F.S., Nixon, P.D., and Passingham, R.E. (1997a). Parietal cortex and movement. I. Movement selection and reaching. Experimental Brain Research, 117, 292–310. Rushworth, M.F.S., Nixon, P.D., and Passingham, R.E. (1997b). Parietal cortex and movement. II. Spatial representation. Experimental Brain Research, 117, 311–323. Saadah, E.S.M. and Melzack, R. (1994). Phantom limb experiences in congenital limb-deficient adults. Cortex, 30, 479–485. Sakata, H. and Taira, M. (1994). Parietal control of hand action. Current Opinion in Neurobiology, 4, 847–856. Sakata, H., Takaoka, Y., Kawarasaki, A., and Shibutani, H. (1973). Somatosensory properties of neurons in the superior parietal cortex (area 5) of the rhesus monkey. Brain Research, 64, 85–102. Seal, J., Gross, C., and Bioulac, B. (1982). Activity of neurons in area 5 during a simple arm movement in monkeys before and after deafferentation of the trained limb. Brain Research, 250, 229–243.
Shiffrar and Pinto, this volume, Chapter 19. Snyder, L.H., Batista, A.P., and Andersen, R.A. (1997). Coding of intention in the posterior parietal cortex. Nature, 386, 167–170. Soechting, J.F. (1982). Does position sense at the elbow reflect a sense of elbow joint angle or one of limb orientation? Brain Research, 248, 392–395. Soechting, J.F. and Flanders, M. (1989). Sensorimotor representations for pointing to targets in three-dimensional space. Journal of Neurophysiology, 62, 582–594. Stark, L. and Bridgeman, B. (1983). The role of corollary discharge in space constancy. Perception and Psychophysics, 34, 371–380. Strick, P.L. and Kim, C.C. (1978). Input to primate motor cortex from posterior parietal cortex (area 5). I. Demonstration by retrograde transport. Brain Research, 157, 325–330. Tipper, S.P., Lortie, C., and Baylis, G.C. (1992). Selective reaching: Evidence for action-centered attention. Journal of Experimental Psychology: Human Perception and Performance, 18, 891–905. Tipper, S.P., Lloyd, D., Shorland, B., Dancer, C., Howard, L.A., and McGlone, F. (1998). Vision influences tactile perception without proprioceptive orienting. NeuroReport, 9, 1741–1744.
Ungerleider, L.G. and Mishkin, M. (1982). Two cortical visual systems. In D. Ingle, M.A. Goodale, and R.J. Mansfield (Eds.), Analysis of visual behavior, pp. 549–586. Cambridge, MA: MIT Press. Vogt, B.A. and Pandya, D.N. (1978). Cortico-cortical connections of somatic sensory cortex (areas 3, 1 and 2) in the rhesus monkey. Journal of Comparative Neurology, 177, 179–192. Wallace, R.J. (1971). S-R compatibility and the idea of response code. Journal of Experimental Psychology, 88, 354–360. Welch, R.B. (1986). Adaptation of space perception. In K.R. Boff, L. Kaufman, and J.P. Thomas (Eds.), Handbook of perception and human performance, pp. 24.1–24.44. New York: Wiley. Wolpert, D.M., Ghahramani, Z., and Jordan, M.I. (1995). Are arm trajectories planned in kinematic or dynamic coordinates? An adaptation study. Experimental Brain Research, 103, 460–470.
7 Action planning affects spatial localization

Jerome Scott Jordan, Sonja Stork, Lothar Knuf, Dirk Kerzel, and Jochen Müsseler

Abstract. When observers are asked to indicate the final position of a moving stimulus, their localizations are reliably displaced beyond the final position, in the direction the stimulus was traveling just prior to its offset. Recent experiments indicate that these localization errors depend on whether or not observers track the moving stimulus with eye movements. If they track, there is a localization error; if not, the error reduces to zero. The present series of experiments investigated whether localization error might be due, in part, to the binding of the moving stimulus in an action plan. Experiment 1 utilized circular stimulus trajectories, and the eye tracking/no-tracking discrepancy revealed in previous studies was replicated. Experiment 2 required central fixation by all observers, and either the computer program (i.e. induction) or a button press by the observer (i.e. intention) produced the stimulus offset. The localizations made in the Intention condition were further in the direction of the planned action effect than those made in the Induction condition. Experiment 3 demonstrated these differences to be due to the intention to stop the stimulus, not the button press. And Experiment 4 revealed that action planning has its binding effect on the localization error for a duration that extends beyond the actual moment of action execution. In light of these data, an approach to perception–action coupling is proposed in which spatial perception and spatially directed action are modeled, not as input and output, respectively, but rather as synergistically coupled control systems.
When observers are asked to indicate the final location of an apparently moving or a continuously moving stimulus, the indicated location is reliably displaced beyond the final location, in the direction the target was traveling just prior to its offset (Finke, Freyd, and Shyi 1986; Freyd and Finke 1984; Hubbard 1995). In addition, the magnitude and direction of the displacement vary in a manner that is consistent with the laws of physics (i.e. velocity, friction, gravity; Hubbard 1995). Accounts of these errors are often conceptualized in terms of representational momentum—the notion that the dynamics of the external environment have been internalized into the dynamics of cognitive representational systems. Given that internal representations, just as external events, have dynamic properties that cannot simply be brought to a halt upon stimulus offset, dynamic representational transformations are assumed to continue for some time following stimulus offset. It is the momentum of these representations that is assumed to underlie the resulting localization error. Implicit in this account of localization error is the assumption that the actions produced by observers during stimulus movement do not influence the processes underlying the error. In short, action processes and representational momentum processes are assumed to be independent, and the localization error is described as a post-perceptual cognitive phenomenon. Contrary to this assumed independence, the purpose of the present paper is to present a series of experiments that test whether or not the actions produced in relation to a moving stimulus contribute to the spatial distortion manifested in the localization error. These experiments are motivated by the following: (1) data that indicate the localization error may, in part, be due to the action planning required to maintain an ongoing relationship between action and stimulus motion (i.e. action control), and (2) data that indicate that perception and action planning share common mechanisms (i.e. common neural mediation).
Collectively, these data imply that the very act of planning an action in relation to a stimulus event serves to transform the processes underlying perceptual mappings of that stimulus event. In short, they imply that action planning influences the localization error.
7.1 Action control and localization error

In representational momentum paradigms, observers are free to move their eyes. In fact, in most experiments no instruction is given in this regard, and it is assumed that eye movements used to pursue and track the target do not contribute to the localization error. It has been demonstrated, however, that the eyes continue to drift in the direction of target motion if a pursued target, traveling on a linear trajectory, suddenly vanishes (Mitrani and Dimitrov 1978), and the magnitude of such drift varies directly with tracking velocity (Mitrani, Dimitrov, Yakimoff, and Mateeff 1979). In addition, static stimuli presented in the periphery are localized closer toward the fovea than they actually are (foveal bias; e.g. Müsseler, Van der Heijden, Mahmud, Deubel, and Ertsey 1999; O'Regan 1984; Osaka 1977; Van der Heijden, Müsseler, and Bridgeman 1999). In light of these data, it may be the case that when a moving target suddenly disappears, the eyes overshoot the final position of the stimulus, such that the fovea is shifted in the direction of motion. Subsequently, the foveal bias inherent in static localizations, coupled with the changed position of the fovea due to overshoot, causes the final position of the target to be localized in the direction of the fovea's motion (i.e. in the direction of the target's motion). In short, it may be the case that the localization error is related to eye-movement control. To test this idea, Kerzel, Jordan, and Müsseler (in press) conducted a representational momentum experiment in which they asked observers to localize the final position of a moving stimulus. Unlike other representational momentum experiments, however, they devised a condition in which observers were instructed to fixate a stationary fixation point during the presentation of the moving stimulus. This instruction, of course, prevented observers from making the smooth-pursuit movements observers normally make during such tasks. The results are depicted in Fig. 7.1. In the tracking condition, in which observers were allowed to track the moving stimulus, the traditional representational momentum effect was obtained. Localizations were displaced beyond the vanishing point, in the direction of stimulus motion, and the magnitude of the localization error varied directly with the velocity of the moving stimulus. In the fixation condition, however, there was no displacement in the direction of stimulus motion. There was vertical displacement, probably due to the retinal eccentricity of the vanishing point (i.e. the fixation stimulus was located 2° below the trajectory of the moving stimulus), but there was no horizontal localization error whatsoever. These data strongly imply that the localization errors reported in previous representational momentum experiments may have been due, in part, to the control of the eye movements necessary to track the moving stimulus. To be sure, arguments against an eye-movement account have been posed on many occasions (see Kerzel et al., in press, for a thorough review of these arguments). These arguments tend to treat the moving eye as a moving camera, however, and they do so by downplaying the fact that oculomotor tracking is a controlled action. Given the data of Kerzel et al., this latter point is central to the localization error and cannot be downplayed.
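This account can be summarized in a two-parameter toy model: pursuit overshoot shifts the fovea beyond the vanishing point in proportion to tracking velocity, and foveal bias then pulls the judged position toward the displaced fovea. The sketch below makes explicit why the hypothesis predicts a velocity-dependent forward error under pursuit and none under fixation; both gains are free parameters invented for illustration, not estimates from the studies cited.

```python
# Toy model of the eye-movement account of forward mislocalization.
# Both gains are hypothetical free parameters, chosen only for illustration.

OVERSHOOT_GAIN = 0.05  # deg of pursuit overshoot per deg/s of tracking velocity
FOVEAL_BIAS = 0.5      # fraction by which a judged location is pulled to the fovea

def predicted_error(velocity_deg_s: float, pursuit: bool) -> float:
    """Predicted horizontal localization error (deg) for the vanishing point.

    Under pursuit the fovea overshoots the vanishing point in the direction
    of motion; foveal bias then drags the judged position toward the fovea,
    yielding a forward error that grows with velocity. Under fixation the
    fovea stays put, so no forward error is predicted.
    """
    fovea_shift = OVERSHOOT_GAIN * velocity_deg_s if pursuit else 0.0
    return FOVEAL_BIAS * fovea_shift  # error relative to the true vanishing point

for v in (3.85, 15.4):  # the two target velocities used in Experiment 1 below
    print(f"{v:5.2f} deg/s  pursuit: {predicted_error(v, True):+.2f} deg"
          f"  fixation: {predicted_error(v, False):+.2f} deg")
```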
Oculomotor control requires planning, and this planning must (1) take into account anticipated future locations of the moving stimulus, and (2) be generated continuously in order to effectively control eye–target relationships.
Fig. 7.1 Mislocalization as a function of instruction (pursuit vs. fixation) and velocity. The dark bars represent the Fixation condition, and the light bars, the Pursuit condition. Error bars represent standard errors between participants. Panel A: Positive values indicate errors in the direction of movement, negative values errors opposite to the direction of movement. Panel B: Positive values indicate errors above the final position, negative values errors below the final position.
In light of these demands on eye-movement control, it may be the case that the localization error is due more to momentum derived from action control than to momentum derived from action-independent post-perceptual representations.
7.2 Action planning and perceptual mapping

Another challenge to the idea that localization errors are action-independent derives from data that reveal rather tight functional couplings between the planning aspect of action control and shifts in spatial perception. Classic research in visual attention, for example, indicates that roughly 50–100 ms after the presentation of a saccadic target, the threshold for the detection of events at the target's position is reduced (Bachmann 1999; Klein 1988; Posner 1980; Posner and Cohen 1984; Schneider and Deubel, this volume, Chapter 30; Wolff 1999).
Such pre-saccadic shifts in detectability thresholds constitute shifts in the spatial content of perception that are associated with the planning of an action. Some researchers even argue that these shifts constitute a necessary pre-condition of saccadic control (Rizzolatti, Riggio, Dascola, and Umiltà 1987; Wolff 1999). Further evidence of planning–perception coupling comes from experiments in which observers are asked to make judgments about the perceived location of a stimulus presented during the production of an action. Dassonville (1995), for example, asked observers to move their arm through the dark and localize the point at which their moving finger received a vibrotactile stimulus. Observers tended to localize the stimulus at locations beyond the point of stimulation. In other words, observers perceived the stimulus at locations to which they were planning to move their hand at the moment the stimulus was presented. Collectively, these data indicate a rather tight functional coupling between action planning and perceptual space. To be sure, this idea is not completely new. Both philosophers and psychologists have argued that actions are planned in terms of the distal effects they are to produce (i.e. in terms of distal perceptual space). Harless, for example (see Hommel 1998), referred to intentions, or action plans, as Effektbilder (effect images). James (1890/1950, p. 501) said, '. . . an anticipatory image . . . is the only psychic state which introspection lets us discern as the forerunner of our voluntary acts.' And Hershberger (1976, 1987, 1998), in an attempt to explicate the idea that actions are planned in terms of their distal effects, referred to action plans as 'afference-copies' in order to contrast them with von Holst and Mittelstaedt's (1950) concept, 'efference copy'. This idea has recently received a more formal theoretical and empirical treatment in what is known as the theory of Common Coding (Prinz 1992, 1997). Basically, this theory assumes that (1) actions are planned in terms of their distal consequences, and (2) the planning of an action necessarily recruits, or rather presses into service, neural transformations that also mediate the perception of those distal consequences. Empirical support for this idea derives from both neurophysiological and psychophysical research. Several neurophysiological findings of the last decade, for example, point to populations of neurons that seem to mediate both sensitivity to, and production of, distal events (i.e. they appear to be involved in both perception and action planning, respectively). Examples include the 'visual-and-motor neurons' (e.g. Taira, Mine, Georgopoulos, Murata, and Sakata 1990) found in monkey parietal cortex, and the 'mirror neurons' (e.g. di Pellegrino, Fadiga, Fogassi, Gallese, and Rizzolatti 1992) located in the premotor cortex. Additional neurophysiological support for common coding derives from research on neural mechanisms that accomplish coactivation of distributed brain areas (Roelfsema, Engel, König, and Singer 1997). Psychophysical support derives from studies in which participants are asked to identify the spatial value (i.e. left or right) of an arrow stimulus that is briefly presented while participants plan either a left or right button press (for an overview see Müsseler 1999; Müsseler and Wühr, this volume, Chapter 25).
These studies reveal that observers are better able to identify (i.e. perceive) the direction of the arrow stimulus if it is presented alone, versus in the midst of an action plan (see also comparable findings by De Jong 1993; De Jong and Sweet 1994; Jolicœur 1999). Further, if the arrow is presented in the midst of an action plan (i.e. it is presented while observers are planning a right or left button press), its direction is better identified if it is opposite that of the planned action (e.g. a left-pointing stimulus arrow presented during the planning of a right keypress). Common coding asserts that these effects occur because the spatial content of planning the right or left button press becomes bound in the action plan and is, thus, less available for mediating perception of the arrow's direction (i.e. right or left).
In short, the spatial dimension of planning one event interferes with the perception of another. Though these studies address the spatial relationship between planning one event and perceiving another, they do not address whether or not an action plan involving a particular stimulus influences the perceived spatial location of that stimulus. This is, of course, the issue being addressed by the present paper. If spatial perception and action planning share common mediation, the perceived location of a stimulus should depend on whether or not the stimulus is bound in an action plan. In short, the localization error may be due to the action-planning aspect of action control. The data of Kerzel et al. (in press) seem to address this issue—localization errors were found in the Pursuit condition but not in the Fixation condition. One might assume these differences in localization were due to the binding of the moving stimulus in an action plan in the Pursuit condition (i.e. 'track the moving stimulus'). It is not clear, however, whether the localization error resulted exclusively from the planning of the eye movements. If localization is influenced by action planning per se, then localizations should vary as a function of action plans, regardless of the effector specified in the action plan. We devised a series of experiments to address this issue. Specifically, observers were asked to indicate the perceived final location of a target that moved on a circular trajectory around a central fixation point. Circular trajectories were utilized, as opposed to linear trajectories, in order to control for the retinal eccentricity of the point at which the stimulus disappeared. Experiment 1 constituted a replication of the Kerzel et al. experiment. In Experiment 2, the offset of the moving stimulus was produced by either an observer-initiated button press (i.e. the Intention condition) or the computer program (i.e. the Induction condition). These two conditions were designed to test whether the relationship between action planning and localization error is specific to oculomotor control, or extends to action control in general. Experiment 3 was a replication of Experiment 2, save for a cue condition in which observers were instructed to press a button in response to the onset of the moving stimulus. This experiment was devised to clarify whether any localization differences between the Induction and Intention conditions in Experiment 2 were due to the fact that participants pressed a button in the Intention condition, yet did not do so in the Induction condition. Finally, Experiment 4 was devised to test just how long action planning has its binding effect upon perceptual space. To test this, we ran three versions of the Intention condition, each of which was programmed to produce a different degree of delay between the observer's button press and the actual offset of the moving stimulus.
7.3 Experiment 1: oculomotor action plans

The purpose of Experiment 1 was to determine whether a difference in oculomotor action plans (i.e. fixation versus tracking), relative to the moving target, would produce differences in localization scores regarding the target's final position. This constituted a replication of Kerzel et al. (in press), save for the use of circular versus linear target trajectories.
7.3.1 Method

7.3.1.1 Participants
Seven female and five male students of the University of Munich, who ranged in age from 21 to 32 years (mean age 26 years), were paid to participate in the experiment. They reported normal or corrected-to-normal vision and were naive as to the purpose of the experiment.
7.3.1.2 Apparatus and stimuli
The experiment was controlled by a Macintosh computer. The stimuli were presented on a 17-inch monitor with a refresh rate of 75 Hz and a luminance of approximately 40 cd/m², with black-on-white projection. The rest of the room was dimly lit. The participant's head was placed on a chin and forehead rest 500 mm in front of the monitor. The moving stimulus was a dot, the size and luminance of which were 4.35 mm (0.5°) and 13 cd/m², respectively. On each trial, the dot traced out a trajectory that circled a fixation cross at a radius of 48 mm (5.5°, cf. Fig. 7.2). The stimulus movement was induced by shifting the dot 0.54° or 2.15° clockwise with every vertical retrace of the monitor (13 ms per frame), resulting in two possible tangential velocities: 3.85°/s (33.7 mm/s) and 15.4°/s (134.6 mm/s). These target velocities were well within the velocity range in which observers can accurately track a moving target (Robinson 1968). The movement started at the upper portion of the circle (in the range of 20° before and 20° after the 12 o'clock position). Movement length varied from 90° to 360°, with absolute movement times of 2240 to 8960 ms for the slow velocity and 560 to 2250 ms for the fast velocity. An adjustment cursor, which was identical to the stimulus, appeared 500 ms after stimulus offset at a random position on the circle. It could be moved either clockwise or counterclockwise along the circle's edge by pressing a right or a left button, respectively. Each button press resulted in a 0.13° change in the adjustment cursor's position. In order to accelerate the adjustment process, the adjustment cursor's velocity increased if the button was held down for a longer duration. Thus, a complete circle required approximately 1500 ms. Buttons were mounted on a flat board in front of the participant.

7.3.1.3 Design and procedure
The four combinations of two instructions (pursuit eye movement and fixation) and two velocities were presented blockwise. The order was counterbalanced between participants. In the Pursuit condition, participants were instructed to follow the stimulus with their eyes until it vanished, while in the Fixation condition they were instructed to fixate the fixation cross during the presentation of the moving stimulus. Participants experienced 24 repetitions of each cell of the 2 × 2 within-subject design (i.e. 96 trials overall). The experiment lasted approximately 30 min, including training trials and short breaks.
Fig. 7.2 Stimulus configuration used in the present experiments. The moving stimulus circled the central fixation cross at a radius of 5.5°.
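As a check on this geometry, the sketch below reconstructs the stimulus trajectory from the parameters just stated and confirms that the two per-frame angular steps yield the reported tangential velocities. The function names and the position convention (fixation cross at the origin, 0° at the 12 o'clock position, clockwise motion) are illustrative assumptions rather than the authors' actual implementation.

```python
import math

RADIUS_DEG = 5.5   # radius of the circular trajectory (deg of visual angle)
FRAME_S = 1 / 75   # one vertical retrace at 75 Hz (about 13 ms)

def tangential_velocity(step_deg_per_frame: float) -> float:
    """Tangential speed (deg/s) of a dot stepped around the circle each frame."""
    angular_speed = step_deg_per_frame / FRAME_S      # deg of arc angle per second
    return RADIUS_DEG * math.radians(angular_speed)   # deg along the circle per second

def position(start_angle_deg: float, frame: int, step_deg: float):
    """Screen position (deg) of the dot on a given frame, moving clockwise
    from 0 deg at the 12 o'clock position around the fixation cross."""
    a = math.radians(start_angle_deg + frame * step_deg)
    return RADIUS_DEG * math.sin(a), RADIUS_DEG * math.cos(a)

for step in (0.54, 2.15):  # the two angular steps from the Method section
    print(f"step {step} deg/frame -> {tangential_velocity(step):.2f} deg/s")
# -> about 3.89 and 15.48 deg/s, matching the reported 3.85 and 15.4 deg/s
print(position(0.0, 75, 2.15))  # dot location one second into a fast trial
```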
7.3.2 Results and discussion

The localization score on every trial was computed as the difference between the computer-indicated and participant-indicated final position of the stimulus with respect to movement direction. Positive values indicate localizations beyond the target's final position. Mean localization errors were computed separately for every participant and each condition. A 2 × 2 repeated measures analysis of variance (ANOVA) with the factors instruction (pursuit vs. fixation) and velocity (3.85°/s vs. 15.4°/s) revealed a significant difference due to instruction, F(1, 11) = 69.08, MSE = 8.34, p < 0.001. Figure 7.3 depicts these effects. In the Fixation condition, a post-hoc Scheffé test revealed significant negative localization errors (15.4°/s: –4.95 mm, p < 0.01; 3.85°/s: –3.59 mm, p < 0.05), while in the Pursuit condition, in combination with the fast velocity, there was a significant positive localization error of 4.20 mm, p < 0.01. Moreover, there was a significant interaction between instruction and velocity, F(1, 11) = 9.91, MSE = 6.01, p = 0.009. The amount of error increased in both directions with faster velocity. Basically, Experiment 1 replicated the results of Kerzel et al. (in press). As can be seen in Fig. 7.3, the moving target's vanishing point was localized significantly further in the direction of target motion in the Pursuit than in the Fixation condition. In addition, faster-moving stimuli were localized further in the direction of target motion than slower-moving stimuli in the Pursuit condition, but not in the Fixation condition. However, in contrast to the results of Kerzel et al. (in press), the Fixation condition revealed a reliable negative localization error (i.e. an error in the direction opposite the movement direction of the target). Negative localization errors have been previously reported in tasks requiring localization of either (1) the initial target position (Actis Grosso, Stucchi, and Vicario 1996; Thornton 2001) or (2)—with an accompanying flash—the final position (Müsseler, Stork, and Kerzel 2001).
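For a circular trajectory, one plausible way to compute such a score is to wrap the signed angular difference between the judged and true final positions and convert it to arc length along the circle, with the sign taken as positive in the direction of stimulus motion. The sketch below is one such implementation, not necessarily the computation used here; the function name and sample values are illustrative.

```python
import math

RADIUS_MM = 48.0  # radius of the stimulus circle on the screen (mm)

def localization_error_mm(true_deg: float, judged_deg: float) -> float:
    """Signed localization error in mm of arc along the circle.

    Angles are positions on the circle in degrees, increasing in the
    direction of stimulus motion; a positive result means the judged
    final position lies beyond the true final position in the direction
    the stimulus was moving.
    """
    diff = (judged_deg - true_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return RADIUS_MM * math.radians(diff)                   # arc length in mm

print(localization_error_mm(90.0, 95.0))  # +4.19 mm: beyond the final position
print(localization_error_mm(90.0, 85.0))  # -4.19 mm: short of the final position
```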
Fig. 7.3 Mislocalization as a function of instruction (pursuit vs. fixation) and velocity. The dark bars represent the Fixation condition, and the light bars, the Pursuit condition. Positive values indicate errors in the direction of movement, negative values errors opposite to the direction of movement. Error bars represent standard errors between participants.
A possible explanation of this negative error is that, in the Fixation condition, retinal stimulation during one refresh cycle overlaps with the stimulation from the previous refresh cycle(s). As a consequence, it is possible for stimulation to build up, simply as a function of the stroboscopic nature of stimulus presentation on a computer screen. Summation of stimulus information (cf. also the Bunsen–Roscoe law) caused by stimulation during successive frames may occur at all positions on the stimulus trajectory, save the final position. Given such summation, it may be the case that stimulation is less pronounced, and consequently more often missed, at the final position. Preliminary results from our laboratory support this idea. In the present context, however, we are less concerned with the negative localization error than with the differences in localization error between the Fixation and the Pursuit conditions. In short, the data indicate that the action control required in the Pursuit condition gave rise to localization errors that were further in the direction of stimulus motion than those obtained in the Fixation condition. Experiment 2 was devised to determine whether or not it was the action-planning aspect of action control that gave rise to these differences.
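To see why summation would spare only the final position: at the slow velocity the dot advances roughly a tenth of its own diameter per retrace, so each point on the trajectory is repainted by several subsequent frames; only the final position receives no later input. A rough sketch of this overlap index, using per-frame arc values derived from the geometry of Section 7.3.1.2 (the function and constant names are our own simplification):

```python
DOT_DIAMETER_DEG = 0.5                               # dot size in visual angle
ARC_PER_FRAME_DEG = {"slow": 0.052, "fast": 0.206}   # arc per retrace, visual angle

def overlapping_frames(condition):
    """Rough index of temporal summation: how many successive retraces
    paint part of the dot over one fixed point on the trajectory."""
    return int(DOT_DIAMETER_DEG / ARC_PER_FRAME_DEG[condition])

for c in ("slow", "fast"):
    print(c, overlapping_frames(c))
# slow -> 9, fast -> 2: every trajectory position is restimulated by later
# frames, except the final position, which receives no further input.
```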
7.4 Experiment 2: effector-independent effects of action planning
Given the results of the previous experiment, Experiment 2 was designed to determine whether the pattern of localizations is specific to oculomotor control. To test this, we repeated Experiment 1 and varied the action plans participants were to generate. In the Intention condition, participants fixated the central fixation cross throughout the presentation of the moving stimulus, yet were instructed to stop the stimulus' motion via a button press. In the Induction condition, participants fixated the central fixation cross throughout the presentation of the moving stimulus, and the offset of the stimulus was caused by the computer program, not the participant. This is similar to the Fixation condition of Experiment 1. If the act of binding a stimulus in an action plan contributed to the localization differences revealed in Experiment 1, then localizations made in the Intention condition should differ from those made in the Induction condition. This is because the action plan generated in the Intention condition (i.e. stop the stimulus' motion) necessarily entails the anticipated distal action effect, that is, the location of the target at the moment the button is pressed. This is not the case in the Induction condition. Due to common mediation, this anticipatory aspect of the Intention condition should alter perceptual space in an intention-relative manner, thus producing localization differences between the two conditions. To be sure, the patterns of results in Experiments 1 and 2 are not expected to be exactly the same, because of differences in the action plans required by the two experiments. In the Pursuit condition of Experiment 1, the task was to track the target via eye movements. In the Intention condition of the present experiment, the task was to stop the stimulus' motion. Both tasks require action planning that takes into account anticipated future locations of the moving stimulus, but they differ in terms of the anticipation required. The Pursuit condition required continuous anticipation due to the need for continuous tracking. The Intention condition, on the other hand, only required anticipation and planning regarding the stimulus' final location (i.e. vanishing point). Thus, while the continuous anticipation required by the Pursuit condition gave rise to positive localization error, the rather discrete anticipation required by the Intention condition is not expected to give rise to positive localization error (i.e. the localizations should be more accurate).
7.4.1 Method
7.4.1.1 Participants
Six female and four male students of the University of Munich, who ranged in age from 21 to 34 years (mean age 25.4 years), were paid to participate in the experiment.
7.4.1.2 Apparatus and stimuli
Everything was the same as in Experiment 1, save for the target velocities and the presence of a response button. Given that the present experiment required participants to press a button to stop the stimulus motion in the Intention condition, we utilized the faster of the two velocities from Experiment 1 (i.e. 15.4°/s) as well as a new, faster velocity (30.8°/s, i.e. 269.3 mm/s). The faster velocity increased the salience of the action/effect relationship. In other words, faster targets approximated the more ecologically valid, natural relationship that exists between arm/finger movements and moving visual targets (e.g. swatting at a fly, deflecting a rapidly moving projectile, catching a fly ball). Stimulus movement started at a random position on the circular orbit, and stimulus offset was controlled either by the computer or by a button press of the participant. Movement length varied from 90° to 360°, with absolute movement times of 560 to 2240 ms for the slow velocity and 280 to 1120 ms for the fast velocity.
7.4.1.3 Design
The two task conditions (Intention vs. Induction) were presented blockwise. Half the participants started with the Induction condition. The two velocity conditions varied randomly within each block. There were 10 blocks of 10 trials. Thus, each participant experienced 25 repetitions of each of the four unique experimental conditions.
7.4.1.4 Procedure
In the Induction condition the offset of the moving stimulus was produced by the experimental program. In the Intention condition the participant was instructed to press a button with the right index finger in order to stop the target's motion. Participants were instructed to stop the movement at an arbitrary point after the stimulus had moved 90°, yet before it had moved 360°. They were also instructed that, over trials, they should distribute the stop positions between 90° and 360° and should not choose recurrent salient positions (e.g. the 6 o'clock position). If a participant pressed the button too early or too late, an error message was presented, and the trial was repeated immediately.
7.4.2 Results and discussion
Mean localization scores were computed separately for every participant and each condition. These scores were then entered into a 2 × 2 ANOVA with the factors instruction (Intention vs. Induction) and velocity (15.4°/s vs. 30.8°/s). As can be seen in Fig. 7.4, there was a significant main effect due to instruction, F(1, 9) = 5.68, MSE = 8.75, p = 0.041, and a tendency towards an interaction, F(1, 9) = 3.86, MSE = 11.72, p = 0.081. The Scheffé test revealed that the significant effect of instruction was due to differences between the Fast-Induction condition and the Fast-Intention condition (p < 0.05). The data are consistent with an action-planning account. The localization errors made in the Intention condition were significantly further in the direction of stimulus motion than those made in the Induction condition; that is, they were more accurate, especially at the faster velocity.
Fig. 7.4 Mislocalization as a function of instruction (intention vs. induction) and velocity. The dark bars represent the Induction condition, and the light bars, the Intention condition.
These data indicate that action planning exerted an influence on the perceived location of the stimulus even though the action used to attain the planned effect was a button press, not an eye movement. This finding is also telling in that the spatial location of the action (i.e. the location of the button press) and the location of the event specified in the action plan (i.e. the position of the moving stimulus) did not overlap; the two occupied different spatial locations. These data support the assertion of the action-planning account that, due to the common mediation of perception and action planning, the planning of an action recruits the transformations to be used in the perception of the planned distal event. In the present case, the binding of the target's final location within a button-press action plan (i.e. 'stop the stimulus motion by pressing the button') shifted the localization in the direction of the intended effect.
7.5 Experiment 3: action-independent effects of action planning
Given the findings thus far, one might claim that the localization differences revealed in Experiment 2 were due to the fact that participants produced a button-press action in the Intention condition, but not in the Induction condition. If it was action planning per se, not only action production, that was responsible for the differences, then different action plans that utilize the same action should produce different localization patterns. We devised an experiment to test this idea. In one condition (the Cue condition) participants pressed a button in response to the onset of the moving stimulus, while in another (the Intention condition) participants pressed a button in order to produce stimulus offset. If the action plan is truly critical to the localization error, then localizations made in the two conditions should not be the same. In the Cue condition, the action plan refers to the initial stimulus location, while in the Intention condition, it refers to the final stimulus location. In both conditions, the offset of the moving stimulus was produced by the participant's action (i.e. the button press), but in the Cue condition the final position of the moving stimulus did not have to be bound in the participant's button-press action plan. Rather, given the instructions, all a participant had to do was react to the onset of the moving stimulus. In short, the moving stimulus constituted an action cue, and it
was the initial position of the moving stimulus, not the final position, that was relevant to, and thus potentially bound within, the participant's button-press action plan. If there are differences between the Cue condition and the Intention condition, however, one will not know whether they are due to action planning or to trajectory length, since the stimulus, simply due to instructions, traces out a larger trajectory in the Intention condition. Thus, a variant of the Induction condition of Experiment 2 was utilized in which the length of the trajectory was limited to a quarter circle. Trajectory length, therefore, was similar to that in the Cue condition, but the offset of the stimulus was produced by the computer program, not the participant.
7.5.1 Method
7.5.1.1 Participants
Eight female and four male students of the University of Munich, who ranged in age from 20 to 41 years (mean age 27.2 years), were paid to participate in the experiment.
7.5.1.2 Apparatus and stimuli
Stimulus presentation was the same as in Experiment 2, with the following exceptions. A Cue condition was added in which participants were instructed to stop the movement of the stimulus, via a button press, as soon as the moving stimulus appeared. The Intention condition remained unchanged. In order to control for trajectory length, a variant of the Induction condition was utilized in which the length of the trajectory was limited to a quarter circle.
7.5.1.3 Design and procedure
The three task conditions (Intention vs. Induction vs. Cue) were presented blockwise, and their order was counterbalanced between participants. Stimulus velocities were the same as in Experiment 2, and were randomized within blocks. Overall, participants experienced 150 trials. The procedure was the same as in Experiment 2. The experiment lasted approximately 35 min, including a training block.
7.5.2 Results and discussion
One participant had to be excluded from further analysis because, following the experiment, it was discovered that she had followed the instructions incorrectly. Two separate ANOVAs were conducted on the Cue vs. Intention and the Induction vs. Intention data. The Cue vs. Intention analysis revealed only a significant main effect of instruction, F(1, 10) = 5.95, MSE = 6.75, p = 0.035; that is, judgments were more accurate in the Intention condition (Fig. 7.5). The Induction vs. Intention analysis revealed a significant interaction, F(1, 10) = 6.81, MSE = 3.47, p = 0.026. Follow-up Scheffé tests revealed the interaction to be due to differences between the Induction-fast and Induction-slow conditions (p < 0.01) and between the Induction-fast and Intention-fast conditions (p < 0.05). This constitutes, save for the predictability of the trajectory length in the Induction condition, a replication of Experiment 2. Localizations made in the Intention condition were more accurate than those made in the Cue condition. If this was due solely to trajectory length, then the Cue and Induction conditions should have expressed a similar relationship to the Intention condition. This was not the case. While the Intention/Cue analysis revealed only an effect of instruction, the Intention/Induction analysis revealed an interaction.
Fig. 7.5 Mislocalization as a function of instruction and velocity for the Cue–Intention data (top) and the Induction–Intention data (bottom). In the Intention–Cue graph, dotted bars represent the Cue condition, and light bars, the Intention condition. In the Induction–Intention graph, dark bars represent the Induction condition, and light bars, the Intention condition.
The differences between these patterns indicate that something other than trajectory length was responsible for the differences between the Intention and Cue conditions. Specifically, in the Induction condition, the moving stimulus was not bound in an action plan. Since the trajectory length was constant (i.e. a quarter turn) and the task conditions were blocked, the final position of the moving stimulus, though not bound, may nonetheless have become predictable due to repetition, especially in the slower conditions. This may account for the differences between the Induction-fast and Induction-slow conditions. In the Cue condition, however, stimulus duration was similar to that in the Induction condition, but the initial position may have been bound in the button-press action plan. Thus, the perceived vanishing point appears to have been attracted to the location of the initial position. Collectively, these data support the following assertions: (1) the differences between the Intention and Cue conditions were not due to trajectory length, and (2) the localization differences revealed in Experiments 2 and 3 were due to differences in action planning, not action execution per se.
7.6 Experiment 4: the duration of action-relative binding
In all experiments reported so far, localizations of stimuli bound in an action plan were attracted to the location of the planned effect. In the present experiment we attempted to assess the duration of such binding. We did so by presenting observers with three different versions of the Intention condition, each of which was programmed to produce a small delay between the observer's button press and the actual offset of the moving stimulus. If the action plan loses its binding impact on perceptual space immediately upon action completion, localizations of stimuli having delayed offsets should not be attracted to the intended location, and there is no reason to expect that the error in such localizations should vary with changes in the offset delay. If, however, the action plan still has a binding impact on perceptual space at the moment of the delayed stimulus offset, the localizations should be attracted to the intended offset location, and the localization errors should vary inversely with offset delay. This is because localizations of stimuli with longer offset delays would reflect the growing discrepancy between the intended and the actual offset location.
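The prediction can be made concrete. The delays correspond to 4 and 8 retraces at 75 Hz (roughly 53 and 107 ms), and because the score is computed relative to the actual (delayed) offset, a localization that remains anchored to the intended stopping point should fall below the zero-delay score by roughly the distance the target travels during the delay, weighted by the strength of residual binding. The sketch below is our illustration, not part of the original analysis; the `attraction` weight is hypothetical, and the observed shifts imply a value far below 1.

```python
MM_PER_DEG_VA = 48 / 5.5        # mm of screen per degree of visual angle

def predicted_score_drop_mm(velocity_deg_s, delay_s, attraction=1.0):
    """Predicted decrease in the localization score (measured relative to
    the actual, delayed offset) under the binding hypothesis.

    attraction: hypothetical weight in [0, 1]; 1 = localization fully
    anchored to the intended stopping point, 0 = no residual binding.
    """
    distance_past_intended = velocity_deg_s * delay_s * MM_PER_DEG_VA
    return attraction * distance_past_intended

for v in (15.4, 30.8):                 # tangential velocities, deg/s
    for d in (0.0, 0.053, 0.107):      # offset delays in s (0, 4, 8 frames)
        print(f"{v} deg/s, {int(round(d * 1000)):>3} ms: "
              f"-{predicted_score_drop_mm(v, d):.1f} mm")
# e.g. at 30.8 deg/s a 107 ms delay carries the target ~28.8 mm past the
# intended stop, so fully bound localizations would score ~28.8 mm lower
# than at zero delay; partial binding scales this drop by `attraction`.
```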
7.6.1 Method
7.6.1.1 Participants
Six female and four male students from the University of Munich, ranging in age from 20 to 37 years (mean age 27.5 years), were paid to participate in the experiment.
7.6.1.2 Apparatus and stimuli
Stimulus presentation in the Induction condition was the same as in Experiment 2. In the Delay conditions, stimulus offset was brought about by a participant-produced button press, as in Experiments 2 and 3, plus a pre-programmed, post-button-press delay. There were three levels of delay: 0, 53 ms (4 frames), and 107 ms (8 frames).
7.6.1.3 Design and procedure
The two task conditions (Induction vs. Delay) were presented in blocks. Stimulus velocity varied randomly within both blocks, while delay also varied randomly within the Delay block. Half the participants started with the Induction condition. The participants underwent 25 repetitions per cell of the 4 × 2 within-subjects design, for a total of 200 trials. The experiment lasted approximately 40 min, including training trials. The participants were not informed that delays were utilized in the present experiment. After the experiment participants were asked whether they had noticed the delays.
7.6.2 Results and discussion
None of the participants reported noticing the different delay conditions. A 2 × 2 ANOVA with the factors instruction (induction vs. intention: delay 0) and velocity yielded a tendency towards an instruction effect, F(1, 9) = 4.00, MSE = 14.44, p = 0.077. This result replicated the finding of the previous experiments. A 3 × 2 ANOVA with the factors delay (0, 53, vs. 107 ms) and velocity (15.4°/s vs. 30.8°/s) revealed a significant effect of delay, F(2, 18) = 7.40, MSE = 1.68, p = 0.005. As can be seen in Fig. 7.6, the localization error varied inversely with offset delay. Note that the displacement was
determined relative to the actual offset of the moving target (not the position intended by the button press); thus, decreasing localization values represent localizations toward the intended stopping point. In sum, the data indicate that action planning has its binding impact on perceptual space for a duration that extends beyond the actual moment of action execution. Localization errors varied inversely with offset delays. This would not have been the case had the action plan lost its binding impact immediately upon action completion. Rather, the inverse relationship between localization error and offset delay indicates that the localizations were attracted toward the intended final location at all levels of delay, thus causing the localization scores to decrease as the distance between the intended and actual offset location increased. To be sure, the methodology of the present experiment only extends the binding interval 107 ms beyond the moment of action execution. Thus, further research is needed to measure the size of the interval precisely.
7.7 General discussion
When observers are asked to indicate the final location of a moving stimulus, their localizations tend to be displaced beyond the actual final location, in the direction of stimulus motion. Traditional accounts attribute this localization error to post-perceptual, action-independent cognitive processes (Hubbard 1995). The present series of experiments tested this assumed independence between action processes and localization error because recent findings indicate the error may be influenced by the action-planning processes involved in action control (Kerzel, Jordan, and Müsseler, in press). In addition, research indicates that action planning affects spatial localization because action planning is mediated by processes that also mediate the perception of stimulus location. In short, perception and action planning seem to share common mediation. This notion of common mediation has received formal theoretical/empirical treatment in what is known as the Theory of Common Coding (Prinz 1992, 1997). Specifically, the theory assumes the following: (1) actions are planned in terms of their intended distal effects, (2) action planning and perception, due to their inherently distal nature, share common neural mediation, and (3) action planning produces anticipatory recruitment (i.e. binding) of the transformations mediating the perception of the intended distal effect.
Fig. 7.6 Mislocalization as a function of instruction (intention vs. induction), velocity, and delay. [Axes: mislocalization in movement direction (mm), –8 to 8, against velocity (15.4°/s, 30.8°/s); conditions: induction, delay 0, delay 1, delay 2.]
If this notion of common mediation is correct, then localization error is not independent of action processes. Rather, the perceived location of a stimulus should vary as a function of whether and how the stimulus is bound in an action plan. Localizations of stimulus location, therefore, should be intention-relative. We began our investigation with the same type of localization task used in traditional representational momentum paradigms, save for our use of circular versus linear stimulus trajectories. In representational momentum paradigms, observers usually track the target with their eyes and, in order to do this, anticipate the future positions of the target. Given the nature of oculomotor tracking (Mitrani and Dimitrov 1978; Mitrani et al. 1979), we assumed that the forward bias observed in representational momentum experiments reflects a tendency to localize the target toward the anticipated locations inherent in the eye-movement action plan. Consequently, we predicted the bias would disappear in a fixation condition in which observers were not allowed to pursue the target via eye movements. Indeed, localizations made in the Pursuit condition of Experiment 1 were further in the direction of stimulus motion than those made in the Fixation condition, and the magnitude of this difference increased with increases in target velocity. Both results replicate a recent finding of Kerzel et al. (in press), save for the use of circular stimulus trajectories. In the subsequent experiments, eye movements were suppressed, and observers' action plans did not require continuous anticipation of future positions of the moving target. Rather, the intended effect was the offset of the target, and it was accomplished via a button press. In this situation, we also expected localizations to be attracted toward the intended action effect. That is, localizations were expected to be biased toward the anticipated offset location. Indeed, in Experiment 2, localizations made in the Intention condition were further in the direction of the actual stopping position of the target (i.e. the location of the intended action effect) than those made in the Induction condition, even though (1) the action specified in the Intention condition involved a button press (i.e. not an eye movement), and (2) the action was not directed toward the moving stimulus. In the Induction condition, displacement varied with velocity in the same manner as in Experiment 1. These findings indicate that at least a portion of the transformation of perceptual space associated with action control is due to action planning itself, and not to the specific effector specified in the action plan. Experiment 3 was a replication of Experiment 2, save for a Cue condition in which participants were instructed to press a button in response to the onset of the moving stimulus. This experiment was devised to test whether the differences in localization discovered in Experiment 2 were simply due to the fact that participants pressed a button in the Intention condition. If so, there should have been no differences between the Cue and Intention conditions. If, however, localizations made in Experiment 2 were due to action planning, then localizations made in the Intention condition should have been closer to the (intended) offset position than those made in the Cue condition. This is because, in the Intention condition, the final position of the moving stimulus was relevant to the action plan (i.e.
press the button to stop the stimulus), while in the Cue condition, the initial position was relevant (i.e. press the button in response to the onset of the stimulus). In fact, Experiment 3 revealed that the differences in localization observed in Experiment 2 were due to action planning, not action execution. Finally, in Experiment 4 we implemented three versions of the Intention condition, each of which was programmed to produce a small delay between the participant's button press and the actual offset of the moving stimulus. Our hypothesis was as follows: if the target moves beyond the intended stopping position, and participants have a tendency to localize the target toward the intended stopping position, localization scores should decrease with increasing delay between
button press and target offset (compared to the zero-delay condition), as a larger distance between the intended stopping position and the actual stopping position is traversed. This hypothesis was confirmed. Of course, if action planning serves to bind perceptual space, then it should also lose its binding impact some time after the planned action has been executed. However, the 53 and 107 ms delays introduced in Experiment 4 were probably too short to unbind stimuli from action planning, a conclusion consistent with the observation that none of the participants noticed the delays at all. The present findings are consistent with the notion that action planning affects spatial localization. The action plans generated in all the experiments transformed perceptual space in an intention-relative fashion. Given this anticipatory, action-relative aspect of spatial localization, it seems difficult to sustain the representational momentum account and its assertion that localization error is due to action-independent, post-perceptual cognitive momentum. In Experiments 1, 2, and 3, the error either disappeared or reversed (i.e. became negative) when observers were asked to generate action plans that did not involve (i.e. were not related to) the final position of the moving stimulus (i.e. the Fixation, Induction, and Cue conditions, respectively). This pattern of findings indicates that the error comes and goes as a function of an observer's intentional stance relative to the stimulus. If the intention is to stop a moving stimulus, there is no localization error because localizations are attracted toward the intended offset location. If the intention is to track a stimulus, there is localization error because continuous tracking requires continuous anticipation of future locations of the stimulus, and these anticipated locations are bound in the action plan because the intended distal effect is to stay on target. In short, localization error is not just related to action planning, it is dependent on it. Localization error, therefore, does not seem to be a property of post-perceptual cognitive processes. It seems, rather, to be a property of the type of relationship one is attempting to maintain with a stimulus. Given the present data and their support of the idea that localization error is dependent upon action planning, it seems difficult to sustain the functionally orthogonal, input–output approach to perception and action control that lies at the theoretical core of the representational momentum account. If perceptual processes and action-planning processes do share common mediation, the two, by definition, are dependent and, therefore, cannot realistically be modeled as constituting input- and output-control processes that are processed independently. Such problems do not arise in the Common Coding account, for the account asserts that action plans specify distal effects, not behavioral outputs. In addition, these distal-effect plans are assumed to be processed via resources that are also used to perceive distal effects. As a result, spatial perception and action control are not modeled as being functionally orthogonal. They are not assumed to be processed independently on opposite sides of the system. Instead, one might model them as being synergistically coupled. According to this approach (cf. also Jordan 1998, 1999), spatial perception and action planning, due to their common mediation, constitute a distal-effect system that allows one both to specify and to detect distal events.
This distal-effect system, however, although it is involved in action planning, is not responsible for the effector control required of an action. Its role, rather, is to constrain effector control systems toward the attainment of a specified distal effect. This implies it should be possible for action planning to produce transformations of perceptual space, regardless of the effector specified in the plan. This is exactly what happened in the present experiments. The action plans specified in Experiments 1 and 2 produced intention-relative shifts in spatial perception despite the fact that Experiment 1 required eye movements while Experiment 2 required finger movements. Given this de-coupling of the systems underlying action planning and the systems underlying effector control, one can see how spatial perception and action control are synergistically coupled. As one
engages in effector control, one produces changes in body–environment relationships, and these changes feed back into the distal-effect system. This perceptual feedback then allows the distal-effect system to assess whether or not the specified distal effect (i.e. the action plan) has been attained. By being able both to specify the distal states toward which effector control systems are constrained and to detect the changes in distal states produced by effector control, the distal-effect system can be said to constitute a distal-effect control system. And as a result of the simultaneous mutual influence of effector control systems and distal-effect control systems, the two can be said to be synergistically coupled. Given this approach, spatial perception and action control are not coupled in an orthogonal input–output fashion. Rather, they are nested control systems that are coupled synergistically. Action planning affects spatial localization, therefore, because action planning and spatial perception share common mediation and, as a result, constitute aspects of the same distal-effect control system. This system, however, does not control distal effects by engaging in effector control. Rather, it does so by specifying the distal effects toward which effector control should be constrained, while simultaneously being sensitive to the changes in body–environment relationships produced by effector control. In conclusion, this notion of synergistically coupled nested control systems may explain why it is possible to de-couple (dissociate) perceptual space and behavioral space (Bridgeman 1999, this volume, Chapter 5; Hansen and Skavenski 1985; Proffitt, Bhalla, Gossweiler, and Midgett 1995; Rossetti and Pisella, this volume, Chapter 4) but not perceptual space and action-planning space (Haggard, Aschersleben, Gehrke, and Prinz, this volume, Chapter 13; Hershberger and Jordan 1992; Jordan 1999; Rieser and Pick, this volume, Chapter 8; Viviani, this volume, Chapter 21). In the former case, the two versions of space belong to functionally distinct, yet synergistically yoked control systems, while in the latter, they belong to the same system.
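The architecture sketched in this discussion, a distal-effect controller that specifies a reference and monitors perceptual feedback while a subordinate effector process is constrained toward that reference, can be caricatured as two coupled loops. The toy sketch below is our illustration, not a model proposed by the authors; the names, the gain of 0.5, and the one-dimensional 'body–environment relationship' are all hypothetical.

```python
def distal_effect_controller(intended_effect, perceive, effector_step,
                             max_steps=100, tol=0.01):
    """Toy nested control loop: the distal-effect system specifies a distal
    reference and monitors perceptual feedback; a subordinate effector
    process reduces the remaining error. Illustrative only."""
    for _ in range(max_steps):
        perceived = perceive()            # shared perceptual transformations
        error = intended_effect - perceived
        if abs(error) < tol:
            return perceived              # distal effect attained
        effector_step(error)              # effector control constrained
                                          # toward the distal reference
    return perceive()

# A body-environment relationship that the effector nudges toward the goal:
state = {"pos": 0.0}
final = distal_effect_controller(
    intended_effect=10.0,
    perceive=lambda: state["pos"],
    effector_step=lambda err: state.update(pos=state["pos"] + 0.5 * err),
)
print(round(final, 2))   # 9.99: converged on the intended effect within tolerance
```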
Acknowledgements
This research was initiated and conducted while JSJ was a Fellow at the Max Planck Institute for Psychological Research (supported by the Alexander von Humboldt Foundation and Saint Xavier University). The experiments were supported by a grant from the Deutsche Forschungsgemeinschaft (DFG As 79/3) to JM. The authors would like to thank Wayne A. Hershberger, John Rieser, Wolfgang Prinz, and an anonymous reviewer for their comments on an earlier version of the manuscript.
Note
1. Where a value is given in degrees only, the unit refers to angular position on the circle. Where values are given in both degrees and millimeters, the degrees refer to visual angle with respect to the eye.
References
Actis Grosso, R., Stucchi, N., and Vicario, G.B. (1996). On the length of trajectories for moving dots. Paper presented at the Twelfth Annual Meeting of the International Society for Psychophysics (Fechner Day 96), Padua, Italy.
Bachmann, T. (1999). Twelve spatiotemporal phenomena and one explanation. In G. Aschersleben, J. Müsseler, and T. Bachmann (Eds.), Cognitive contributions to the perception of spatial and temporal events, pp. 173–206. Amsterdam: Elsevier.
Bridgeman, B. (1999). Separate representations of visual space for perception and visually guided behavior. In G. Aschersleben, J. Müsseler, and T. Bachmann (Eds.), Cognitive contributions to the perception of spatial and temporal events, pp. 3–13. Amsterdam: Elsevier.
Bridgeman, B. (2001). Attention and visually guided behavior in distinct systems. In this volume, Chapter 5.
Dassonville, P. (1995). Haptic localisation and the internal representation of the hand in space. Experimental Brain Research, 106, 434–448.
De Jong, R. (1993). Multiple bottlenecks in overlapping task performance. Journal of Experimental Psychology: Human Perception and Performance, 19, 965–980.
De Jong, R. and Sweet, J.B. (1994). Preparatory strategies in overlapping-task performance. Perception and Psychophysics, 55, 142–151.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., and Rizzolatti, G. (1992). Understanding motor events: A neurophysiological study. Experimental Brain Research, 91, 176–180.
Finke, R.A., Freyd, J.J., and Shyi, G.C.-W. (1986). Implied velocity and acceleration induce transformation of visual memory. Journal of Experimental Psychology: General, 115, 175–188.
Freyd, J.J. and Finke, R.A. (1984). Representational momentum. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 126–132.
Haggard, P., Aschersleben, G., Gehrke, J., and Prinz, W. (2001). Action, binding, and awareness. In this volume, Chapter 13.
Hansen, R. and Skavenski, A.A. (1985). Accuracy of spatial localizations near the time of saccadic eye movements. Vision Research, 25, 1077–1082.
Hershberger, W. (1976). Afference copy, the closed-loop analogue of von Holst's efference copy. Cybernetics Forum, 8, 97–102.
Hershberger, W. (1987). Saccadic eye movements and the perception of visual direction. Perception and Psychophysics, 41, 35–44.
Hershberger, W.A. (1998). Control systems with a priori intentions register environmental disturbances a posteriori. In J.S. Jordan (Ed.), Systems theories and a priori aspects of perception, pp. 3–23. Amsterdam: Elsevier.
Hershberger, W.A. and Jordan, J.S. (1992). Visual direction constancy: Perceiving the visual direction of perisaccadic flashes. In E. Chekaluk (Ed.), The role of eye movements in perceptual processes, pp. 1–43. Amsterdam: Elsevier.
Hommel, B. (1998). Perceiving one's own action—and what it leads to. In J.S. Jordan (Ed.), Systems theories and a priori aspects of perception, pp. 143–179. Amsterdam: Elsevier.
Hubbard, T.L. (1995). Environmental invariants in the representation of motion: Implied dynamics and representational momentum, gravity, friction, and centripetal force. Psychonomic Bulletin and Review, 2, 322–338.
James, W. (1890/1950). The principles of psychology, Vol. 2. New York: Henry Holt.
Jolicœur, P. (1999). Dual-task interference and visual encoding. Journal of Experimental Psychology: Human Perception and Performance, 25, 596–616.
Jordan, J.S. (1998). Recasting Dewey's critique of the reflex-arc concept via a theory of anticipatory consciousness: Implications for theories of perception. New Ideas in Psychology, 16(3), 165–187.
Jordan, J.S. (1999). Cognition and spatial perception: Production of output or control of input? In G. Aschersleben, J. Müsseler, and T.
Bachmann (Eds.), Cognitive contributions to the perception of spatial and temporal events, pp. 69–90. Amsterdam: Elsevier.
Kerzel, D., Jordan, J.S., and Müsseler, J. (in press). The role of perceptual anticipation in the localization of the final position of a moving target. Journal of Experimental Psychology: Human Perception and Performance.
Klein, R. (1988). Inhibitory tagging system facilitates visual search. Nature, 334, 430–431.
Mitrani, L. and Dimitrov, G. (1978). Pursuit eye movements of a disappearing moving target. Vision Research, 18, 537–539.
Mitrani, L., Dimitrov, G., Yakimoff, N., and Mateeff, S. (1979). Oculomotor and perceptual localization during smooth eye movements. Vision Research, 19(5), 609–612.
Müsseler, J. (1999). How independent from action control is perception? An event-coding account for more equally ranked crosstalks. In G. Aschersleben, J. Müsseler, and T. Bachmann (Eds.), Cognitive contributions to the perception of spatial and temporal events, pp. 121–147. Amsterdam: Elsevier.
Müsseler, J. and Wühr, P. (2001). Response-evoked interference in visual encoding. In this volume, Chapter 25.
Müsseler, J., Van der Heijden, A.H.C., Mahmud, S.H., Deubel, H., and Ertsey, S. (1999). Relative mislocalization of briefly presented stimuli in the retinal periphery. Perception and Psychophysics, 61, 1646–1661.
Müsseler, J., Stork, S., and Kerzel, D. (2001). Comparing mislocalizations in movement direction: The Fröhlich effect, the flash-lag effect and the representational momentum effect (submitted for publication).
O'Regan, J.K. (1984). Retinal versus extraretinal influences in flash localization during saccadic eye movements in the presence of a visible background. Perception and Psychophysics, 36, 1–14.
Osaka, N. (1977). Effect of refraction on perceived locus of a target in the peripheral visual field. Journal of Psychology, 95, 59–62.
Posner, M.I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25.
Posner, M.I. and Cohen, Y. (1984). Components of visual orienting. In H. Bouma and D.G. Bouwhuis (Eds.), Control of language processes: Attention and performance, Vol. 10, pp. 531–556. Hillsdale, NJ: Erlbaum.
Prinz, W. (1992). Why don't we perceive our brain states? European Journal of Cognitive Psychology, 4(1), 1–20.
Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9, 129–154.
Proffitt, D., Bhalla, M., Gossweiler, R., and Midgett, J. (1995). Perceiving geographical slant. Psychonomic Bulletin and Review, 2(4), 409–428.
Rieser, J. and Pick, H. (2001). The perception and representation of human locomotion. In this volume, Chapter 8.
Rizzolatti, G., Riggio, L., Dascola, I., and Umiltà, C. (1987). Reorienting attention across the horizontal and vertical meridians: Evidence in favor of a premotor theory of attention. Special Issue: Selective visual attention. Neuropsychologia, 25, 31–40.
Robinson, D.A. (1968). Eye movement control in primates. Science, 161, 1219–1224.
Roelfsema, P.R., Engel, A.K., König, P., and Singer, W. (1997). Visuomotor integration is associated with zero time-lag synchronization among cortical areas. Nature, 385, 157–161.
Rossetti, Y. and Pisella, L. (2001). Several 'vision for action' systems: A guide to dissociating and integrating dorsal and ventral functions. In this volume, Chapter 4.
Schneider, W. and Deubel, H. (2001). Selection-for-perception and selection-for-spatial-motor-action are coupled by visual attention. In this volume, Chapter 30.
Taira, M., Mine, S., Georgopoulos, A.P., Murata, A., and Sakata, H. (1990). Parietal cortex neurons of the monkey related to the visual guidance of hand movement. Experimental Brain Research, 83, 29–36.
Thornton, I.M. (2001). The onset repulsion effect (submitted for publication).
Van der Heijden, A.H.C., Müsseler, J., and Bridgeman, B. (1999). On the perception of positions. In G. Aschersleben, T. Bachmann, and J. Müsseler (Eds.), Cognitive contributions to the perception of spatial and temporal events, pp. 19–37. Amsterdam: Elsevier.
Viviani, P. (2001). Motor competence in the perception of dynamic events. In this volume, Chapter 21.
von Holst, E. and Mittelstaedt, H. (1950). Das Reafferenzprinzip. Naturwissenschaften, 37, 464–476.
Wolff, P. (1999). Space perception and intended action. In G. Aschersleben, J. Müsseler, and T. Bachmann (Eds.), Cognitive contributions to the perception of spatial and temporal events, pp. 43–63. Amsterdam: Elsevier.
8 The perception and representation of human locomotion
John J. Rieser and Herbert L. Pick, Jr.
Abstract. Locomotion is both a class of actions and a class of perceptible events. As a class of actions it is embedded in its environmental context and participates in fight, flight, and many functional behaviors. As a class of perceptible events, it is central to dynamic spatial orientation, that is, keeping up-to-date on the changes in self-to-object distances and directions that occur during locomotion. This chapter is focused on locomotion and on spatial orientation when people walk without vision and without access to nonvisual information about their surroundings. The thesis is that even when walking without vision, people perceive their locomotion relative to the remembered environment as a frame of reference. This environment-centered perception of locomotion accounts for how they keep up-to-date on their changing spatial orientation relative to features of the remembered surroundings. And the resulting representation is used to steer their ongoing locomotion and control all other environmentally directed actions.
Locomotion is organized around the environment as a frame of reference, both as a class of actions and as a class of perceptible events. As a class of actions, people typically control their locomotion so that navigation is safe and productive with respect to the relevant objects, events, and features of the surrounding environment. As a class of perceptible events, people tend to perceive their locomotion relative to the environment as a frame of reference. In other words, people tend to notice where they are standing and how they are facing relative to features of their surroundings. An implication of this is that, as a complementary aspect of the perception of locomotion, people keep up-to-date on the dynamic changes in spatial orientation that result from locomotion, and the resulting dynamic representation is a basis for steering locomotion and controlling other environmentally directed actions. When people walk with vision, they can steer their locomotion and control their actions relative to their goals and other visible features of their surroundings. However, people do not typically steer their locomotion relative to goals that are continuously in view, because, for example, they look away from their destination while walking, or the destination is occluded by obstacles, or because it is dark, or because they are blind. Nonetheless, even when walking without vision people steer their walking relative to remembered targets with relatively high precision across paths that vary in length and complexity (Loomis and Klatzky 1993; Rieser, Ashmead, Talor, and Youngquist 1990; Rieser, Guth, and Hill 1986; Steenhuis and Goodale 1988; Thompson 1983). This chapter is about how people steer locomotion when they walk without vision and without nonvisual forms of environmental feedback. When people locomote in situations where they can see and hear, locomotion results in optical and non-optical forms of environmental flow, which specify the changes in their distances and directions
relative to features of the surroundings. The main thesis of this chapter is that when people locomote in situations where they cannot see or hear, they perceive their walking relative to the remembered surroundings. Motor information is used to update the changes in spatial orientation, as if they could see and hear the environmental flow. This dynamic representation of spatial orientation serves, in turn, as a basis for steering ongoing locomotion and as a basis for controlling the wide range of actions that are directed toward features of the remembered surroundings. From our point of view it is difficult to dissociate the actions of locomotion from the perception and representation of locomotion, since we believe they result from the same processes. The three major sections of this chapter are organized around three central claims. The first claim is about the different frames of reference that can serve to organize perception and action. Whereas people can flexibly perceive their locomotion relative to the body as a frame of reference, we hypothesize that they tend to perceive it relative to relevant features of the surrounding environment. This tendency makes sense, since many adaptive behaviors are coordinated with features of the environment. In what situations, we ask, do people tend to perceive their own locomotion when walking without vision as a temporal sequence of limb positions, and in what situations do they perceive it relative to the surroundings as a frame of reference? The second claim is about the processes that account for how locomotion can be organized relative to the environment, especially when people walk without vision—how is it that the motor information while walking with or without vision is integrated with the remembered features of the surrounding environment? We hypothesize that efferent and many forms of afferent information are integrated via perceptual–motor learning processes and result in a unified perception of locomotion and its complementary aspect, the dynamic representation of spatial orientation. And the third claim is about the generality of this unified, dynamic representation. We hypothesize that it is the basis for steering locomotion with and without vision, and for controlling the forces and directions of all actions that are directed toward features of the remembered environment. According to our view, a single representation and a single set of processes mediate the control of 'locomotion as action' and the perception of 'locomotion as event', and in addition mediate the broad range of other actions that are directed toward features of the surrounding environment.
8.1 Actions and their multiple frames of reference
Some actions are directed toward one's own body surfaces and thus are coordinated with a limb-centered or body-centered frame of reference. Examples of this include scratching an itch, swatting mosquitoes, buttoning a shirt, and brushing one's teeth. For each of these, the details of the motor actions needed to scratch, swat, button, or brush depend on the position of the targeted area of the body relative to the active limb that effects the action; they do not generally depend on the active limb's position relative to features of the surroundings. However, most actions are directed toward objects or features of the surrounding environment, and thus are coordinated with an environmental frame of reference. This is the case for actions that involve very different systems of motor effectors and very different categories of functional goals—for example, using the hand–arm–trunk system to throw a football to a receiver, using the mouth–vocal tract system to communicate with a conversational partner, and using the legs–trunk system to jump a stream in the woods. In each case, the force and direction of the needed actions take into account the distance and direction of one or more objects or features of the surrounding environment—the receiver, the partner, and the stream in the examples above.
People often initiate actions while they walk or run—for example, they throw a football while running to evade a tackler, they shout across the room to talk with one friend while running to catch up with another friend, and they jump across a stream while running to catch their pet dog. In each of these cases, the force and direction of the throw, the shout, and the jump depend on the person’s shifting location, and so the force and direction of the action need to be planned in anticipation of the person’s future location, that is, the location at the instant the action is initiated. This is the case whenever people are underway. This chapter is focused on situations that are the extreme case of this, namely where people walk and act without vision and without access to non-visual information about the target’s location. When walking without vision, people steer their walking relative to remembered targets. To accomplish this they need to integrate the motor information for the walking with their memory representations of their surroundings. Likewise, when throwing or communicating while walking, people need to coordinate the force of their throws and volume of their conversation with respect to their changing distance from the remembered targets in the surroundings. To do this they need to perceive their locomotion relative to features of the remembered surroundings and coordinate their throws and shouts with this representation. We focus on situations where people guide their actions while walking without vision. In our experiments people typically look at their surroundings from one point of observation, close their eyes, and guide their walking, turning, or whispering relative to features of the remembered surroundings while they are walking.
8.2 Locomotion and the frames of reference for perceiving locomotion
Whether walking with vision or without vision, people typically control their walking relative to features of their immediate environment. Consider two strategies that could be used to accomplish this. According to the first strategy, people would perceive their locomotion in body-centered terms as the temporal sequence of limb and body positions that occur during locomotion. In order to steer their locomotion relative to the surrounding environment, they would then need to integrate their body-centered perceptions with real-time input for the surroundings (when walking with vision or audition) or with their representation of the surroundings (when walking with their eyes closed). Although this is a plausible strategy, it seems relatively inefficient because it hypothesizes that a body-centered stage of perception intervenes between the sensory inputs and the environment-centered control that is needed. According to the second strategy, people would perceive their locomotion in environment-centered terms. When walking with eyes open or with useful auditory information, this could be accomplished by attending to sights and sounds that directly specify one's time to reach the target, or the target's distance and direction. However, when walking without being able to see or hear, these environmental frame-of-reference cues are not available, and yet people steer their walking with considerable precision. Our hypothesis is that guiding locomotion without vision is highly similar to steering it with vision, and is based on perceptual–motor learning that occurs when walking while able to see and hear the environment. That is, when walking with visual and nonvisual environmental feedback, people learn the correlation of the temporal flow of motor information with the temporal flow of the environment relative to their shifting points of observation. Then, while walking without vision, they draw on this knowledge in order to perceive their walking in environmental
terms. According to this strategy, walking without vision is generally steered the same way as walking with vision. These two alternatives for how self-locomotion is perceived, we argue, are similar to the classic distinction of the proximal versus distal stimulus in vision and in haptic perception. For example, Gibson distinguished these in his writings about the haptic perception of objects (Gibson 1962, 1963, 1966). His analysis recognized that the haptic identification of objects depended on patterns of skin pressures and joint configurations. However, he rejected the idea that object perception depended on a stage of awareness of the sensations of the pressures and configurations. Rather, he distinguished between 'input of information' and 'input of sensory quality' (Gibson 1966, p. 108). His observations of the haptic perception of sculptured objects that were nonsense forms indicated that often all ten fingers moved when people explored a particular object and that the patterns of exploration differed from moment to moment and from trial to trial. The patterns of cutaneous pressures and joint configurations (the input of sensory quality) were difficult to detect and describe. But the patterns of relations of one part of an object to another (the input of information) resulted in object identification, and it came relatively easily. Describing this in relation to judgments of weight and material composition, he suggested the 'flux of subjective sensations is so complex as to be unreportable' (Gibson 1966, p. 127). Since that time, Lederman and Klatzky (1987, 1998) have described very systematically the various haptic movements observers use when exploring objects by touch in order to compare them on different features.
8.2.1 Proprioception and a body-centered frame of reference for perceiving self-locomotion
What would it mean to perceive locomotion relative to the body as a frame of reference? Classically, proprioception is defined for a stationary person as the perception of static limb positions relative to other static limb positions, and for a moving person it is defined as the temporal series of limb positions relative to other limb positions (sometimes this is called kinesthesis). In order to perceive self-locomotion proprioceptively (that is, in body-centered terms), one could attend to the sequence of one leg in relation to the other, or of both legs in relation to the trunk, and these could be integrated over time as the intrinsic shape of the path walked. We have been able to think of relatively few situations that call for the proprioceptive perception and control of locomotion. As one example, consider the figure eights and other compulsory figures practiced by competitive figure skaters, which are defined as shapes irrespective of the shape's position relative to the surroundings. As another example, consider the pliés and hand positions of classical ballet. What information could serve as the basis of body-centered perception of locomotion when walking without vision? When walking, information is transmitted from receptors associated with movements of joints and muscles that signal changes in limb position, and these signals are integrated over the space and time of locomotion to result in a record of the movements. In addition, the semicircular canals of the vestibular system signal angular accelerations, and the otoliths signal linear accelerations and gravity. Since people tend to walk paths consisting of segments with relatively constant speeds separated by turns, it makes sense to suppose that the vestibular information on its own could signal the distances turned accurately but could not signal the distances translated accurately (Berthoz, Israël, Georges-François, Grasso, and Tsuzuku 1995). Joint information, muscle information, and in some cases deep pressure information could be used in conjunction with vestibular information to result in a unified representation of the shape of one's path (Guedry 1974); some believe that the vestibular inputs provide a necessary basis for this integration (Kornhuber 1974).
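The integration described in this paragraph amounts to dead-reckoning path integration over body-centered translation and rotation signals. A minimal sketch of that computation follows; it is our illustration rather than the authors' model, with body-scaled translation units (strides) and signed turn angles standing in for the joint, muscle, and vestibular inputs.

```python
import math

def integrate_path(steps):
    """Dead-reckoning integration of body-centered locomotion signals.

    `steps` is a sequence of (translation, rotation) pairs: translation in
    stride-like body-scaled units, rotation in degrees (signed, + = left).
    Returns the end position and heading relative to the starting point;
    the record stays body-centered until it is anchored to the remembered
    surroundings.
    """
    x = y = 0.0
    heading = 0.0                       # degrees; 0 = initial facing direction
    for translation, rotation in steps:
        heading = (heading + rotation) % 360
        x += translation * math.cos(math.radians(heading))
        y += translation * math.sin(math.radians(heading))
    return (x, y), heading

# Two 5-stride segments separated by a 90° left turn:
print(integrate_path([(5, 0), (0, 90), (5, 0)]))   # ((5.0, 5.0), 90.0)
```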
What, in principle, are the geometric properties of locomotion that could be perceived from such an integration, in a form that is disconnected from the surrounding environment? We limit this discussion to locomotion within the ground plane. In terms of perceiving the direction of locomotion, people could perceive themselves rotating to their left or right, and perceive themselves translating forward, backward, left, or right. In terms of perceiving distance and/or its derivatives with respect to time, consider what could be perceived first for rotational locomotion and then for translational locomotion. For rotational locomotion, there seems to be an intrinsic body-centered distance scale, namely the distance of a full turn. We do not know the origin of this scale. For example, we are not aware of anything about the neurophysiology of the vestibular canals that would identify a full turn as a special category, so perhaps its categorical quality reflects perceptual–motor learning of the amount of motor activity that it takes to turn to spot a feature in the surroundings and continue turning in the same direction until returning to face the original feature. Whether or not it is learned, it emerges quite early, at least by the second year of life (Rieser and Heiman 1982). For translational locomotion, distances walked could be encoded in body-scaled units such as strides or eye heights. For translations and rotations alike, the body-centered perception of self-locomotion would initially be disembedded from the environment in the processing stream, since neither joints, muscles, nor vestibular organs know anything about the surroundings.
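For concreteness, the kind of body-centered integration just described can be written down directly. The sketch below is our own illustration, not taken from any of the studies reviewed here; the function name, the default stride length, and the sample path are assumptions.

```python
import math

def integrate_path(segments, stride_m=0.75):
    """Dead-reckon the intrinsic shape of a walked path from body-centered
    signals alone: per-segment stride counts (translation) and turn signals
    (rotation), with the full turn as the intrinsic unit of rotation."""
    x, y, heading = 0.0, 0.0, 0.0        # heading in radians; zero is arbitrary
    shape = [(x, y)]
    for strides, turn_fraction in segments:
        heading += turn_fraction * 2.0 * math.pi   # e.g. 0.25 = quarter turn
        x += strides * stride_m * math.cos(heading)
        y += strides * stride_m * math.sin(heading)
        shape.append((x, y))
    return shape

# A closed square path described with no reference to any landmark:
# (strides walked, fraction of a full turn taken before walking them).
print(integrate_path([(10, 0.0), (10, 0.25), (10, 0.25), (10, 0.25)]))
```

The output is a path shape in an arbitrary coordinate frame: the full turn serves as the intrinsic unit of rotation and the stride as the unit of translation, with no landmark entering the computation.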
8.2.2 Exteroception and an environmental frame of reference for perceiving self-locomotion
An alternative strategy is that people perceive their locomotion relative to the environment as a frame of reference, directly in terms of their changing positions and facing directions relative to the seen, heard, or remembered surroundings. Lee (1978) noted that the classical use of the term ‘exteroception’ referred to the perception of the environment and that the classical use of the term ‘proprioception’ referred to the perception of the body. He coined the term ‘exproprioceptive’ to label situations such as this, where actions are organized relative to the environment as a frame of reference. Consider what it would mean to perceive self-locomotion in environmental terms. Distances translated would be noticed in terms of one’s changing position relative to one or more of the features of the seen, heard, or remembered surroundings. Likewise, distances turned would be noticed as changes in angles to environmental features and changes in perspective views of objects and environmental features. We hypothesize that when people locomote without vision they can direct their attention outward toward the surrounding environment or inward toward their own body positions. In the next section we review some of the classical studies of the psychophysics of self-locomotion and show that they focus on proprioception. Then we present evidence from our laboratories indicating that when people walk with or without vision, they typically focus their attention on their changing orientation relative to the seen or remembered environment; they do not typically focus their attention on the sequences of body positions that occur while they walk.
8.2.3 Traditional studies of the psychophysics of self-locomotion emphasize perceiving in body-centered terms
The perception of self-locomotion results from the integration of efferent and of multiple forms of afferent information. We are not aware of empirical demonstrations of the role played by efferent
information in the perception of self-locomotion. In classic studies, Mountcastle (1958) and Taub, Ellman, and Berman (1966) recorded the changes in proprioception and in motor control that occurred in patients with anesthetized limbs or with deafferented limbs. We are not aware of analogous studies of the perception of locomotion, and so we do not know what role, if any, efference plays in the perception of self-locomotion. Intuitively, it seems plausible that efference plays a role. Consider, for example, the progressively diminished perceptions that might occur when one walks actively without vision to a predetermined goal, versus when one walks by grasping the upper arm of a sighted guide in order to follow the guide’s path to an unknown destination, versus being pushed in a wheelchair to an unknown destination.
Multiple forms of afferent information contribute to the perception of locomotion, and each has been studied in isolation from the others. The vestibular organs have long been noted as contributing to the perception of turning, even while one sits in a dark room (Wade 1998). The contributions of vision were noted and studied during the nineteenth century. For example, Helmholtz (1866) remarked on the moving train illusion (the illusion of self-movement when sitting in a stationary train while the train on the adjacent track begins to move), and Mach (1875) used an optokinetic drum to study circular vection. The contributions of the biomechanical system were noted more recently through the use of circular treadmills. Analogous to linear treadmills, on circular treadmills the floor rotates, and people step in place in order to compensate for the floor’s rotation. Bles (1981), Lackner (1988), and Garing (1999) asked people to close their eyes and step in place on circular treadmills. All the subjects reported a powerful illusory perception that they were physically turning.
Garing (1999) studied the psychophysics of the perception of rotational locomotion with vestibular input alone, visual input alone, and biomechanical input alone. In all three conditions the subjects stood atop a circular treadmill that was surrounded by a striped cylindrical curtain. The treadmill consisted of a waist-high T-bar centered within a 122 cm disc. Subjects stood atop the disc while grasping the T-bar. When the T-bar turned (it was driven by an electric motor beneath the housing), subjects were asked to step around in a circle in order to keep the T-bar centered at their waists. When the T-bar was stationary and the disc rotated (it was driven by another electric motor beneath the disc), subjects were asked to step in place in order to compensate for the disc’s rotation. In the vestibular condition the subjects closed their eyes and grasped the T-bar (now freely spinning) while they were passively spun at 5 rpm; in the visual condition the subjects opened their eyes and stood still while the cylindrical curtain rotated around them at 5 rpm; and in the biomechanical condition the subjects grasped the stationary T-bar and stepped in place to compensate for the floor, which rotated at 5 rpm under their feet. In all three conditions, every subject reported a sense of self-rotation. Subjects varied in how soon they first reported the sense of self-rotation in the different conditions, averaging 0 s, 4 s, and 1 s until the first reports for the vestibular, visual, and biomechanical conditions, respectively.
Classical studies of the psychophysics of locomotion assess locomotion in body-centered terms. The subjects were typically asked to judge proprioceptive qualities and quantities—for example, they were asked to say whether or not they were moving, to judge their direction of movement in body-centered terms (e.g. left, right, forward, backward), and to rate their speed or acceleration in terms of environment-free scales. Guendry’s (1974) chapter on the psychophysics of vestibular sensation is a powerful example of the body-centered approach to the perception of self-locomotion. In the studies covered in his review, the subjects sat in the dark in a rotating chair or on a linearly accelerating cart and were asked to say whether, in what direction, and at what rate they were moving
(in revolutions per minute or in meters per second). In none of the studies were subjects asked to say how their self-motions changed their distance or direction relative to features of the seen or remembered surrounding environment. The body-centered approach is also apparent in studies describing the psychophysics of the visually based perception of self-locomotion. In the important studies of visually based circular vection by Dichgans and Brandt (1972, 1978), the subjects typically stood while surrounded by a striped cylinder (an optokinetic drum) that rotated around them, or they viewed optical simulations of environmental rotation that were projected onto screens (Post 1988). When the information for cylinder rotation started, subjects typically first reported that the cylinder appeared to be rotating (i.e. object motion), but after a few seconds their perception shifted to self-motion, and they reported seeing themselves rotating within a stationary cylinder (Wertheim 1994). Again, people were asked to describe their self-motion in body-centered terms—to say whether they perceived themselves to be rotating and, if so, to say whether the rotation was clockwise or counterclockwise in direction and to estimate the speed in revolutions per minute. People were not asked to judge their turning in environmental terms, for example, by judging how far they turned relative to remembered features of their surroundings outside of the rotating drum. In studies of visually induced linear vection, subjects watch visual displays that show the radial or lamellar flow that specifies translatory locomotion. The classical studies of visually induced linear vection also illustrate the body-centered approach, since the subjects were typically asked to say whether they perceived themselves to be moving forward and, if so, to judge the rate of movement (e.g. Berthoz, Pavard, and Young 1975; Lishman and Lee 1973; Johansson 1977; Telford and Frost 1993; Warren 1995). However, in recent studies (Giannopulu and Lepecq 1998; Israel 1999; Lepecq, Jouen, and Dubon 1993) the subjects were asked to judge their changing distances relative to a landmark that was out of sight in the remembered surroundings.
8.2.4 People’s self-reports of what they notice when walking without vision
Our thesis is that the frame of reference for perceiving self-locomotion is flexible and is determined by the perceiver’s goals. Since so many goals are embedded functionally in the surrounding environment, people tend to experience their self-locomotion in environmental terms. However, goals are sometimes independent of the particular environmental circumstances, and we suppose people can shift their attention onto their body surfaces or other possible frames of reference. To find out about this, we collected systematic interview data, asking people what they noticed when they walked without vision with a sighted guide. At the start of each of three conditions, the tester and subject stood in a large open room and chatted about the doors, windows, and desks in the room as part of the conversation. Then subjects were asked to walk with the tester while they continued to chat. In one condition the subjects walked and chatted with their eyes open; in a second condition the subjects wore a blindfold and a sound system while walking and chatting; and in a third condition the blindfolded subjects were asked to pay attention to the intrinsic shape of the path that they walked. Subjects were not given any additional instructions, and no one was informed that there was any particular purpose to walking and chatting. Ten subjects participated in each of the three conditions. The walk had six turns, and its shape can be represented as an ‘X enclosed in a square’.
After the walk, the subjects were asked to tell about the walk in their own words. All ten subjects who walked with their eyes open described their walks exproprioceptively, in terms of how they faced and where they turned relative to the features of the room. Nobody spontaneously reported the shape of
the walk or the number of turns. When they were asked, eight of the subjects were able to figure out the number of turns, but none described having walked anything like ‘an X in a square’. The results were very similar in the blindfolded condition where the subjects were not instructed to attend to the shape of the walk. Even in the condition where the blindfolded subjects were instructed to attend to the shape of the path they walked, although all of the subjects reported making six turns and seven reported crossing their own path (the crossing in the ‘X’), none reported walking along anything like an ‘X enclosed in a square’.
The point of this is to show that people tend to perceive the distances they walk and turn in terms of features of the surrounding environment, and that they do so even when they are not able to see or hear the features of their surroundings. Although the environment serves as the ‘natural’ frame of reference under these conditions, it is also clear that people can adjust their attention and notice body-centered features of their locomotion such as the shape of the path, the number of turns, and so forth. It may be the case that it is easier to talk about one’s locomotion in environmental terms than in body-centered terms. If so, then it might be that people tend to perceive their blindfolded walking relative to their bodies as a frame of reference, but then translate body-centered perceptions into environmental terms because it is easier to talk in environmental terms. The experimental studies of path integration that are described in the next section relate to this possibility.
8.2.5 Path integration when walking without vision is more accurate when people know the surrounding environment
If people tend to perceive their locomotion in terms of their shifting distances and directions relative to features of the surroundings, we reasoned that they should perceive their locomotion more accurately when walking in surroundings that were differentiated with nearby features and less accurately in surroundings that did not have distinguishable features. Rieser, Frymire, and Berry (1999; Frymire 1998) conducted an experiment to find out whether this was the case. Blindfolded subjects were asked to walk with a sighted guide along a path and then turn to face their starting position at the end of the path. The paths had four legs and averaged 30 m in length; on a typical path the subjects walked 10 m straight ahead, turned right and walked 10 m, turned right again and walked 5 m, and finally turned right again, walked 5 m, and stopped. After walking paths like this, subjects were asked to turn and face their remembered starting position, and errors in their facing directions were assessed with an electronic compass.
We devised three conditions to evaluate whether people turned to face the remembered starting point of their path more accurately when they had a differentiated representation of their immediate surroundings than when they did not know their surroundings. Our preliminary observations were that when people attempted path integration tasks in situations where they did not know their surroundings, they reported that it was like walking on a large open homogeneous field, a ‘Ganzfeld’ if you will. In the ‘virtual Ganzfeld’ condition, subjects were equipped with a blindfold, so that they could not see, and a sound system, so that they could converse easily but could not localize what they heard in space. Then they were guided via circuitous routes around campus for about ten minutes, while answering questions posed by their sighted guide. The purpose of the walk was to disorient the subjects, and by each subject’s own report it succeeded. In the ‘actual surroundings’ condition, subjects looked at their actual surroundings at the beginning of each path integration trial. Then they put on the blindfold and walked the path with the sighted guide. While walking, they were asked to keep several features of the surroundings in mind
as well as their starting point. And finally, in the ‘imagined surroundings’ condition, subjects were disoriented as in the ‘virtual Ganzfeld’ condition, but then they were asked to imagine they were standing in a familiar location and were given time to generate an image of the locale. Having generated an image of the familiar location, they were instructed to keep the imagined landmarks in mind while walking the path and while turning to face the starting point.
The errors fit the predictions closely. The subjects averaged 20 deg of error when turning to face the starting position in the ‘actual surroundings’ condition, 35 deg in the ‘imagined surroundings’ condition, and 47 deg in the ‘virtual Ganzfeld’ condition. The thesis that people tend to perceive their locomotion in environment-centered terms led to this prediction. If, on the other hand, people tend to perceive their locomotion in body-centered terms, then one might have predicted better performance in the ‘virtual Ganzfeld’ condition than in the other two conditions. The reason for this is that people do not need to know anything about their surroundings in order to perform path integration tasks, and the instruction to focus attention on the remembered features of the surroundings while walking the to-be-integrated path might have placed an unhelpful load on working memory.
In summary, when people walk without vision they generally describe their walks in environmental terms, as if they tend to perceive their walks in environmental terms. In addition, when they are asked to perceive a path that they walk in order to judge the direction back to their starting point, they do so more accurately when they have the chance to perceive their walk relative to the layout of landmarks in the remembered surroundings than when they do not know the layout of landmarks in their remembered surroundings.
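As a concrete check on the geometry of this task, the sketch below (our illustration, not code from the original study; the coordinate conventions are ours) computes the correct homing response for the typical four-leg path described above.

```python
import math

# The typical path: walk 10 m, turn right 90 deg, walk 10 m, turn right,
# walk 5 m, turn right, walk 5 m, then stop and face the starting point.
x, y, heading = 0.0, 0.0, 90.0               # degrees; start facing 'north'
for i, distance in enumerate((10.0, 10.0, 5.0, 5.0)):
    if i > 0:
        heading -= 90.0                      # a right turn before each later leg
    x += distance * math.cos(math.radians(heading))
    y += distance * math.sin(math.radians(heading))

bearing_to_start = math.degrees(math.atan2(-y, -x))   # bearing back to (0, 0)
turn = (bearing_to_start - heading + 180.0) % 360.0 - 180.0
print(f"end point ({x:.0f}, {y:.0f}); turn {turn:+.0f} deg to face the start")
```

A positive turn here is counterclockwise, so the correct response after this particular path is a 45 deg turn to the left; the mean errors reported above should be read against responses of roughly this size.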
8.3 Recalibration of locomotion in environmental terms
Recall the central thesis that perceptual–motor learning accounts for how people perceive their walking without vision in environmental terms. The basic idea is that while walking with vision people learn how the world looks (and sounds and feels) as a function of variations in their walking. Then, while walking without vision, since people already know the dynamic changes in how the world looks and sounds and feels, they draw on this knowledge to perceive their locomotion in environment-centered terms and thus to represent the dynamic changes in their spatial orientation. If this hypothesis is correct, then manipulations of the relation between locomotor actions and environmental flow should lead to corresponding changes in spatial orientation and in the guidance of walking without vision (e.g. Pick, Rieser, Wagner, and Garing 1999; Rieser 1999; Rieser, Pick, Ashmead, and Garing 1995). We briefly summarize our methods and central findings below, for translational locomotion (generally forward walking) and for rotational locomotion (stepping in order to turn in place).
8.3.1 The recalibration of translational locomotion
The methods consisted of pre-tests, a learning intervention, and post-tests (Rieser et al. 1995). During the pre- and post-tests, subjects stood in an open field, viewed a target located about eight meters straight ahead, and then attempted to walk without vision to the remembered target location. During the learning intervention phase, subjects walked either in a biomechanically faster condition (the rate of walking was paired with a slower-than-normal rate of movement through the surroundings) or in a biomechanically slower condition (the rate of walking was paired with a faster-than-normal rate of movement through the surroundings).
The equipment used to arrange these conditions consisted of a motor-driven treadmill mounted on a trailer that was towed by a tractor. Subjects were asked to walk at one rate on the motor-driven treadmill while they were towed at a different rate behind the tractor. The motor-driven treadmill determined the rate of the biomechanical activity involved in the subject’s locomotion, while the speed of the tractor determined the environmental flow rate. After walking for ten minutes in the biomechanically slower condition, subjects tended to stop short of the remembered target during the post-tests, as if they perceived their rates of walking as faster than their actual rates in the scale of the surrounding environment. After walking for ten minutes in the biomechanically faster condition, on the other hand, they tended to walk past the target, as if they perceived their rates of walking as slower than their actual rates in the environmental scale. These experiments show that people perceive the scale of their own translational locomotion relative to the remembered surroundings as a frame of reference.
8.3.2 Recalibration of rotational locomotion
Analogous pre-test, intervention, post-test methods were used to change the calibration of rotational locomotion, but the circular treadmill described in an earlier section was used to change the calibration of turning in place, instead of the tractor and linear treadmill used to change the calibration of forward walking (see Pick et al. 1999; Rieser et al. 1995). In this case, during the pre- and post-tests, subjects stood in the laboratory, faced a target that was straight ahead of them, put on a blindfold, and then attempted to turn 360 deg in the clockwise or counterclockwise direction so that they again faced the remembered target. During the learning intervention phase, subjects walked on the circular treadmill. In the biomechanically faster condition, the treadmill’s disc rotated at 5 revolutions per minute (rpm) while the T-bar rotated at 5 rpm in the opposite direction. The result was that subjects physically turned at a rate of 5 rpm relative to the environment, but their feet and legs moved at a rate of 10 rpm. In the biomechanically slower condition, the treadmill’s disc rotated at 5 rpm while the T-bar rotated at 10 rpm in the same direction. The result was that subjects physically turned at a rate of 10 rpm relative to the environment, but their feet and legs moved at a rate of 5 rpm.
The results demonstrated a recalibration of people’s rotational locomotion. The post-tests after the biomechanically faster condition showed that subjects tended to turn about 90 deg too far, as if they perceived themselves to be turning at a slower rate relative to the remembered surroundings than they were actually turning. The post-tests after the biomechanically slower condition showed the opposite pattern: the subjects tended to stop about 50 deg short of the remembered target, as if they perceived themselves to be turning at a faster rate relative to the remembered surroundings than they were actually turning.
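The direction of these post-test errors follows from a simple gain model of the recalibration. The sketch below is our own illustration of that logic; the particular gain values are assumptions fitted to the errors reported above, not parameters from the published studies.

```python
def predicted_turn(target_deg, perceived_gain):
    """Subjects stop when their perceived turn reaches the target.
    If perceived turn = perceived_gain * actual turn, then the
    actual turn produced is target_deg / perceived_gain."""
    return target_deg / perceived_gain

# Biomechanically faster training (stepping 10 rpm, turning 5 rpm) should
# push the gain below 1; partial adaptation to about 0.8 reproduces the
# reported ~90 deg overshoot. Biomechanically slower training pushes the
# gain above 1; about 1.15 reproduces the reported ~50 deg undershoot.
print(predicted_turn(360, 0.80))   # ~450 deg: about 90 deg too far
print(predicted_turn(360, 1.15))   # ~313 deg: about 50 deg short
```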
8.3.3 How are the changes in calibration organized?
Are actions, including locomotion, organized as a single global system? Or are they organized more narrowly, for example around functional categories of action, around specific limb systems, or around specific muscle groups? To find out about this, we have probed the organization of locomotion by testing for transfer of the experimentally induced changes in calibration to other forms of locomotion that were not practiced during the recalibration phases of our experiments.
Consider some of the alternative possible organizations. At one extreme, changes in the calibration of locomotion might transfer very broadly, so that the gain of all forms of locomotion and
forms of non-locomotor actions all increased or decreased in similar ways. In order to assess this possibility, Rieser et al. (1995) induced changes in the calibration of forward walking in the ‘biomechanically faster’ condition, so that the subjects all walked too far during the post-tests, and tested to see whether the ‘gain’ induced in the calibration of their forward walking would transfer to their throwing a bean bag at a remembered target. The results showed that it did not transfer, indicating that the calibrations mapping actions onto the surrounding environment are not globally organized to encompass all actions. Another extreme possibility is that changes in the calibration of locomotion might be very specific and not transfer to any other actions or to any other forms of locomotion. To assess this, Rieser et al. (1995) again induced changes in the calibration of forward walking in the ‘biomechanically faster’ condition, so that the subjects all walked too far during the post-tests. Then we tested to find out whether the subjects would also walk too far when assessed with a very different gait, namely sidestepping. In order to sidestep to the target, subjects started by facing the target, turning so that it was 90 deg to their left or right, putting on their blindfold, and then moving their feet left (or right) to reach the remembered target. The gait and step cycle were quite different from those of forward walking, which consisted of a smooth and continuous series of steps. The cycle for sidestepping consisted of the subject moving his or her target-side foot one pace to the left (or right) toward the target, pausing in order to bring the other foot to meet the target-side foot, pausing again, and repeating the cycle. Despite the differences between forward walking and sidestepping in the muscle groups, gaits, and step cycles, each subject walked too far to the remembered target during the sidestepping post-tests, and the magnitude of the overshoot when sidestepping was virtually the same as the overshoot during the forward-walking post-tests. The results showed that the experimentally induced change in the calibration of forward walking did transfer to sidestepping.
Thus, according to these findings, actions are neither globally organized nor anatomically organized at the level of particular muscle groups, step cycles, or gaits. An additional finding indicates that they are functionally organized—that is, actions that accomplish the same function are organized as a functional system, so that changes induced in the calibration of one method of accomplishing the function transfer to other methods of accomplishing that same function. The hypothesis of a functional organization is consistent with the positive transfer that was obtained for translational locomotion from forward walking to sidestepping. Recently, Berry (2000) put the hypothesis of a functional organization of rotational locomotion to a strong test. She induced changes in the calibration of rotational locomotion by asking subjects to walk on the circular treadmill in the ‘biomechanically faster’ condition, where their feet were stepping at a 10-rpm rate while they were physically turning relative to the surroundings at a 5-rpm rate. As expected, during the post-tests the subjects overshot the remembered target, turning too far by an average of about 90 deg.
Berry wished to find out whether the changes in the calibration of rotational locomotion that were experimentally induced while people turned themselves by stepping with their feet and legs would transfer to turning themselves with their hands and arms. To assess turning by hand and arm, she asked subjects to stand on a swivel on the floor inside a round railing that was within reach. During the pre-tests and post-tests the subjects faced the target, put on a blindfold, and pushed themselves around by moving hand over hand along the railing. Berry’s results were that the subjects turned too far by hand, averaging about 45 deg of overshoot. This indicates that the experimentally induced change in the calibration of turning by foot partially transferred to turning by hand. It is consistent with the hypothesis that locomotion is organized as a functional system.
8.4 Unified representations of dynamic spatial orientation
People’s perception of locomotion is calibrated in environmental terms, and our ongoing experiments indicate that the calibrations are functionally organized. In this section we consider the thesis that the perception of locomotion in environmental terms results in a representation of the dynamic changes in spatial orientation that occur during locomotion, and that this representation is unified and is used to guide a wide range of environmentally directed actions that people produce when they walk without vision. Loomis and his colleagues provided support for the analogous thesis for visual space perception (Loomis, Da-Silva, Philbeck, and Fukusima 1996; Philbeck, Loomis, and Beall 1997). By ‘unified’ we mean that the representation of dynamic spatial orientation results from the perception of one’s own locomotion and, in addition, that the same representation mediates the guidance of locomotion and of other forms of environmentally directed action. The representation likely reflects parallel processing through dorsal, ventral, and other anatomical streams. But we hypothesize that different functions are not served by different representations of spatial orientation; instead, a single unified representation of space accounts for the perception of locomotion, the control of locomotion, and the control of other functional categories of spatially coordinated action.
If the diversified representation hypothesis is correct, then experimentally induced errors in the perception of locomotion should not lead to errors in the control of other environmentally directed actions. If, on the other hand, the unified hypothesis is correct, then experimentally induced errors in the perception of locomotion should lead to predictable errors in the control of other environmentally directed actions. These predictions were investigated in two studies. In both studies, changes in the calibration of rotational locomotion were induced. The prediction in the first study was that the resulting errors in the perception of rotational locomotion would transfer to errors in controlling locomotion while walking complex paths without vision. The prediction in the second study was that the resulting errors in the perception of rotational locomotion would cause errors in whispering in situations where people turned without vision and whispered to their remembered partners.
8.4.1 Changing the calibration of rotational locomotion alters the representation of dynamic spatial orientation when walking two-dimensional paths
The changes induced in the calibration of rotational locomotion might only narrowly affect the perception of turning in place. Or, alternatively, they might have more general effects and influence the control of locomotion along any two-dimensional path that includes changes in heading. Wagner (1996) conducted a series of experiments to find out. The experiments followed the methods described above by inducing a recalibration of rotational locomotion and then testing to see whether the recalibration transferred to tests of spatial orientation when walking complex two-dimensional paths. Before and after the recalibration experience, the participants were tested by asking them to look at target objects in a familiar room, put on a blindfold, follow a sighted guide walking along a complex path, and then point at the locations of targets in the remembered room. Wagner’s results showed that the induced changes in the calibration of rotational locomotion resulted in subjects underestimating their rates of turning by about 15%. Their errors in pointing at targets after walking two-dimensional routes were predicted closely by these 15% underestimates in turning. Wagner’s results reveal that the induced changes in the perception of rotational locomotion led to predictable changes in subjects’ dynamic representations of spatial orientation. They are consistent with the ‘unitary representation’ view of locomotion and not with the ‘diversified representation’ view.
It is important to note that we have simplified two sets of complicating details. One complication is that there are two recalibration processes, one operating like a sensory aftereffect and the other operating like a perceptual learning effect; they behave somewhat differently. The other detail is that the manifestations of recalibration differ depending on whether subjects are tested turning in the same direction as in the recalibration experience or in the opposite direction. Both of these complexities are examined in detail in Pick et al. (1999).
8.4.2 The calibration of speaking as a function of the speaker-to-listener facing direction
Our hypothesis is that the dynamic representation of one’s spatial orientation during locomotion is unitary, so that the same representation mediates different actions. In order to test this hypothesis we induced changes in the calibration of rotational locomotion, and then tested to see whether the resulting changes in spatial orientation when walking without vision influenced the vocal intensity of subjects when they attempted to speak to a remembered listener after turning without vision to a new facing direction. Skillful speakers adjust their vocal intensity to fit their spatial orientation relative to their listeners. For example, we know that adults modify their vocal intensity with changes in the distance to their listener (Johnson, Pick, Siegel, Ciccarielli, and Garber 1981; Michael, Siegel, and Pick 1995). Conversational effectiveness also depends on the speaker’s facing direction relative to the listener: in order to be heard, people need to talk more loudly when they face away from the listener than when they face directly toward the listener. We exploited this in the following experimental study of the calibration of vocal intensity as a function of the speaker’s facing direction relative to the listener (McMillan 1998; Rieser, McMillan, Pick, and Berry 1998).
In order to assess the calibration of vocal communication, subjects were asked to whisper their telephone number so that their conversational partner (standing 1 m away) could just barely hear it. Subjects were asked to do this after they turned away from the listener by 0, 180, or 360 deg. Their vocal productions were tape-recorded and scored to determine their vocal intensity, which was averaged across the duration of the utterance. In the ‘eyes open’ condition, subjects could see their facing direction relative to the listener during each trial. In the ‘eyes closed’ condition, subjects faced the listener, put on a blindfold, and then were turned by the experimenter into one of the facing directions. In the ‘eyes open’ condition, subjects could adjust their loudness according to their visible facing direction relative to the listener. But in the ‘eyes closed’ condition, subjects needed to perceive their turn in relation to the remembered listener and adjust their loudness in conjunction with this representation of their spatial orientation.
The experiment consisted of pre-tests of whispering as a function of the different facing directions in the ‘eyes open’ and ‘eyes closed’ conditions, the standard intervention to induce a change in the calibration of rotational locomotion together with the tests to assess the recalibration as described earlier in this chapter, and finally whispering post-tests, which were the same as the whispering pre-tests. As expected, stepping on the circular treadmill caused people to recalibrate their rotational locomotion, so that they underestimated their degree of rotation by about 20% during the post-tests relative to the pre-tests. Also as expected, the recalibration of rotational locomotion did not, per se, lead to a change in the calibration of speaking: when the subjects could see their facing direction relative to the listener, their average vocal intensity during the post-tests was the same as their average during the pre-tests after turning 0, 180, and 360 deg. The theoretically critical results involved the vocal intensities when people whispered after turning with their eyes closed.
The ‘unitary representation’ hypothesis leads to this prediction:
the 20% underestimation error in the perception of rotational locomotion that was induced by the recalibration phase should result in a 20% error in the represented facing direction during the post-test trials, and this error in the representation should lead to predictable changes in vocal intensity as a function of the facing direction. Consider three specific predictions for the case in which subjects viewed the listener straight ahead of them and then closed their eyes. First, after turning 0 deg with their eyes closed, the subjects should have an accurate representation of their facing directions, and thus there should be no pre-test to post-test change in their vocal intensity. The results confirmed this. Second, after turning 180 deg with their eyes closed, the subjects faced directly away from the listener, and correspondingly, during the pre-tests their whispering was the loudest in this condition. During the post-tests, the subjects actually faced directly away from the listeners, but we predicted the recalibration would induce them to underestimate their turn by about 36 deg. If this is the case, then they should incorrectly represent themselves as facing about 144 deg away from the listener instead of their actual 180 deg. As predicted, the subjects whispered more softly during the post-tests than the pre-tests in this condition, averaging a 5-decibel reduction in vocal intensity. And third, after turning 360 deg with their eyes closed, subjects faced directly toward the listener and, correspondingly, during the pre-tests their whispers were the softest in this condition. During the post-tests, we predicted that the recalibration of rotational locomotion would induce them to underestimate their turn by about 72 deg. If this is correct, their representation would be of having turned 288 deg (i.e. the 360 deg they actually turned minus the 72 deg underestimation due to the recalibration), leaving them represented as facing 72 deg away from the listener (whereas they actually faced 0 deg away), and they should speak more loudly during the post-tests than the pre-tests. The results confirmed this for all the subjects, whose vocal intensities averaged a 3-decibel increase during the post-tests relative to the pre-tests.
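The arithmetic behind these three predictions can be laid out explicitly. The sketch below is our own illustration, using the 20% underestimation reported above; the function name and the folding of angles into a 0–180 deg separation are our conventions, not part of the original report.

```python
def represented_turn(actual_turn_deg, underestimate=0.20):
    """Turn subjects should represent after turning with eyes closed,
    given a proportional underestimation of the turn."""
    return actual_turn_deg * (1.0 - underestimate)

for actual in (0, 180, 360):
    rep = represented_turn(actual)
    # Angular separation from facing the listener, folded into 0..180 deg;
    # larger separations should call for louder whispering.
    away = 180.0 - abs(180.0 - rep % 360.0)
    print(f"turned {actual:3d} deg: represents {rep:5.0f} deg "
          f"({away:3.0f} deg away from the listener)")
```

The printed values recover the three predictions: no change after a 0 deg turn, a represented separation of 144 deg after the 180 deg turn (less than the actual 180 deg, hence softer whispering), and a represented separation of 72 deg after the 360 deg turn (greater than the actual 0 deg, hence louder whispering).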
8.5 Summary and conclusions
Locomotion causes changes in spatial orientation relative to the surrounding environment. When walking with vision or without it, people tend to perceive their locomotion relative to the surroundings as a frame of reference, and their perception serves to update their representation of their spatial orientation. We have argued that the resulting representation of spatial orientation is unitary—it reflects the perception of locomotion, it mediates the guidance of ongoing locomotion, and it mediates the control of other environmentally directed actions that depend on spatial orientation.
The chapter has summarized five types of empirical evidence that support the argument. First, the study of people’s reports of conscious awareness while walking without vision to follow a sighted guide indicated that people tended to notice their walking in the environmental terms of their changing locations and facing directions relative to objects in the remembered surroundings. Second, the studies of path integration showed that people are better able to judge the starting point of a path walked without vision in conditions where they can relate the path to a remembered structured frame of reference than when they can only relate it to a ‘Ganzfeld’. Third, the studies of the recalibration of translational and of rotational locomotion showed that brief perceptual–motor learning experiences induce changes in the environmental scale of perception. In addition, they showed that the experimentally induced changes in calibration were functionally specific. Whereas the changes did not transfer to different functions (e.g. changes in the calibration of forward walking did not transfer to throwing a bean bag or to whispering to a conversational partner), the changes readily transferred to new gaits of forward walking and new limb systems of turning in place. Fourth, Wagner’s (1996) studies showed that whereas the recalibration of rotational locomotion did not
transfer to translational locomotion, it did lead to predictable errors in pointing at remembered targets while walking paths in the two-dimensional ground plane. And fifth, the studies of whispering while turning without vision showed that the recalibration of rotational locomotion did not affect the loudness of whispering per se. Instead, the recalibration induced changes in the representation of spatial orientation, and the changed representation did influence how loudly the subjects whispered to their remembered partner while turning in place without vision.
We close with four questions and answers in light of our work. First, is self-locomotion with or without vision perceived in body-centered or environment-centered terms? Our answer is that it can be perceived flexibly in either way, depending on the perceiver’s immediate goals. Second, is self-locomotion an action to be controlled or is it an event to be perceived? Our answer is that it is both. Third, is the perception and perceptual representation of self-locomotion unified or diversified? Our answer is that it is unified, such that errors induced in the perception of rotational locomotion also cause errors in walking paths in the two-dimensional ground plane and in the vocal intensity of whispering. And fourth, is the perception of self-locomotion and its complement, the representation of spatial orientation, ‘perception’ in the sense of responding to on-line input, or is it ‘memory’ in the sense of being mediated by memory representations? Our answer is that it is both, or neither. Gibson (1979) pointed out that the pick-up of information by the senses is typically distributed through time and through space. Like him, we do not know where to draw the line between perceiving and remembering, and we point out that during exploratory actions the information for perceiving objects and for perceiving the environment is always distributed through the space and time involved in the exploratory actions. We think the perception of self-locomotion when walking without vision, and the complementary representation of the resulting changes in spatial orientation, are good examples of this dynamic feature of perception.
References
Berry, D. (2000). Does recalibration of turning with the feet transfer to turning with the hands? Unpublished Master’s thesis, Vanderbilt University, Nashville, TN.
Berthoz, A., Pavard, B., and Young, L.R. (1975). Perception of linear horizontal self-motion induced by peripheral vision (linear vection): Basic characteristics and visual–vestibular interactions. Experimental Brain Research, 23, 471–489.
Berthoz, A., Israel, I., Georges-Francois, P., Grasso, R., and Tsuzuku, T. (1995). Spatial memory of body linear displacement: What is being stored? Science, 269, 95–98.
Bles, W. (1981). Stepping around: Circular vection and Coriolis effects. In J. Long and A. Baddeley (Eds.), Attention and performance IX, pp. 47–62. Hillsdale, NJ: Lawrence Erlbaum.
Dichgans, J. and Brandt, T. (1972). Visual–vestibular interaction and motion perception. Bibliotheca Ophthalmologica, 82, 327–338.
Dichgans, J. and Brandt, T. (1978). Visual–vestibular interactions: Effect on self-motion perception and postural control. In R. Held, H. Leibowitz, and H.L. Teuber (Eds.), Handbook of sensory physiology, Vol. 8: Perception. New York: Springer.
Frymire, M. (1998). Path integration when walking without vision as a function of whether or not the surrounding environment is known. Unpublished senior thesis with honors in Cognitive Studies, Vanderbilt University.
Garing, A.E. (1999). Intersensory integration in the perception of self-movement. Unpublished Ph.D. dissertation, Vanderbilt University.
Giannopulu, I. and Lepecq, J.-C. (1998). Linear vection chronometry along spinal and sagittal axes in erect man. Perception, 27, 363–372.
Gibson, J.J. (1958). Visually controlled locomotion and visual orientation in animals. British Journal of Psychology, 49, 182–194.
Gibson, J.J. (1962). Observations on active touch. Psychological Review, 69, 477–491.
Gibson, J.J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gibson, J.J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Guendry, F.E. (1974). Psychophysics of vestibular sensation. In H.H. Kornhuber (Ed.), Vestibular system. Part 2: Psychophysics, applied aspects and general interpretations. Berlin: Springer-Verlag.
Guth, D.A. and Rieser, J.J. (1997). Perception and the control of locomotion by blind and visually impaired pedestrians. In B. Blasch, W. Wiener, and R. Welch (Eds.), Handbook of orientation and mobility, pp. 9–39. New York: American Foundation for the Blind.
Helmholtz, H. von (1866). Handbuch der physiologischen Optik. Leipzig: Voss.
Johansson, G. (1977). Studies on visual perception of locomotion. Perception, 6, 365–376.
Johnson, C., Pick, H.L., Siegel, G.M., Ciccarielli, C., and Garber, S.R. (1981). Effects of interpersonal distance on children’s vocal intensity. Child Development, 52, 721–723.
Kornhuber, H.H. (Ed.) (1974). Vestibular system. Part 2: Psychophysics, applied aspects and general interpretations. Berlin: Springer-Verlag.
Lackner, J.R. (1988). Some proprioceptive influences on the perceptual representation of body shape and orientation. Brain, 111, 281–297.
Lederman, S.J. and Klatzky, R.L. (1987). Hand movements: A window into haptic object recognition. Cognitive Psychology, 19, 342–368.
Lederman, S.J. and Klatzky, R.L. (1998). Relative availability of surface and object properties during early haptic processing. Journal of Experimental Psychology: Human Perception and Performance, 23, 1680–1707.
Lee, D.N. (1978). The functions of vision. In H. Pick and E. Salzmann (Eds.), Modes of perceiving and processing information. Hillsdale, NJ: Erlbaum.
Lee, D.N. and Lishman, J.R. (1974). Visual proprioceptive control of stance. Journal of Human Movement Studies, 1, 87–95.
Lepecq, J.-C., Jouen, F., and Dubon, D. (1993). The effect of linear vection on manual aiming at memorized directions of stationary targets. Perception, 22, 49–60.
Lishman, J.R. and Lee, D.N. (1973). The autonomy of visual kinesthesis. Perception, 2, 287–294.
Loomis, J.M., Klatzky, R.L., Golledge, R.G., and Cicinelli, J.G. (1993). Nonvisual navigation by blind and sighted: Assessment of path integration ability. Journal of Experimental Psychology: General, 122, 73–91.
Loomis, J.M., Da-Silva, J.A., Philbeck, J.W., and Fukusima, S.S. (1996). Visual perception of location and distance. Current Directions in Psychological Science, 5, 72–77.
Loomis, J.M. and Philbeck, J.W. (1999). Is the anisotropy of perceived 3-D shape invariant across scale? Perception and Psychophysics, 61(3), 397–402.
Mach, E. (1875). Grundlinien der Lehre von den Bewegungsempfindungen. Leipzig: Wilhelm Engelmann.
McMillan, A. (1998). Changes in whispering as a function of the recalibration of rotational locomotion. Unpublished senior thesis with honors in Cognitive Studies, Vanderbilt University.
Michael, D., Siegel, G.M., and Pick, H.L., Jr. (1995). Effects of distance on vocal intensity. Journal of Speech and Hearing Research, 38, 1176–1183.
Mountcastle, V.B. (1958). Somatic functions of the nervous system. Annual Review of Physiology, 20, 471–508.
Philbeck, J.W. and Loomis, J.M. (1997). Comparison of two indicators of perceived egocentric distance under full-cue and reduced-cue conditions. Journal of Experimental Psychology: Human Perception and Performance, 23(1), 72–85.
Philbeck, J.W., Loomis, J.M., and Beall, A.C. (1997). Visually perceived location is an invariant in the control of action. Perception and Psychophysics, 59(4), 601–612.
Pick, H.L., Jr., Yonas, A., and Rieser, J.J. (1979). Spatial reference systems in perceptual development. In M. Bornstein and W. Kessen (Eds.), Psychological development from infancy. Hillsdale, NJ: Erlbaum.
Pick, H.L., Jr., Rieser, J.J., Wagner, D., and Garing, A.E. (1999). The recalibration of rotational locomotion. Journal of Experimental Psychology: Human Perception and Performance, 25, 1179–1188.
Post, R.B. (1988). Circular vection is independent of stimulus eccentricity. Perception, 17, 737–744.
Rieser, J.J. and Heiman, M.L. (1982). Spatial self-reference systems and shortest-route behavior in toddlers. Child Development, 53, 524–533.
Rieser, J.J., Guth, D.A., and Hill, E.W. (1986). Sensitivity to perspective structure while walking without vision. Perception, 15, 173–188.
Rieser, J.J. (1989). Access to knowledge of spatial structure from novel points of observation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 1157–1165.
Rieser, J.J., Ashmead, D.A., Talor, C., and Youngquist, G. (1990). Visual perception and the guidance of locomotion without vision to previously seen targets. Perception, 19, 675–689.
Rieser, J.J., Pick, H.L., Jr., Ashmead, D.A., and Garing, A.E. (1995). The calibration of human locomotion and models of perceptual–motor organization. Journal of Experimental Psychology: Human Perception and Performance, 21, 480–497.
Rieser, J.J., McMillan, A., Pick, H.L., and Berry, D.M. (1998). Changes in the vocal intensity of whispering without vision as a function of the recalibration of rotational locomotion. Presentation to the Psychonomic Society.
Rieser, J.J. (1999). Dynamic spatial orientation and the coupling of representation and action. In R. Golledge (Ed.), Cognitive mapping and spatial behavior. Baltimore, MD: Johns Hopkins Press.
Rieser, J.J., Frymire, M., and Berry, D.M. (1999). Path integration when walking without vision depends on the geometry of the remembered surroundings. Presentation to the Psychonomic Society.
Steenhuis, R.E. and Goodale, M.A. (1988). The effects of time and distance on accuracy of target-directed locomotion: Does an accurate short-term memory for spatial location exist? Journal of Motor Behavior, 20, 399–415.
Taub, E., Ellman, S., and Berman, A. (1966). Deafferentation in monkeys: Effect on conditioned grasp response. Science, 151, 593–594.
Telford, L. and Frost, B.J. (1993). Factors affecting the onset and magnitude of linear vection. Perception and Psychophysics, 52(6), 682–692.
Wade, N.J. (1998). Light and sight since antiquity. Perception, 27, 637–670.
Wagner, D.G. (1996). Generalizing recalibration of locomotion to natural spatial updating. Unpublished Ph.D. dissertation, University of Minnesota.
Warren, W.H. (1995). Self-motion: Visual perception and visual control. In Handbook of perception and cognition, Vol. 5: Perception of space and motion. New York: Academic Press.
Wertheim, A.H. (1994). Motion perception during self-motion: The direct versus inferential controversy revisited. Behavioral and Brain Sciences, 17, 293–310.
II
Timing in perception and action
9 Perspectives on the timing of events and actions
Introduction to Section II
Jeff Summers
Time perception, in terms of the temporal properties of environmental events, and time production, in terms of the execution of self-generated sequences of precisely timed movements, are abilities that are essential for successful interactions with the world around us. Furthermore, coordinating our actions in time to coincide with an external event, that is, perception–action coupling, is a feature of many everyday activities, from pressing a key on a piano, to shaking someone’s hand, to hitting a squash ball. Unfortunately, the study of the perception of event structure (i.e. the motional and rhythmic properties of events) and the study of movement timing have progressed somewhat independently, leaving the crucial question of how perceptual variables map onto motor variables largely neglected. In this section four papers are presented dealing with different aspects of timing.
The first chapter, by Wing and Beek, presents a tutorial review of studies of movement timing. This aspect of time-related behaviour has a long history in experimental psychology dating back to the late 1800s, and the paradigmatic task has involved repetitive key tapping in synchrony with, or as a continuation of, a sequence of auditory stimuli. The question of interest has been: How does the motor system compose sequences of precisely timed movements? Historically, cognitive models of movement timing have appealed to central interval timing mechanisms or clocks that emit a regular series of pulses, with some variability, to trigger movements. In recent years, strong criticism of internal timekeepers has come from proponents of a dynamical systems approach to movement behaviour. This approach emphasizes self-organizing processes within the motor system and sees timing as an emergent consequence of these processes rather than as being explicitly controlled. Initial support for dynamical models came from periodicities evident in cyclic repetitive movements, such as locomotion and bimanual movements. In their chapter Wing and Beek juxtapose the cognitive and dynamical approaches on a number of research issues within the area of movement timing. What becomes clear is that over the last fifteen years or so the two approaches have pursued their research agendas independently and in parallel, focusing on different kinds of phenomena and levels of analysis. This is nicely illustrated in the research conducted by both groups on bimanual polyrhythmic tapping. Dynamical systems researchers have focused on movement processes, in particular the intrinsic constraints on the phasing between the limbs and the mechanisms underlying loss of stability and pattern switches. Cognitive researchers, in contrast, have been concerned more with the behavioural events (e.g. key
taps) produced by the movements, and have attempted to uncover the underlying motor organization adopted by subjects through an analysis of the covariability among intertap intervals. Abernethy and Sparrow in 1992 predicted that the motor behaviour field was entering a long period of bitter and intense conflict from which one of the competing views would emerge as dominant. Others, in contrast, argued that the field would move towards a reconciliation between the two approaches (e.g. Summers 1992). In the chapter written by Wing and Beek, two of the leading figures in the ‘opposing’ camps, the mood is clearly toward a merging of the approaches. Whether the merging is best realized by a unified mathematical form (Pressing 1999), a two-level oscillator model (Peper, Beek, and Daffertshofer 2000), or a two-level organization incorporating both central timekeeping processes and trajectory coupling phenomena (Semjen 2001) remains to be seen.
In the second chapter, Aschersleben and colleagues investigate the role sensory feedback may play in the online correction of movements that have to be synchronized with an external stimulus. To address this issue the authors exploit the negative asynchrony observed when subjects are asked to tap the index finger in time with a predictable external stimulus, usually an isochronous sequence of auditory pacing signals. The asynchrony, noted over 100 years ago, refers to the finding that in this simple repetitive task subjects typically tap consistently ahead of the tones by some 30–50 milliseconds. One explanation for the anticipation error maintains that synchrony involves a matching, at a central representational level, between the sensory consequences of the tap and the auditory pacing signal. As it takes longer for the tactile and kinesthetic feedback from the tapping finger to reach the brain than for the auditory feedback from the ear, the tap must be executed prior to the pacing signal for synchrony to be perceived at the central level. The authors present several lines of evidence in support of a model (the sensory accumulator model) in which the temporal asynchronies reflect differences in the processing times needed to generate the central representations, rather than pure differences in nerve conduction time. Of particular interest is a series of experiments involving a deafferented subject, IW, examining the effect of eliminating sensory feedback from the finger on the anticipation error. The studies not only show that IW has been able to substitute visual feedback for the missing kinesthetic feedback to control his movements spatially, but also that even in the absence of all feedback sources he was still able to produce a relatively stable series of taps coordinated with the pacing signal. To the authors, the latter remarkable feat is consistent with the notion of forward modelling, in which the expected sensory consequences of an action (i.e. the tap) can be used to time the action (i.e. to match it with the perceived pacing signal) even when no actual feedback is available.
At another level, the chapter by Aschersleben and colleagues nicely illustrates the importance of multiple sources of information in interceptive timing tasks. That is, under the minimal information conditions that exist in the synchronization experiments, subjects’ performance is quite poor and is determined mainly by sensory processing times. However, in everyday activities where predictive information can be gained from a variety of sources (e.g. optic flow), interceptive timing is remarkably accurate.
For example, compare the 30–50 millisecond error in synchronized tapping with the 2–5 millisecond timing accuracy exhibited at bat/ball contact by expert table tennis players when executing a series of forehand drives towards a small target area across the net (Bootsma and van Wieringen 1990).

The importance of information sources on synchronization performance is further illustrated in the chapter by Bruno Repp. Two experiments are described in which skilled musicians attempted to synchronize finger taps with click sequences that represented variations from an average expressive timing profile derived from an analysis of 115 expert performances of a musical excerpt. It should be noted that expressive timing patterns, when represented as a bare click sequence, are quite meaningless.
The question of interest was whether adding musical context to the clicks would facilitate synchronization performance. Providing the click sequences with accompanying music immediately improved synchronization, particularly for the most typical pattern but also for other expressive timing patterns. These findings in the music domain provide a clear illustration of the tight coupling between perception and action. Furthermore, both having accompanying music and having subjects merely imagine the music produced systematic deviations from regularity in the tapping of an isochronous click sequence. These involuntary effects provide further evidence of coupling between music perception, musical imagery, and movement timing.

In the final chapter, Haggard looks at the role of consciousness in the link between perception and action. While the debate over what consciousness is continues (e.g. Dennett 2001), Haggard approaches the issue by asking what function consciousness serves in the generation of intentional action. Conscious representations are seen as contributing a constructive role by providing the necessary link between intention and action underlying the sense of agency. To construct the relation between intention and action, consciousness participates in a hypothetical neural process of efferent binding. Through efferent binding, intentions, actions, and the environmental sensory consequences of these actions are linked. To examine the binding between action and effect, the paradigm of Libet et al. (1983) is resurrected, in which subjects judge, via the rotating hand of a clock, the perceived time of occurrence of sensory (a pure tone) and motor (a key tap) events. Although one may question the ecological validity of the paradigm, strong support for the efferent binding hypothesis is obtained across two experiments, in the form of interchangeable attraction effects when stimuli and actions are associated in a causal context. It appears, therefore, that the processes underlying conscious awareness are similar for stimulus and motor representations.

The interplay between perception and action is the central theme of the present volume, and the nature of the interface between environmental information and the control of movement is also the source of greatest division between the ecological and cognitive theoretical positions. In the timing domain, cognitivists have addressed the question of how timing in perception maps onto timing in action by assuming common representations and coding for environmental events and actions and a common mechanism underlying the perception and production of time (e.g. Ivry and Hazeltine 1995). Proponents of the ecological view, in contrast, expunge the notion of timing mechanisms and rely on the Gibsonian concept of affordance as the link in the direct coupling between perception and action. The direct perception perspective, however, has been hampered by a lack of consensus within the ecological community with regard to the nature and role of affordances (e.g. Michaels 2000; Stoffregen 2000) and the environmental information relevant to the control of movement (Summers 1998). As illustrated in the review by Wing and Beek, the study of movement timing has to a large extent ignored the perceptual side of the perception–action cycle. Implicit in most models of key tapping is the assumption that perceptual variables influence timing by modulating the internal timekeeping processes that pace movements (e.g. Repp, this volume, Chapter 12).
Analyses of coordination dynamics (arguably a perspective within ecological psychology) also rarely refer to the affordance concept or to how perceptual variables map onto motor variables. Recently, however, a strong link has been demonstrated between coordination dynamics and perceptual dynamics (e.g. Zaal, Bingham, and Schmidt 2000). That is, the common measures of interlimb coordination, relative phase and phase variability, appear to be properties to which the perceptual system is also sensitive. Furthermore, the stability of judgements of the relative phasing of two circles moving rhythmically on a computer display exhibited an inverted U-shaped function of relative phase similar to that
obtained with rhythmic limb movements. Interestingly, the highest variability in the perceptual judgements was found around 90° relative phase, even when there was no variability in the display. One, somewhat radical, interpretation of these results is that patterns of interlimb coordination are primarily perceptually determined rather than reflecting physical constraints on limb movements imposed by coupled oscillator systems (see Zaal et al. 2000, for other interpretations). Bruno Repp’s finding that the perception of musical structure involuntarily affected synchronization with an isochronous click sequence is another illustration of the powerful influence of perceptual factors on movement timing. At the very least, these results suggest that future models of coordination dynamics will need to model more explicitly how perceptual variables interact with motor coordination variables.

The remaining two chapters in the section provide support for the alternative view that perception–action coupling is a consequence of a common representational basis for perceptual events and actions. The sensory accumulator model proposed by Aschersleben and colleagues to account for the negative asynchronies in their tapping task assumes that synchronization is achieved through the matching of central representations of clicks and taps. A further assumption of the model is that the representation or neural state is in terms of the sensory consequences of an event or action (i.e. somatosensory feedback or reafference). Finally, the generation of a representation involves the accumulation of sensory evidence (afferent neural signals) until a threshold is reached. According to this account the negative asynchrony between a tap and the auditory pacing signal is a consequence of different accumulation functions for auditory and tactile information. Support for the model has come from a variety of experiments in which factors affecting the accumulation functions (e.g. the amount of sensory feedback) have been manipulated. The model proposes, therefore, a common representational coding for both environmental events and actions and a common mechanism (the accumulation function) through which they are coordinated. Haggard and colleagues, using a different experimental paradigm, provide further support for a common coding for perceptual events and actions and for the hypothesis that the time of an action is determined by its consequences rather than its antecedents.

The general aim of the research reported in this section has been to elucidate the basic mechanisms underlying the timing of movements. In order to examine these mechanisms without confounding from other factors (e.g. environmental and musculo-skeletal constraints), researchers in the timing domain have typically chosen simple tasks in which the environmental events are confined to sequences of auditory signals (tones) and the actions to simple key taps. An important issue, therefore, is the extent to which the timing mechanisms identified for this class of tasks also underlie tasks involving more complex interactions between perception and action. There is some evidence, for example, to suggest that different timing mechanisms may predominate in the control of discrete versus continuous movements (Semjen 2001).
References
Abernethy, B. and Sparrow, W.A. (1992). The rise and fall of dominant paradigms in motor behaviour research. In J.J. Summers (Ed.), Approaches to the study of motor control and learning, pp. 3–46. Amsterdam: North-Holland.
Bootsma, R.J. and van Wieringen, P.W.C. (1990). Timing an attacking forehand drive in table tennis. Journal of Experimental Psychology: Human Perception and Performance, 16, 21–29.
Dennett, D. (2001). Are we explaining consciousness yet? Cognition, 79, 221–237.
Ivry, R.B. and Hazeltine, R.E. (1995). Perception and production of temporal intervals across a range of durations: Evidence for a common timing mechanism. Journal of Experimental Psychology: Human Perception and Performance, 21, 3–18.
Libet, B., Gleason, C.A., Wright, E.W. and Pearl, D.K. (1983). Time of conscious intention to act in relation to onset of cerebral activity (readiness potential). Brain, 106, 623–642.
Michaels, C.F. (2000). Information, perception, and action: What should ecological psychologists learn from Milner and Goodale (1995)? Ecological Psychology, 12, 241–258.
Peper, C.E., Beek, P.J., and Daffertshofer, A. (2000). Considerations regarding a comprehensive model of (poly)rhythmic movement. In P. Desain and L. Windsor (Eds.), Rhythm perception and production, pp. 35–50. Lisse: Swets and Zeitlinger.
Pressing, J. (1999). The referential dynamics of cognition and action. Psychological Review, 106, 714–747.
Semjen, A. (2001). On the timing basis of bimanual coordination in discrete and continuous tasks. Brain and Cognition, doi:10.1006/brcg.2001.1309. (Online reference.)
Stoffregen, T.A. (2000). Affordances and events. Ecological Psychology, 12, 1–28.
Summers, J.J. (1992). Movement behaviour: A field in crisis? In J.J. Summers (Ed.), Approaches to the study of motor control and learning, pp. 551–562. Amsterdam: North-Holland.
Summers, J.J. (1998). Has ecological psychology delivered what it promised? In J. Piek (Ed.), Motor behavior and human skill: A multidisciplinary approach, pp. 385–402. Champaign, IL: Human Kinetics.
Zaal, F.T.J.M., Bingham, G.P., and Schmidt, R.C. (2000). Visual perception of mean relative phase and phase variability. Journal of Experimental Psychology: Human Perception and Performance, 26, 1209–1220.
10 Movement timing: a tutorial
Alan M. Wing and Peter J. Beek
10.1 Introduction: two traditions in studying the timing of movement
Two broad traditions underlie behavioural studies of movement timing. In one, which we will refer to as the information processing approach, time is considered a mental abstraction, applicable to, but represented independently of, any particular effector system. In this view our ability to carry out an action fast or slow, to write slow or fast, to speak with variation in rate, or to adjust the speed of the music we play, depends on central timing processes. It may be supposed that these depend on brain circuits that make contact with the motor system. However, they are functionally contained in that they do not require the action of any particular effector system. Rather they may be set to initiate movements at certain times, and these movements’ other parameters, such as force or direction, may be independently specified.

In the other perspective, which we will call the dynamical systems approach, timing is considered a by-product or emergent property of the organizational principles (i.e. dynamical equations of motion) governing a particular coordinated action. If an action has a characteristic timing, it is part and parcel of other movement dimensions of that action, such as frequency or its dynamical equivalent, stiffness. A repetitive activity such as handwriting may be carried out with regular timing but that is a consequence of a dynamical regime specifying a sequence of pen stroke directions and amplitudes under particular stiffness constraints. Time as such is not an explicitly controlled variable, but follows from dynamical equations of motion and their parameter settings. In this view the control of timing in the production of a musical rhythm may thus be said to follow from the effector system used to implement movement.

In this tutorial review we consider the two approaches as they have been applied to repetitive timing behaviour. We first describe a commonly used experimental paradigm, the continuation of a previously specified interval. Then we review a number of performance measures. Next we detail two models as representative of the two approaches before considering how each has been applied to a set of studies illustrating a range of experimental paradigms. These ‘application areas’ include dual-task isochronous unpaced responding, synchronization, multi-effector isochronous timing, and multi-effector multifrequency timing. We conclude that we are witnessing a vibrant and active field of behavioural research which has many and varied analytic tools. Although the two traditions have maintained largely separate development paths, our review suggests ways in which the two can be seen as complementary and deserving of more attention to bridges that might be built between them.
10.2 Paced and free tapping
Our starting point is a paradigm developed over 100 years ago by an American psychologist working at Harvard University, L.T. Stevens. In his research into the ‘time sense’, Stevens asked the subject to tap a morse telegraph key in time with a metronome. Once the subject was following the beat, Stevens stopped the metronome while the subject continued to tap on his own. Using a smoked drum kymograph to record key depressions with an accuracy of one millisecond, Stevens’ interest was in the accuracy of the interresponse intervals produced in the continuation phase. He presented his data in the form of graphs with interval duration on the vertical axis and repetition along the horizontal axis; that is, a time time series. Fig. 10.1 shows three such time series with mean intervals 400, 600, 1000 ms. It will be observed that the time intervals produced reflect the different metronome target times, but that there is some fluctuation around the correct value, which is greater at longer intervals.

Stevens’ use of a paradigm combining synchronization with continuation was an important methodological contribution to the study of timing, offering experimental control over the produced interval. The effectiveness of the paradigm testifies to the adjustability of timing and also to the stability of timing, in the sense that the interval is maintained at approximately the right value well after synchronization stops. However, it may be noted that maintenance of the interval at the correct value is not a trivial task and indeed Stevens noted that the intervals fluctuate more at slower rates. In fact, Stevens made two fundamental observations about the nature of the variability in timing.
Fig. 10.1 Paradigm for studying timing (Stevens 1886). (a) Trials are in two phases. An initial synchronization phase is followed by a continuation phase when the subject continues to tap at the same rate. (b) Data from three different trials with target interval of 400, 600, and 1000 ms shown as a time series with intervals on the y-axis shown in succession along the x-axis. (c) Variability increases with mean interval.
He observed that the variability had short- and long-term components. Short-term fluctuations around the mean he attributed to motor limitation—‘the hand (or perhaps the will during the interval) cannot be accurately true’. Long-term drift around the target he suggested reflected ‘rhythmic variation of the standard carried in the mind’. Although these quotes may sound somewhat quaint to modern-day academics, the insight that variability in timing may be partitioned into components related to timing control structures on the one hand, and the effector system on the other, is essential, and still plays a key role in the theory and experimental study of timed motor behaviour.
10.3 Linear models and measures of performance
Making inferences about the timing control structures that may underlie performance in tasks such as Stevens’ paradigm depends on accurate characterization of the behaviour being studied. Stevens’ observations on variability were based on visual inspection. However, we now have more sophisticated statistical tools at our disposal to characterize the nature of variation and these are reviewed under two headings. We first consider those that apply when behaviour is stable, based on the concept of a general linear model. That is, it is assumed that the behaviour of interest, in the present case the series of time intervals, is generated by a process that can be expressed as a weighted set of additive components plus a random element which reflects a residual indeterminacy in the system. This may be characterized in terms of a linear or polynomial regression (a deterministic trend) plus random noise with mean zero. A key assumption in using regression methods to characterize the time series is that the noise term is stationary; its distributional properties do not evolve with time.

Linear models have also been developed where the successive observations are related, not through a deterministic trend resulting in mean shift, but in local dependence. In autoregressive models, the deviation of the current observation from the mean (y) is proportionally related to previous observations’ deviations from the mean plus a random term (x). A first-order autoregressive process assumes just one proportionality term. Thus:

y(j) = a*y(j − 1) + x(j),   −1 < a < 1.

Another class of model is the moving average, in which the deviation of the current observation from the mean is given by a random term plus a weighted sum of previous random terms. Thus, for a first-order moving average process:

y(j) = b*x(j − 1) + x(j),   −1 < b < 1.

Such models may be distinguished in terms of their autocorrelation function. For a series of observations, the autocorrelation at lag one is defined as the average of products of successive deviations from the mean taken over the whole series (the autocovariance) divided by the variance of the series. At lag two the products involve deviations two steps, rather than one step, apart. In general, the autocorrelation at lag k is given by:

A(k) = [Av{y(j)*y(j − k)}]/var(y),   k = 1, . . ., N.
The value of the autocorrelation is bounded by plus and minus one. At lag zero, the autocovariance in the numerator is equal to the denominator so the autocorrelation is 1. Low-order autoregressive
models give rise to autocorrelation functions with nonzero values persisting at long lags (the values decrease with increase in lag k). Low-order moving average models have nonzero autocorrelation only at low lags, the transition to zero autocorrelation being determined by the order of the moving average process.

An alternative representation of the form of variability in stationary processes is the power spectrum. The time series is represented in terms of the relative power of a set of sinusoidal components. Rapid fluctuations involving closely-spaced observations are represented by sinusoidal terms of short period (high frequency). Fluctuations around the mean taking place over observations spaced further apart (at a longer lag) are represented by sinusoidal terms of longer period (lower frequency). Random variation is represented by equal spread of power across all frequencies, sometimes referred to as white or Gaussian noise. It will be observed that the use of the spectrum to characterize the patterning of values in a time time series (time values ordered by occurrence) departs from another use of the power spectrum in which some measure, such as amplitude of movement, is observed as a continuous function of time. In such cases the form of the power spectrum constitutes a description of the movement trajectory. In repetitive tapping, a significant proportion of the power is concentrated at the frequency corresponding to the reciprocal of the mean interval being produced.
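These autocorrelation signatures are easy to check numerically. The sketch below (Python with NumPy; it is not from the tutorial itself, and the coefficient values a = b = 0.6 are purely illustrative) simulates a first-order autoregressive and a first-order moving average series and estimates autocorrelations at several lags:

import numpy as np

rng = np.random.default_rng(0)

def autocorr(y, k):
    # Lag-k autocorrelation: autocovariance divided by the series variance.
    d = y - y.mean()
    return (d[k:] * d[:-k]).mean() / d.var()

n = 100000
x = rng.normal(0.0, 1.0, n)  # white noise
a = b = 0.6                  # illustrative coefficients, |a|, |b| < 1

# First-order autoregressive series: y(j) = a*y(j - 1) + x(j)
y_ar = np.zeros(n)
for j in range(1, n):
    y_ar[j] = a * y_ar[j - 1] + x[j]

# First-order moving average series: y(j) = b*x(j - 1) + x(j)
y_ma = x.copy()
y_ma[1:] += b * x[:-1]

for k in (1, 2, 5):
    print(k, round(autocorr(y_ar, k), 3), round(autocorr(y_ma, k), 3))
# AR(1): autocorrelation persists, decaying as a**k (0.6, 0.36, ...).
# MA(1): nonzero only at lag 1 (b/(1 + b**2), about 0.44), near zero beyond.

This reproduces the diagnostic stated above: autoregressive dependence dies away gradually with lag, whereas moving average dependence cuts off at the order of the process.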
10.4 Nonlinear models and measures of performance
The assumption is often made in psychology that behaviour is stable over time. If repeated observations do not yield identical results, it is assumed that the process is subject to random variation but that long-term predictability improves on average with the availability of more observations. In the dynamical systems approach to the study of timing the interest is in the evolution of performance with time. Characterization of a dynamical system is a matter of describing its evolution path, or trajectory. This may be done to compare observed properties with those predicted from an assumed algebra or, inversely, to identify an underlying model which provides a suitable phenomenological description of the observed data in the sense of a formal analogy (cf. Beek, Peper, and Stegeman 1995).

Stability, and changes in stability due to variations in internal or external parameters, form a special entry point for constructing dynamical models of behaviour. Of particular interest are those behavioural organizations that are resistant to small, transient perturbations. Such stable organizations are called attractors and are defined mathematically as regions in phase space to which trajectories converge (see Fig. 10.2). Phase space is a multidimensional space whose axes represent system variables, excluding time, such as effector position and velocity, so that a trajectory in phase space depicts the evolution in time from a particular set of starting conditions. (For these and other useful definitions see Gregson and Pressing 2000.) In the theory of dynamical systems, as it has evolved to date, four types of attractors are recognized. These are: (1) the point attractor or stable fixed point (nearby trajectories converge onto a point; e.g. the equilibrium point of a damped pendulum), (2) the periodic attractor or limit cycle (trajectories converge onto a closed orbit; e.g. the periodic oscillations of a damped pendulum with an escapement to sustain the oscillations), (3) the quasi-periodic attractor (trajectories converge onto a non-closed orbit defined by a number (≥2) of basic frequencies that do not relate as integers; e.g. an externally driven limit cycle oscillator), and (4) the chaotic attractor (trajectories converge onto a non-closed orbit that cannot be meaningfully decomposed into a number of basic frequencies
because all frequencies are present; e.g. a dripping faucet or a metal pendulum attracted by three magnets positioned in a triangle underneath it).

Fig. 10.2 Example of phase plane trajectory (phase portrait) of the Rayleigh oscillator for different initial conditions (x(t) represents position and y(t) velocity). Even if motion is started outside the normal range of interest the trajectory rapidly converges onto the characteristic limit cycle of the oscillator.

A remarkable property of the latter kind of dynamically stable organization is that the resulting erratic behaviour appears to be noise to the observer, at least at first glance, and is equally unpredictable in the long run, even though it is intrinsically deterministic. Therefore, chaos in the technical sense of dynamical systems theory is also called deterministic or classical chaos. Necessary conditions for the occurrence of chaos in time-continuous dynamical systems are that the evolution laws describing the relations between the state variables are nonlinear and that the number of relevant state variables is three or more. Thus, when variability is studied from the perspective of dynamical systems theory, the assumption of a linear model with superimposed random variability (stochastic noise) is typically avoided, leaving open the possibility that the observed variability is the result of deterministic chaos.

Traditional statistical methods, such as analysis of variance, linear time series analysis or frequency partition into energy spectra, derive from a general linear model with superimposed additive random noise and an underlying assumption of stability. The characterization of a dynamical system trajectory exceeds the bounds of this model. In the absence of analytic solutions to nonlinear dynamics, investigators may need to draw on qualitative methods for studying a dynamical system’s topology (as recognized by Poincaré almost a century ago) and quantitative methods for studying its dimensionality and flow characteristics. A common issue is to determine whether, in the first place, a dataset is low-dimensional and possibly chaotic, or, instead, random in the sense of being stochastic and of high dimensionality. One approach in this case is to determine the Lyapunov exponent. This measure describes the exponential rate of convergence/divergence of two initially close points in phase space along relevant dimensions of the attractor. One issue for such an analysis is that a large number of observations is required, certainly in the hundreds if not thousands. Producing a large number of interresponse intervals in a laboratory context can be quite tedious and inattention or fatigue may develop. Thus the conditions required for analysing nonstationarity may, themselves, change performance from being stationary into an unstable process subject to drift.
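As an aside, the limit-cycle convergence shown in Fig. 10.2 can be reproduced in a few lines. The following sketch (Python/NumPy; the damping and frequency parameters are arbitrary choices, not values from the chapter) integrates a Rayleigh oscillator from two very different starting positions and shows that both trajectories settle onto an orbit of the same amplitude:

import numpy as np

def deriv(state, eps=1.0, omega=2 * np.pi):
    # Rayleigh oscillator: x'' - eps*(1 - x'^2)*x' + omega^2*x = 0.
    # The linear negative damping pumps energy in; the cubic velocity term
    # dissipates it, so the motion is self-sustaining (a limit cycle).
    x, v = state
    return np.array([v, eps * (1.0 - v**2) * v - omega**2 * x])

def simulate(x0, dt=1e-3, steps=40000):
    s = np.array([x0, 0.0])
    xs = np.empty(steps)
    for i in range(steps):
        # Classical fourth-order Runge-Kutta step.
        k1 = deriv(s)
        k2 = deriv(s + 0.5 * dt * k1)
        k3 = deriv(s + 0.5 * dt * k2)
        k4 = deriv(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        xs[i] = s[0]
    return xs

for x0 in (0.01, 3.0):  # start near rest and far outside the cycle
    xs = simulate(x0)
    print(x0, xs[-5000:].max())  # steady-state amplitude is the same

The final amplitude is a property of the dynamical regime, not of the starting conditions, which is exactly what makes the limit cycle an attractor.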
10.5 Two approaches to modelling timing
We now turn to consider two contrasting approaches to studying variability in timing. One, first set out by Wing and Kristofferson (1973), follows Stevens in addressing the nature of variability in repetitive tapping with one hand. This approach concerns steady-state, stable performance and assumes simple, noninteracting component processes. Movements are treated in terms of discrete events or responses that demarcate psychologically significant time intervals. A second approach, proposed by Haken, Kelso, and Bunz (1985), applies to two-hand responding that is synchronized with a metronome whose speed is progressively changed over a series of blocks. It is concerned with progressive changes (which can include qualitative changes or phase transitions) in the relative phase of the movements of the two hands due to nonlinear interactions of component oscillatory processes. In this approach, movements are represented as continuous functions of time.
10.6 Wing–Kristofferson (WK) model
The fundamental idea in the WK model is that variability in tapping reflects two independent additive sources of variance, one arising in an adjustable central timekeeper and the other in delays associated with motor implementation (see Fig. 10.3). Wing and Kristofferson (1973) proposed a hierarchical arrangement in which responses, triggered by the central timekeeper at the end of each internally generated interval Cj, are subject to delays in motor implementation Mj before the occurrence of observable responses. If, over successive responses, j = 1, 2, . . ., N, the Cj and Mj are statistically independent random variables, the interresponse intervals Ij have variance:

var(I) = var(C) + 2var(M).

The model predicts dependence between adjacent intervals Ij, Ij + 1, with lag one autocovariance:

acov(I(1)) = −var(M).

This results in the following prediction for the autocorrelation at lag k:

acorr[I(k)] = acov(I(k))/var(I) = −1/{2 + var(C)/var(M)} for k = 1, and 0 for k > 1.

From the last relation it can be seen that the WK model predicts that the lag one autocorrelation (the correlation between adjacent pairs of intervals taken through the sequence of intervals) should be bounded by zero and minus one-half. The autocorrelation at higher lags is predicted to be equal to zero, acorr[I(k)] = 0, k > 1. This has generally been reported to be the case. However, a number of studies have reported a proportion of estimates outside this range. This may represent statistical fluctuation and/or bias in estimators associated with relatively short sequences (see Vorberg and Wing 1996).

From a control point of view the WK model is very simple—it has no feedback loop to correct errors. Indeed, for this reason, it is inadequate as an account of synchronized tapping with a metronome, a point to which we return later. However, the model does embody the form of the short-term
variability noted by Stevens in predicting the negative correlation of successive intervals (Wing and Kristofferson 1973; Vorberg and Wing 1996). In fact, this is an example of a more specific case, that random delays superimposed on a periodic appointment stream result in patterns of short–long alternation of intervals occurring more often than would be expected from a random series (Govier and Lewis 1967). In the more general case of the WK model the magnitude of the negative correlation produced by the two-level model depends on the relative amount of variability in timekeeper and motor delays.

Fig. 10.3 Wing–Kristofferson (WK) timing model. Variable timekeeper intervals subject to random motor implementation delays result in interresponse intervals that are negatively autocorrelated at lag 1.

According to the WK model, the variance of the observed interresponse intervals is equal to the sum of the variance of the timekeeper process and twice the variance of the motor delays. The covariance of successive intervals is equal in magnitude (but opposite in sign) to the variance of the motor delays. Two equations expressing two observable quantities in terms of two unknowns can be solved to give the unknowns—in this case giving the variances of the underlying component processes in terms of the observable interresponse interval measures. Thus, the covariation between successive intervals can be used to estimate the variance of the motor delays. Then, by subtraction of twice the motor delay variance from the interresponse interval variance, it is possible to estimate the variance of the timekeeper intervals. This approach was taken, for example, in an experiment in which subjects produced on each trial a series of responses with target interval selected from a range of 290 to 540 ms (Wing 1980). It was found that timekeeper variance increased linearly with the mean, whereas estimates of motor delay variance were relatively constant. Thus, at longer intervals, variability reflected the timer, while at shorter intervals the motor implementation delays were relatively more important (and so the correlation between successive intervals was more negative at short intervals).

The WK model of timing has been extended to provide a psychological account of rhythm (Fig. 10.4). Western music is frequently organized into rhythmic figures that follow hierarchical rules. Thus bars are subdivided into beats which may be further subdivided into simple fractions of the beat. This led Vorberg and Hambuch (1978, 1984) to suggest that the production of rhythms may involve separate timers at each level of the hierarchy. Assuming variability at each level is independent of variability at other levels leads to a prediction of negative correlation, not only for adjacent interresponse intervals, but also for some nonadjacent intervals. Moreover, intervals between the repetition of any particular response in successive cycles of the rhythm will have variability related
to the position of that response in the hierarchy. Both predictions received support in an analysis of two-handed synchronous responding (Vorberg and Hambuch 1984) which partialled out characteristics of the motor delays by using covariation between left- and right-hand interresponse intervals to estimate multilevel timer properties. However, in this analysis some cases of positive rather than negative covariation in the timekeeper were found, and these are not predicted by the basic hierarchical timer model.

Fig. 10.4 Vorberg and Hambuch extended the WK model to include multiple timekeepers nested in a hierarchy. The cycle interval defined at the highest level of the hierarchy (bold line) has the lowest variability.

One possible account of such positive covariation in interresponse intervals is that it reflects propagation of a rate parameter through successive layers of the timer hierarchy (Vorberg and Wing 1996). In music, speeding up or slowing down does not affect the fundamental structure of a rhythm. Thus it is reasonable to suppose that rhythms are specified in terms of ratios of intervals (e.g. 1:2) and not by the absolute durations of those elements (e.g. 150 ms with 300 ms). However, operation of the timekeeper at each level in the hierarchy does ultimately require the duration to be specified. Thus, the model assumes that, before each cycle of a rhythm is produced, a preparatory process propagates a rate parameter down through the hierarchy, and the multiplicative nature of this process can introduce overall positive correlations, albeit still modulated by negative correlations reflecting the hierarchy. This model has been successfully applied to data from skilled musicians performing rhythm production tasks (Krampe et al. 2000).
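The variance decomposition described earlier in this section can be illustrated with a simulation. The following sketch (Python/NumPy; the means and standard deviations are arbitrary illustrative values, not data from any experiment) generates interresponse intervals from the two-level WK model and then recovers the component variances from the observable statistics:

import numpy as np

rng = np.random.default_rng(1)

# Two-level WK model: interval I(j) = C(j) + M(j) - M(j-1), with
# independent timekeeper intervals C and motor delays M.
n = 200000
C = rng.normal(500.0, 10.0, n)      # timekeeper: mean 500 ms, sd 10 ms
M = rng.normal(50.0, 5.0, n + 1)    # motor delays: sd 5 ms
I = C + M[1:] - M[:-1]

d = I - I.mean()
acov1 = (d[1:] * d[:-1]).mean()     # lag-one autocovariance

# Invert the model's two equations:
#   acov(I(1)) = -var(M)  and  var(I) = var(C) + 2*var(M).
var_M = -acov1
var_C = I.var() - 2.0 * var_M
print(var_M, var_C)  # close to the generating values 25 and 100

The lag-one autocorrelation here is −25/150, about −0.17, inside the predicted (−0.5, 0) band; making the motor delays noisier relative to the timekeeper pushes the estimate towards −0.5, as the model requires.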
10.7 Haken–Kelso–Bunz (HKB) model
The starting point for the HKB model was the observation by Kelso (1984) of an abrupt, spontaneous transition in the phase coordination between the index fingers of the hands when making
simultaneous cyclic extension/flexion movements at a common frequency (see Fig. 10.5). A transition from one phase coordination to another occurs as frequency increases. If the movements start in antiphase (involving nonhomologous muscles) they may switch to moving in phase (homologous muscles).

Fig. 10.5 Phase transitions in bilateral index finger oscillations. x1 and x2 represent the finger displacements. Finger trajectories as a function of time (above) and relative phase as a function of frequency (below) show antiphase finger movements switch to in-phase movements.

HKB showed that this behaviour could be modelled in terms of a so-called potential V(φ), which they defined by

dφ/dt = −dV/dφ

so that it reflects the rate of change of relative phase. In order to represent the fact that antiphase movements can be produced in a relatively stable manner at lower frequencies but not at higher frequencies, and given the periodic nature of the system under study, HKB suggested the potential be defined as the sum of two cosines:

V(φ) = −a cos(φ) − b cos(2φ).

The parameters a, b specify a family of functions with extrema (i.e. minima or maxima) at 0 and 180 deg. If the ratio b/a, the so-called bifurcation parameter, is larger than 0.25 (with a and b both larger than 0), the extrema at 0 and 180 deg are minima (Fig. 10.6). At these minima a change in the value of relative phase φ results in a rate of change of relative phase that tends to restore relative phase to the value before the change. Hence these minima represent points of stable relative phase. In the HKB model it is assumed that the ratio b/a decreases with frequency so that, ultimately, when b/a becomes smaller than 0.25, the minimum at 180 deg disappears. In this approach, relative phase, φ, is termed a collective variable or order parameter (capturing system states in terms of
coordination) and frequency is a control parameter (affecting the dynamics of the order parameter without actually specifying the dynamics).

Fig. 10.6 Haken–Kelso–Bunz (HKB) potential V describing the loss of stability of antiphase coordination with increasing frequency. As the control parameter frequency increases, the ratio b/a decreases, leaving only the stable region at φ = 0. (Adapted from Daffertshofer 1997.)

In order to account for the description of phase switching behaviour by the potential, HKB proposed a model in which movements of each hand are described in terms of two oscillators subject to a pair of functions that couple the state of each effector to that of the other. They modelled each finger as a nonlinearly damped oscillator with a linear negative damping term for the insertion of energy and a nonlinear energy dissipation term—HKB considered x^2(dx/dt) (Van der Pol) and (dx/dt)^3 (Rayleigh)—which make the oscillations self-sustaining. In the phase plane (with dx/dt plotted against x) the trajectory is a limit cycle. Kinematic studies (e.g. Kay, Kelso, Saltzman, and Schöner 1987) have provided support for a limit-cycle model of finger motion involving both a Van der Pol and a Rayleigh term. Thus, increasing frequency results in oscillations with smaller amplitudes (a consequence of the Rayleigh term) and higher peak velocities (a consequence of the Van der Pol term). Using, for the sake of convenience, a limit-cycle model of the finger movements with only a Rayleigh term (that is, a Rayleigh oscillator), HKB analysed what coupling functions, linking the states of one effector to those of the other and vice versa, would produce the desired theoretical order parameter equation. Although many different mathematical forms of the coupling function are possible, HKB considered two specific cases. In one, the coupling is determined by time derivatives of the state variables, in the other, by time-delayed values of the state variables.

Recall that, in the HKB account, the relative phases 0 and 180 deg are stable, at least at low frequencies. That is to say that a change in relative phase results in restorative rates of change of
relative phase. If phase is perturbed (e.g. the movement of one hand is briefly arrested), relative phase should subsequently return to its pre-perturbation value. However, the potential landscape changes with frequency. At higher frequencies, the local minimum at 180 deg gets shallower and eventually disappears, so that 0 deg remains as the only stable region. At this point spontaneous transitions from 180 to 0 deg relative phase would be expected. Prior to this point, changes in the potential should be evident in a decreased stability with longer settling (or ‘relaxation’) time after perturbation at higher frequencies (see Fig. 10.7). This has been observed to be the case by Scholz, Kelso, and Schöner (1987).

Fig. 10.7 The stability of a minimum in a potential V as measured by the standard deviation SD (a, b, c) or the relaxation time τrel (d, e, f). The SD is smaller for a steep minimum (a) than for a shallower minimum (b). If the shape of the potential is varied by manipulating a control parameter, the SD exhibits the corresponding change in stability of the stationary state (c). Similarly, in a steep minimum, the system relaxes more quickly from a small perturbation of given size ε (d) than in the case of a shallower potential (e). If the shape of the potential is varied by manipulating a control parameter, τrel also reflects the induced change in the stability of the stationary state (f). (Figure adapted from Kelso et al. 1994.)

Biological systems are invariably prone to random fluctuations, or noise, and Schöner, Haken, and Kelso (1986) considered the effect of adding random noise to the equation of motion of the order parameter, relative phase (i.e. dφ/dt = −dV/dφ). This extension to the HKB model predicts that, at sufficiently low frequencies, the variability of relative phase is lower for in-phase than for antiphase movements, and that the variability of relative phase of the two coordination modes increases differentially with increasing movement frequency. In addition, this stochastic model predicts a strong increase of the variability of relative phase in the antiphase mode
just prior to the phase transition. These critical fluctuations are a consequence of the flattening of the potential well corresponding to the 180 deg phase relation and have been observed by Schöner et al. (1986).
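The bifurcation at b/a = 0.25 can be made concrete by scanning the potential numerically. In this sketch (Python/NumPy; a = 1 and the range of b/a ratios are illustrative choices, not values from the chapter) the local minima of V(φ) are located on a grid of relative phases:

import numpy as np

def V(phi, a, b):
    # HKB potential: V(phi) = -a*cos(phi) - b*cos(2*phi)
    return -a * np.cos(phi) - b * np.cos(2.0 * phi)

# Periodic grid over one full cycle of relative phase.
phi = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
a = 1.0
for ratio in (1.0, 0.5, 0.3, 0.25, 0.2, 0.1):
    v = V(phi, a, ratio * a)
    # A gridpoint is a local minimum if lower than both circular neighbours.
    is_min = (v < np.roll(v, 1)) & (v < np.roll(v, -1))
    print(ratio, np.degrees(phi[is_min]).round(1))
# Above b/a = 0.25 there are minima at 0 deg (in-phase) and 180 deg
# (antiphase); at and below 0.25 only the in-phase minimum survives,
# i.e. the antiphase pattern has lost its stability.

Reading the frequency-dependent decline of b/a into this scan gives exactly the HKB account of the antiphase-to-in-phase transition described above.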
10.8 Application area 1: dual task isochronous free responding
The regularity of tapping, or more precisely its variance, has been used as an index of mental load (Michon 1966). It is therefore interesting to review two studies of attention and timing, one from the information processing and one from the dynamical systems perspective. Sergent, Hellige, and Cherry (1993) analysed the effects of concurrent anagram solving on timing in terms of the two-level timing model. They were interested to know whether the cognitive task would have differential effects on estimates of timer and motor delay. Figure 10.8 shows that, compared to a no-secondary-task baseline, the secondary task increased var(C), leaving var(M) unchanged. One possible reason for selective central interference is that memory processes for timing are affected by the concurrent tasks (Inhoff and Bisiacchi 1990; Saito and Ishio 1998). A possible account of memory processes in central timing was provided by Gibbon, Church, and Meck (1984), based on ideas of Creelman (1962) and Treisman (1963). Gibbon et al. assumed that timekeeping is based on pacemaker pulses gated into an accumulator, with the count being compared with a target value maintained in a reference memory to determine when a response should be made (see the sketch at the end of this section). From this perspective, impaired timing during simultaneous performance of another task might result from disturbances to reference memory or disruption of the gating process.

Fig. 10.8 A concurrent secondary task (anagram solution) increases var(C) not var(M) compared to baseline conditions without secondary task. (Adapted from Sergent et al. 1993.)

Yamanishi, Kawato, and Suzuki (1979) took an approach to the effects of attention on repetitive finger tapping more in line with the dynamical systems perspective. They were interested in determining whether oscillatory finger movements in tapping would exhibit local instability which varied as a function of a contrasting interfering secondary task. They reasoned that varying the degree of cognitive complexity of a probe task to include speaking, remembering a visual decision, or responding with a finger of the other hand while tapping might be used to identify the level of the oscillator circuit subserving finger tapping. If interference was equal in all cases, then the oscillator
circuit would include cognitive functions such as visual perception and memory, whereas if interference was restricted, say, to the finger movement condition, it would suggest the oscillator involved overlapping motor processes but was insulated from the cognitive processes. Yamanishi et al. used phase transition curves (PTCs) to describe the nature of the interactions. They showed that the tendency to reset the tapping was greatest in the case of the motor task. In contrast the other tasks had relatively little effect, leading to the conclusion that the oscillator includes the motor system. Such coupling between the hands, in this case between a continuous oscillator in one hand and a discrete movement in the other, anticipated the assumptions of the HKB model. It is interesting to note that PTCs provide an alternative to Scholz et al.’s (1987) method of estimating the relative stability of the oscillator using relaxation time.
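As promised above, here is a toy rendering of the Gibbon et al. (1984) pacemaker–accumulator idea (Python/NumPy; the 200 Hz pulse rate and the 3 per cent reference-memory noise are invented for illustration). Pulses from a Poisson pacemaker are counted until the accumulator reaches a criterion drawn from a noisy reference memory:

import numpy as np

rng = np.random.default_rng(2)

def produce_interval(target_s, rate_hz=200.0, memory_noise=0.03):
    # Criterion count retrieved from reference memory, with multiplicative noise.
    criterion = target_s * rate_hz * (1.0 + memory_noise * rng.normal())
    t, count = 0.0, 0
    while count < criterion:
        t += rng.exponential(1.0 / rate_hz)  # wait for the next pacemaker pulse
        count += 1
    return t

for target in (0.4, 0.6, 1.0):
    ts = np.array([produce_interval(target) for _ in range(2000)])
    print(target, ts.mean().round(3), ts.std().round(3))
# The mean tracks the target while the spread grows with interval duration,
# the same qualitative signature as Stevens' data in Fig. 10.1. Inflating
# memory_noise selectively increases central variability, one reading of
# the dual-task result of Sergent et al. above.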
10.9 Application area 2: synchronization
The WK model is an open-loop model and makes no provision for processing feedback about the times of responses. Yet synchronization, as in the Stevens paradigm, is often used to define the target interval, and subjects typically experience no difficulty tapping in phase. How might the timekeeper be adjusted to keep responses in phase with an external pacing stimulus? One possible approach was outlined by Vorberg and Wing (1996) and developed by Semjen, Schulze, and Vorberg (2000). It assumes that the current timekeeper interval is adjusted as a fixed proportion of the asynchronies between the previous pacing stimuli and associated responses. The correction is linear, taking a fixed proportion of the asynchrony on just the immediately preceding response (first order) or on the response before that as well (second order):

A(n + 1) = (1 − α)A(n) − βA(n − 1) + T + C(n) + D(n + 1) − D(n).

This results in stable performance as long as β lies in the range plus/minus one and α lies between −β and 2 + β. Such correction has profound effects on the predicted variance and correlations of the asynchronies and interresponse intervals. Figure 10.9 shows changes with α on the x-axis for different values of β. As α increases, asynchrony variance (Fig. 10.9a) decreases but interresponse interval variance (Fig. 10.9d) increases. The group average estimates of the model parameters estimated using numerical methods by Semjen et al. (2000) are shown in Fig. 10.10. Synchronization data are shown with filled circles, continuation data with open circles. The values for the correction factors (shown at the bottom) show a cross-over, with second-order correction being more important at high rates and decreasing at lower rates. A similar finding of greater second-order correction at higher response rates has also been reported by Pressing and Jolley-Rogers (1997) using spectral methods. One interpretation is that correction is a time-demanding process and, at higher rates, there is insufficient time to apply the correction within the period of the next timer interval. In such cases it seems reasonable to assume some degree of carry-over of the correction to the next interval.

Fig. 10.9 Linear phase correction in synchronization. Effects on variance and autocorrelation of asynchrony (a, b, c) and interresponse interval (d, e, f) of varying the first- and second-order correction parameters (α horizontal axis, β positive and negative above and below the solid line). Observe that var(A) is minimized with β = 0, α < 1; var(I) is minimized with α = β = 0. (Semjen et al. 2000.)

Fig. 10.10 Estimates of motor implementation and timekeeper standard deviation (s.d.) in synchronization (filled circles) and free (open circles) responding (above, middle) and of correction factors α (triangles) and β (squares) in synchronization (below). (Semjen et al. 2000.)
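The behaviour summarized in Fig. 10.9(a) can be reproduced with a few lines of simulation. The sketch below (Python/NumPy; the timekeeper and motor noise levels are invented, and the update is one simple reading of this model class, not the exact parameterization of Semjen et al.) implements linear phase correction and shows asynchrony variance falling as α grows:

import numpy as np

rng = np.random.default_rng(3)

def asynchrony_variance(alpha, beta=0.0, T=500.0, n=50000):
    C = rng.normal(T, 10.0, n)        # timekeeper intervals (ms)
    D = rng.normal(50.0, 5.0, n + 1)  # motor delays (ms)
    A = np.zeros(n)                   # asynchronies to the metronome
    for j in range(1, n):
        # Shorten the next timekeeper interval by a proportion of the one
        # or two preceding asynchronies (first/second order correction).
        interval = C[j] - alpha * A[j - 1] - beta * A[j - 2]
        A[j] = A[j - 1] + interval + D[j + 1] - D[j] - T
    return A[1000:].var()  # discard the initial transient

for alpha in (0.1, 0.25, 0.5, 0.9):
    print(alpha, round(asynchrony_variance(alpha), 1))
# Asynchrony variance shrinks as alpha increases (cf. Fig. 10.9a); with
# alpha = 0 the asynchronies would random-walk, which is why an open-loop
# WK timekeeper cannot stay in phase with a metronome.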
The HKB model places strong emphasis on the frequency of limb movements as the experimental control parameter. As frequency increases, antiphase coordination becomes less stable and a phase transition to in-phase coordination is often observed. A metronome is commonly used to effect the necessary frequency change and it is interesting to examine the relation between metronome and hand movement. For example, as frequency increases does the antiphase hand movement tend to move in phase with the metronome? Kelso et al. (1990) have shown that, with one hand responding in a synchronization task, it does. However, in addition to phase transitions, cases of progressive drift in phase (phase wandering) are observed. The original formulation of the HKB model
assumed symmetry of the coupling functions between the two effectors. Clearly, in synchronization, the coupling is quite asymmetric, with the metronome influencing the participant rather than vice versa. Kelso et al. suggested a modification to the potential landscape description:

V(φ) = Ωφ − a cos(φ) − b cos(2φ)

where Ω represents the difference in uncoupled frequencies of the components (i.e. the frequencies when oscillating alone). When there is no difference in the uncoupled frequencies, that is, Ω = 0, the symmetric-coordination law of the original HKB model is recovered. However, for a given movement frequency, if Ω is increased (Ω > 0), the stable fixed points will drift away from their initial value (see Fig. 10.11). Moreover, if Ω is increased further, transitions from (more or less) antiphase to (more or less) in-phase may be observed, or even a complete loss of synchronization (depending on the value of b/a). A third prediction is that the probability of maintaining a stable state in the asymmetric case is lower than in the symmetric case. Some support for this extension of the HKB model in terms of directions of phase transitions and pre-transition variability was obtained by Kelso et al. (1990).

Fig. 10.11 Skewing of the HKB potential due to adding an asymmetry term Ωφ. If Ω = 0 the potential is left–right symmetric (a). With increasing Ω, the degree of skewedness increases, resulting eventually in loss of the stable solution near π (b, c). Black balls represent stable states, the white ball the lost stable state.

An assumption of the HKB model is that the frequency-induced phase transitions in rhythmically coordinated movements result from the coupling function describing their interaction. HKB considered two forms, one based on time derivatives, the other on time delays. Both give rise to the same potential landscape and both postulate that the phase transition from antiphase to in-phase coordination with increasing frequency is mediated by a frequency-induced reduction in movement amplitude in the component oscillators. The difference between the two forms of the model is that in the time-derivative model the decrease of coordinative stability with increasing frequency is solely associated with the decrease in movement amplitude, whereas according to the time-delays
version there is also an effect of movement frequency per se. The latter is a general effect that does not account for the differential loss of stability of the in-phase and antiphase modes of coordination with increasing frequency. However, it could be capitalized upon to provide an alternative explanation for the occurrence of frequency-induced phase transitions in the absence of changes in amplitude (see below).

Peper and Beek (1998a) set out to test the HKB model assumption that the frequency-induced phase transitions in coordinated rhythmic movements are mediated by a drop in amplitude. They examined the effects of restricting movement amplitude in a task that involved the synchronization of arm movement with an oscillating visual target. When subjects started out making antiphase movements with the targets (i.e. movements with a direction opposite to that of the target), switching to in-phase movement occurred in the majority of the trials as movement frequency increased. However, no effects of movement amplitude on pattern stability (operationalized as variability of relative phase and switching frequency) were observed. This finding is evidence against the kind of amplitude coupling postulated in both versions of the HKB model. The occurrence of a phase transition, however, may be reconciled with the time-delays version of the model because in this formulation increasing frequency leads to an overall decrease in coordinative stability. Thus, at a certain critical frequency the stability provided by the antiphase coordination may become too small to resist the stochastic fluctuations in the system, resulting in a noise-induced phase transition from antiphase to in-phase coordination. In this sense, the time-delays version provides a better account of the data than the time-derivatives model. It also has more intuitive appeal in that the time delays postulated in the model are more readily identified with neural transmission delays, such as those related to feedback processing.
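The effect of the asymmetry term Ωφ introduced above is easy to visualize numerically. In this sketch (Python/NumPy; a = 1, b = 0.5, and the Ω values are illustrative choices) the local minima of the tilted potential are tracked as Ω grows:

import numpy as np

def V(phi, Omega, a=1.0, b=0.5):
    # Extended HKB potential: the Omega*phi term tilts the landscape.
    return Omega * phi - a * np.cos(phi) - b * np.cos(2.0 * phi)

# Sample beyond +/-180 deg so a minimum sitting at the boundary is interior.
phi = np.linspace(-1.5 * np.pi, 1.5 * np.pi, 9001)
for Omega in (0.0, 0.2, 0.5, 2.0):
    v = V(phi, Omega)
    is_min = (v[1:-1] < v[:-2]) & (v[1:-1] < v[2:])
    minima = phi[1:-1][is_min]
    minima = minima[(minima >= -np.pi) & (minima < np.pi)]  # one cycle only
    print(Omega, np.degrees(minima).round(1))
# Omega = 0 recovers stable states at 0 and -180 deg. A small Omega makes
# both fixed points drift; a larger Omega removes the antiphase minimum;
# and by Omega = 2 no minimum is left, i.e. synchronization is lost and
# the phase wanders.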
10.10 Application area 3: multi-effector isochronous tapping
In this section we consider application of the WK and HKB approaches to movements involving two effectors each moving at the same frequency. We first consider the information processing approach applied to two-hand simultaneous responding by Helmuth and Ivry (1996). They asked subjects to tap simultaneously with one or two hands and found less variability in timing with two hands. Application of the WK model showed this was due to a reduction in timekeeper variance. Interestingly, the benefit also occurred with hand and foot. Helmuth and Ivry suggested that the effect was due to combining the output of two separate timing systems. For example, if one-handed timing involves setting a threshold level on noisy accrual of activation (e.g. Gibbon et al. 1984), with two hands two separate accrual processes might be combined so that the first crossing of either threshold is used to trigger both hands. This would result in statistical reduction of the variance of the time interval, as observed. However, it should be noted that such a process would also lead to advancing of the expected time to threshold, or a bias to shorter time intervals, and this has not been reported to be the case. In contrast an averaging approach, with the response triggered at the midpoint of the times dictated by the two timing signals, results in lowered variability without a shift in mean interval. Helmuth and Ivry suggested such averaging might be achieved by summation of two separate integrative processes against a threshold normalized to reflect the increase in number of inputs. Interestingly, Ivry and Hazeltine (1999) observed the two-hand variance reduction in a callosotomy patient, suggesting that the integration of each hemisphere’s input occurs subcortically.

Wing, Church, and Gentner (1989) studied alternate hand tapping, involving antiphase movements of the two hands. They noted that intervals of the same average duration produced between
hands were more variable than the same interval occurring within hands. They also found adjacent between-hand intervals (e.g. LR, RL) exhibited correlations more negative than the limit of minus one-half predicted by the WK model. Results of a simulation study suggested that this might be accounted for by two coupled WK timing systems with left and right clock intervals kept in appropriate relative phase by corrections involving preceding produced intervals—in essence a linear phase correction model.

The original formulation of the HKB model assumed symmetry of the coupling functions between the two effectors. However, we have already seen that phase transition phenomena occur in synchronizing with an oscillating visual target, where it is reasonable to suppose the coupling between pacing stimulus and effector is asymmetric (Kelso et al. 1990). Asymmetry in the potential due to unequal uncoupled frequencies might also be expected to apply to situations in which movements involve different limbs. Kelso and Jeka (1992) showed this accounted for coordination dynamics between arm and leg, and Jeka and Kelso (1995) showed how the finding applies to changes in coordination dynamics when the arm is weighted to make its uncoupled frequency more similar to the leg.
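Returning to the two-timer account of the bimanual advantage discussed above, the statistical argument can be checked directly. In this toy simulation (Python/NumPy; the 500 ms target and 20 ms timer noise are invented values) two independent timers race toward threshold, and their first crossing is compared with their average:

import numpy as np

rng = np.random.default_rng(4)

# Threshold-crossing times of two independent, equally noisy timers (ms).
n = 200000
t1 = rng.normal(500.0, 20.0, n)
t2 = rng.normal(500.0, 20.0, n)

first_cross = np.minimum(t1, t2)  # trigger both hands at the first crossing
averaged = 0.5 * (t1 + t2)        # trigger both hands at the midpoint

print("single timer  :", t1.mean().round(1), t1.var().round(0))
print("first crossing:", first_cross.mean().round(1), first_cross.var().round(0))
print("averaging     :", averaged.mean().round(1), averaged.var().round(0))
# First-crossing reduces variance but advances the mean (about 11 ms early
# here), a bias which, as noted above, is not observed empirically;
# averaging halves the variance while leaving the mean at the target.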
10.11 Application area 4: multi-effector, multifrequency tapping
With the two hands tapping at the same rate, at certain phase relations intermediate between alternation and synchrony, simple rhythmic patterns such as 1:2 or 1:3 may be defined by the between-hand intervals. More complex rhythmic patterns may be produced if the two hands tap regularly but with different periods that are not in simple integer relation to each other. Thus, for example, if the hands start in synchrony but one hand produces two intervals while the other produces three in the same time (e.g. the left hand taps at 300 ms intervals and the right hand at 200 ms), a between-hand interval pattern of (2:1:1:2) results. Three against four produces a pattern (3:1:2:2:1:3) (see the sketch below). Given such polyrhythms involve periodic responding by each hand, it is interesting to ask whether control involves parallel independent timing by each hand (but with a link to keep in phase at the beginning of each cycle). When such parallel control is contrasted with serial integrated control, in which a single timer is responsible for the between-hand intervals, the pattern of covariances observed between component intervals (see Fig. 10.12) rejects the parallel model (Jagacinski, Marshburn, Klapp, and Jones 1988; Summers, Rosenbaum, Burns, and Ford 1993b) even after extensive practice (Summers, Ford, and Todd 1993a; Summers and Kennedy 1992; Klapp, Nelson, and Jagacinski 1998). However, an analysis of highly skilled keyboard performance has recently produced evidence for parallel timing when overall response rate is high (Krampe et al. 2000).

If the above account of polyrhythm were correct, the deviations from the mean intervals produced within each hand should be random. Engbert et al. (1997) examined this assumption using a nonlinear time series analysis which provides visualization of departures from randomness. Successive intervals in each hand’s contribution to 12 cycles of a polyrhythm were marked as 0 or 1 depending on whether they fell within a window around the correct interval proportion (Fig. 10.13). They found that at slower response rates the departures were indeed random. However, at faster response rates there were systematic departures from the random pattern. Engbert et al. provided a theoretical account of the phase transition which yielded at least a qualitative match to the data. The model was based on a nonlinear dependence in successive within-hand intervals plus a between-hand adjustment at the end of every rhythm cycle.
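The between-hand interval bookkeeping mentioned above is simple to generate for any p against q polyrhythm. This helper (plain Python; the function name is mine, not from the chapter) merges the two hands' tap times over one cycle:

from math import gcd

def between_hand_pattern(p, q):
    # One hand produces p equal intervals per cycle, the other q; merge the
    # tap times and return successive between-event intervals in grid units.
    cycle = p * q // gcd(p, q)
    times = sorted(set(range(0, cycle + 1, cycle // p)) |
                   set(range(0, cycle + 1, cycle // q)))
    return [b - a for a, b in zip(times, times[1:])]

print(between_hand_pattern(2, 3))  # [2, 1, 1, 2]  -- the (2:1:1:2) pattern
print(between_hand_pattern(3, 4))  # [3, 1, 2, 2, 1, 3]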
Fig. 10.12 Schematic of serial integrated and parallel timing models of polyrhythmic performance. (a) The former predicts I4 positively correlated with I5; the latter predicts I3 positively correlated with I1. (b) The results support the former (redrawn from Jagacinski et al. 1988).
Fig. 10.13 Patterning of fluctuations over 140 polyrhythm trials (3 in LH against 4 in RH) (Engbert et al. 1997). (a) The symbolic analysis (for each trial, and each of 36 LH and 48 RH response intervals, an empty rectangle indicates an interval near target, a filled rectangle one that is not), with trials sorted by overall cycle duration. Dark and light regions indicate that departures from target intervals are not random but change systematically with overall rate. (b) Estimates of entropy confirm the departures from random patterns suggested by the symbolic analysis.
Engbert et al. (1997) showed qualitative departures from strict timing of polyrhythm performance. From the dynamical systems perspective, it is also interesting to note that polyrhythm performance is liable to exhibit phase transitions as frequency increases, with more complex patterns switching to simpler patterns (Peper, Beek, and van Wieringen 1991, 1995). For instance, if a skilled drummer performs a complex polyrhythm, such as 3:8 or 5:8, and the cycle frequency of the fast hand, as specified by an auditory metronome, is gradually increased, abrupt, spontaneous transitions to simpler (poly)rhythms, such as 2:3, 1:3, 1:2, and 1:1, occur at some critical frequency. If the cycle frequency of the hands is further increased, a second transition (e.g. from 2:3 to 1:2) or even a third (e.g. from 1:2 to 1:1) may be observed. To account for such phase transitions, Haken, Peper, Beek, and Daffertshofer (1996) extended the HKB model to polyrhythms. The HKB model concerned the loss of stability of relative phase between two limbs moving at the same frequency, whereas the HPBD extension described the loss of stability of the ratio of two different frequencies produced by the two limbs. However, the logic of the HPBD model was similar in explaining phase transitions (between different frequency ratios) as a result of a frequency-induced drop in the amplitude of the movements of the two effectors. In the HPBD model each stable frequency ratio was identified with a specific coupling term for the interaction between the hands. These coupling terms described the interaction between the hands in terms of their normalized amplitudes (i.e. values between 0 and 1) raised to certain powers, with more complex rhythms requiring higher powers than simpler rhythms. According to the HPBD model, as the frequencies of the hands increase and their amplitudes decrease, the more complex rhythms lose their stability earlier than the simpler rhythms. This is because the coupling coefficients associated with these rhythms decrease more rapidly (due to the higher powers) than those associated with the simpler rhythms. The coupling between the component oscillators in the HPBD model was originally formulated in terms of time derivatives of the state of the other oscillator, which resulted in an exclusive dependence of pattern stability on amplitude. As an alternative, Peper and Beek (1998b) demonstrated that it is also possible to determine a coupling function based on time-delayed values of the state of the other oscillator. (It will be recalled that a similar development of time derivative and time delay forms of the coupling function was described by Haken et al. (1985) for the two hands moving at the same frequency.) As stated before, such time delays may be identified with the neural processes that underlie the interaction between the hands, such as the neurophysiological delays associated with the use of kinesthetic feedback. This leads to an alternative HPBD model with the same coupling terms as in the original but with an additional, overall reciprocal dependence of coupling strength on the frequencies of the hands. According to this alternative model, the degree of interaction between the hands depends not only on the amplitudes of the oscillations but also on the frequencies at which the hands move. To test the two versions of the HPBD model, Peper and Beek (1998b) conducted an experiment aimed at dissociating the effects of frequency and amplitude on interlimb coupling in stable, steady-state production of the 2:3 polyrhythm.
Subjects tapped this polyrhythm at five different tempos under three amplitude conditions and performed unimanual control trials in which they had to tap at the same frequencies as produced in the bimanual trials. Subsequently, the degree of coupling between the hands in the bimanual trials was assessed by comparing the degree of harmonicity of the movements of the hands in bimanual and unimanual conditions. Harmonicity was operationalized as the relative contribution of the evolving phase in the oscillation pattern to the power spectrum (here the spectrum refers to the movement trajectory rather than the earlier use of the spectrum to characterize periodicity in a train of time intervals). Using this measure, no significant effects of
amplitude were observed, whereas the strength of interaction between the hands decreased with increasing tempo (i.e. movement frequency). Again, this result favours the time-delays version over the time-derivatives version of the HPBD model and, as noted by Peper and Beek (1998b), suggests that the interaction between limbs may reflect kinesthetic influences, with the time delay reflecting the associated neural conduction delays.
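The HPBD stability argument can be illustrated numerically. In the toy sketch below, the coupling strength supporting each frequency ratio is taken as the normalized amplitude raised to a rhythm-specific power; the particular powers and the stability threshold are invented for illustration and are not the model's actual coefficients.

```python
# Toy reading of the HPBD argument: more complex frequency ratios carry
# higher powers of the normalized amplitude, so their coupling decays
# faster as amplitude drops. Powers and threshold are assumptions.
powers = {'1:1': 1, '1:2': 2, '2:3': 3, '3:8': 5}
threshold = 0.05  # assumed minimal coupling strength for a stable ratio

for amplitude in (1.0, 0.6, 0.4, 0.3):
    stable = [ratio for ratio, p in powers.items() if amplitude ** p >= threshold]
    print(f"A = {amplitude:.1f}: stable ratios -> {stable}")
# As amplitude falls with increasing frequency, the high-power (complex)
# ratios drop below threshold first, reproducing the predicted ordering.
```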
10.12 A merging of the two approaches

Shaping of stochastic fluctuations by deterministic equations of motion plays an essential role in the dynamical systems approach to accounting for the loss of stability of coordinated movement patterns. However, very little work has been done in which these stochastic fluctuations are used as a window into the control structure of timed motor behaviour. In contrast, showing that the covariance structure of timed motor responses may be exploited to uncover the underlying timekeeper organization has been a major contribution of the information processing approach. Recently, however, an initial attempt has been made to account for empirically observed patterns of temporal variability from the perspective of dynamical systems theory. Specifically, Daffertshofer (1998), following an earlier suggestion of Schöner (1994), examined both analytically and numerically the minimal conditions under which limit cycle models with noise consistently produce a negative lag-1 serial correlation (with a value between 0 and −0.5) between consecutive periods of oscillation. Contrary to earlier intuitions, he showed that a single limit cycle oscillator that is stochastically forced by (additive or multiplicative) white or coloured noise cannot produce the desired period correlation but predominantly exhibits phase diffusion. In order to obtain reliable negative correlations, it is necessary either to introduce two conveniently placed noise sources (as in the WK model), or to add a second oscillator that is coupled either unidirectionally (i.e. an external forcing function) or bidirectionally to the limit cycle oscillator of interest, thus stabilizing its phase. In the latter system a single noise source is sufficient to obtain the sought-after negative lag-1 serial correlation. This is interesting because, in this case, the property is not simply the result of the statistical properties of the noise but rather the consequence of deterministic interactions between two coupled oscillators with a single noise source. In order to incorporate Daffertshofer's conclusions, and to account for the previously described results showing that the stability of the coordination between two rhythmically moving limbs is unaffected by their amplitudes, Beek, Peper, and Daffertshofer (in press; see also Peper, Beek, and Daffertshofer 2000) proposed a new dynamical model for interlimb coordination. This comprises a two-tiered structure that resembles the Wing and Kristofferson (1973) model as extended by Vorberg and Hambuch (1984) for application to bilateral responding (see also Wing 1982; Wing et al. 1989). Thus it represents timing in terms of two levels, one neural, one effector, and there is a divergence at the effector level. However, in the new model the elements at each level are composed of oscillators (see Fig. 10.14), and the two lower-level effector oscillators are driven by two nonlinearly coupled limit cycle oscillators at the neural level. Because of their nonlinear coupling, the limit cycles of the neural level oscillators can exhibit phase transitions between phase-locked and/or frequency-locked states as the frequency is increased. However, this effect is not necessarily mediated by the amplitudes of the oscillators, which may be set to 1 (representing a fixed level of neural activity). The driving signals from each of the coupled neural limit cycle oscillators are transferred to a peripheral oscillator having certain physical properties, such as mass, stiffness, and damping.
These physical properties lead to a particular amplitude response to the driving signals, which may or may
Fig. 10.14 Neural and effector levels of oscillators are assumed in the two-tiered oscillator model of Beek et al. (in press) and Peper et al. (2000). (The figure labels the HKB coupling between the neural-level oscillators, the forcing of each effector-level oscillator by its neural-level oscillator, and the feedback from the effector level to the neural level.)
not be characterized by a resonance peak. In either case the response of the peripheral oscillators may entail a drop in amplitude over a large range of frequencies. This frequency-induced drop in amplitude is viewed as a purely peripheral effect that has no consequence for the stability properties of the observed patterns of coordination. These follow from the neural limit cycles and their interactions, although kinesthetic feedback signals related to the phasing of the peripheral oscillators may play into this neural control structure. Due to its two-tiered structure, the proposed model can account for the differential effects of frequency (tempo) and amplitude on pattern stability. However, it also provides a good starting point for explaining the temporal variability characteristics of timed motor responses, in that it satisfies the minimal requirements for a dynamical model of negative lag-1 serial correlations in the periods.
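The peripheral amplitude drop that the model appeals to is the familiar frequency response of a driven damped oscillator. A minimal sketch, with arbitrary mass, stiffness, and damping values, is given below.

```python
import numpy as np

def amplitude_response(omega, m=1.0, k=100.0, c=4.0, f0=1.0):
    """Steady-state amplitude of a driven mass-spring-damper:
    A(w) = F0 / sqrt((k - m*w^2)^2 + (c*w)^2). Parameter values arbitrary."""
    return f0 / np.sqrt((k - m * omega ** 2) ** 2 + (c * omega) ** 2)

for w in (2.0, 6.0, 10.0, 14.0, 20.0):  # natural frequency sqrt(k/m) = 10
    print(f"omega = {w:5.1f}  amplitude = {amplitude_response(w):.4f}")
# Beyond the resonance peak the amplitude falls off steadily with driving
# frequency: the purely peripheral amplitude drop the model appeals to.
```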
10.13 Summary and conclusions

In this review we have considered two approaches to variability in timing. The information processing or time series approach is based on linear autoregressive moving average (ARMA) modelling of stable discrete behaviour (Pressing 1999). This approach lends itself to hierarchical modelling, with subdivision of noise sources through explicit consideration of dependence in the dataset. The lowest level in the hierarchy is then the motor process. Thus the information processing approach does not reject the motor system; rather, it avoids detailing all aspects of movement timing. The dynamical systems approach, based on nonlinear oscillators, is represented in terms of differential equations. The latter provide a full description of the spatial trajectory and in that sense fully model the movement system. A major achievement of the dynamical systems perspective is that various aspects of classic physical system phase transitions (or bifurcations) have been shown to occur in repetitive coordinated human behaviour. These transitions, which are not obviously explained by motor program approaches, include features such as discontinuity, hysteresis, critical fluctuations, and critical slowing. Scientific explanation often proceeds by setting two competing views against each other to determine which is right and which is wrong. This is enshrined in inferential statistics by hypothesis testing, in which the goal is to prove the null hypothesis wrong and so support the alternative. However,
behaviour is multifaceted and context-dependent. So, while one theory may be supported when one set of aspects of behaviour or contexts is considered, the other theory may receive support when the focus shifts to other aspects of behaviour in other contexts. Hence our understanding of behaviour may advance not by proving one theory right and the other wrong, but rather by characterizing why one theory works in one situation and the other (or others) in another situation. With two different approaches to timing, we would argue that, rather than asking which is right or which is better, it is more useful to ask what the relation is between the theories (see also Heuer 1996). While the two approaches were originally drawn up to account for different aspects of behaviour, they are now increasingly being applied to the same or related aspects of behaviour. So we consider it important to actively seek aspects of each approach that can be used in a combined approach that might be more powerful. Pressing (1999), for example, has argued for a general approach, referential behaviour theory, based on an underlying principle of homeostasis and drawing on the language of control theory, which can be applied to both information processing and dynamical systems approaches to timing. However, we acknowledge that there is a risk of losing simplicity and comprehensibility if the relation between the approaches is not carefully examined. Thus we believe development will proceed through contrasting and comparing, as well as seeking to combine, the two approaches.
Acknowledgement

We thank Andreas Daffertshofer for comments on an earlier version of this paper and for assistance with the figures. AMW was supported by a grant from the MRC.
References

Beek, P.J., Peper, C.E., and Daffertshofer, A. (in press). Modeling rhythmic interlimb coordination: Beyond the Haken–Kelso–Bunz model. Brain and Cognition.
Beek, P.J., Peper, C.E., and Stegeman, D.F. (1995). Dynamical models of movement coordination. Human Movement Science, 14, 573–608.
Creelman, C.D. (1962). Human discrimination of auditory duration. Journal of the Acoustical Society of America, 34, 582–593.
Daffertshofer, A. (1997). Nichtgleichgewichtsphasenübergänge in der menschlichen Motorik und Erweiterung des synergetischen Computers. Aachen: Shaker Verlag.
Daffertshofer, A. (1998). Effects of noise on the phase dynamics of non-linear oscillators. Physical Review E, 58, 327–338.
Engbert, R., Scheffczyk, C., Krampe, R.T., Rosenblum, M., Kurths, J., and Kliegl, R. (1997). Tempo-induced transitions in polyrhythmic hand movements. Physical Review E, 56, 5823–5833.
Gibbon, J., Church, R.M., and Meck, W.H. (1984). Scalar timing in memory. In J. Gibbon and L. Allan (Eds.), Timing and time perception. Annals of the New York Academy of Sciences, 423, 52–77.
Govier, L.J. and Lewis, T. (1967). Estimation of the dispersion parameter of an (A,B) process. In R. Cruon (Ed.), Queuing theory: Recent developments and applications. Amsterdam: Elsevier.
Gregson, R.G. and Pressing, J. (2000). Dynamic modeling. In L.G. Tassinary, J.T. Caccioppo, and G. Berntson (Eds.), Principles of psychophysiology: Physical, social, and inferential elements. Cambridge: Cambridge University Press.
Haken, H., Kelso, J.A.S., and Bunz, H. (1985). A theoretical model of phase transitions in human hand movements. Biological Cybernetics, 51, 347–356.
Haken, H., Peper, C.E., Beek, P.J., and Daffertshofer, A. (1996). A model for phase transitions in human hand movements during multifrequency tapping. Physica D, 90, 179–196.
Helmuth, L.L. and Ivry, R.B. (1996). When two hands are better than one: Reduced timing variability during bimanual movements. Journal of Experimental Psychology: Human Perception and Performance, 22, 278–293.
Heuer, H. (1996). In H. Heuer and S. Keele (Eds.), Human motor performance. New York: Academic Press.
Inhoff, A.W. and Bisiacchi, P. (1990). Unimanual tapping during concurrent articulation: Generalized and lateralized effects of memory encoding upon the rate and variability of concurrent finger tapping. Brain and Cognition, 6, 24–40.
Ivry, R.B. and Hazeltine, E. (1999). Subcortical locus of temporal coupling in the bimanual movements of a callosotomy patient. Human Movement Science, 18, 345–375.
Jagacinski, R.J., Marshburn, E., Klapp, S.T., and Jones, M.R. (1988). Tests of parallel versus integrated structure in polyrhythmic tapping. Journal of Motor Behavior, 20, 416–442.
Jeka, J.J. and Kelso, J.A.S. (1995). Manipulating symmetry in the coordination dynamics of human movement. Journal of Experimental Psychology: Human Perception and Performance, 21, 360–374.
Kay, B.A., Kelso, J.A.S., Saltzman, E.L., and Schöner, G. (1987). Space–time behavior of single and bimanual rhythmical movements: Data and limit cycle model. Journal of Experimental Psychology: Human Perception and Performance, 13, 178–192.
Kelso, J.A.S. (1984). Phase transitions and critical behavior in human bimanual coordination. American Journal of Physiology, 15, 1000–1004.
Kelso, J.A.S., DelColle, J.D., and Schöner, G. (1990). Action-perception as a pattern formation process. In M. Jeannerod (Ed.), Attention and performance XIII, pp. 139–169. Hillsdale, NJ: Lawrence Erlbaum.
Kelso, J.A.S., Ding, M., and Schöner, G. (1994). Dynamic pattern formation: A primer. In L.B. Smith and E. Thelen (Eds.), A dynamic systems approach to development, pp. 13–50. Cambridge, MA: MIT Press.
Kelso, J.A.S. and Jeka, J.J. (1992). Symmetry breaking dynamics of human multilimb coordination. Journal of Experimental Psychology: Human Perception and Performance, 18, 645–668.
Klapp, S.T., Nelson, J.M., and Jagacinski, R.J. (1998). Can people tap concurrent bimanual rhythms independently? Journal of Motor Behavior, 30, 301–322.
Krampe, R.T., Kliegl, R., Mayr, U., Engbert, R., and Vorberg, D. (2000). The fast and slow of skilled bimanual rhythm production: Parallel vs. integrated timing. Journal of Experimental Psychology: Human Perception and Performance, 26, 206–233.
Michon, J.A. (1966). Tapping regularity as a measure of perceptual motor load. Ergonomics, 9, 401–412.
Peper, C.E. and Beek, P.J. (1998a). Are frequency-induced transitions in rhythmic coordination mediated by a drop in amplitude? Biological Cybernetics, 79, 291–300.
Peper, C.E. and Beek, P.J. (1998b). Distinguishing between the effects of frequency and amplitude on interlimb coupling in tapping a 2:3 polyrhythm. Experimental Brain Research, 118, 78–92.
Peper, C.E., Beek, P.J., and Daffertshofer, A. (2000). Considerations regarding a comprehensive model of (poly)rhythmic movements. In P. Desain and L. Windsor (Eds.), Rhythm perception and production, pp. 35–49. Lisse: Swets and Zeitlinger.
Peper, C.E., Beek, P.J., and van Wieringen, P.C.W. (1991). Bifurcations in bimanual tapping: In search of Farey principles. In J. Requin and G.E. Stelmach (Eds.), Tutorials in motor neuroscience, pp. 413–431. Dordrecht: Kluwer.
Peper, C.E., Beek, P.J., and van Wieringen, P.C.W. (1995). Frequency-induced phase transitions in bimanual tapping. Biological Cybernetics, 73, 301–309.
Pressing, J. (1999). The referential dynamics of cognition and action. Psychological Review, 106, 714–747.
Pressing, J. and Jolley-Rogers, G. (1997). Spectral properties of human cognition and skill. Biological Cybernetics, 76, 339–347.
Rosenbaum, D.A. and Patashnik, O. (1980). Time to time in the human motor system. In R.S. Nickerson (Ed.), Attention and performance VIII. Hillsdale, NJ: Erlbaum.
Saito, S. and Ishio, A. (1998). Rhythmic information in working memory: Effects of concurrent articulation on reproduction of rhythms. Japanese Psychological Research, 40, 10–18.
Scholz, J.P., Kelso, J.A.S., and Schöner, G. (1987). Nonequilibrium phase transitions in coordinated biological motion: Critical slowing down and switching time. Physics Letters A, 123, 390–394.
Schöner, G. (1994). From interlimb coordination to trajectory formation: Common dynamical principles. In S.P. Swinnen, J. Massion, H. Heuer, and P. Casaer (Eds.), Interlimb coordination: Neural, dynamical, and cognitive constraints, pp. 339–368. San Diego, CA: Academic Press.
Schöner, G., Haken, H., and Kelso, J.A.S. (1986). A stochastic theory of phase transitions in human hand movements. Biological Cybernetics, 53, 247–257.
Semjen, A., Schulze, H.-H., and Vorberg, D. (2000). Timing precision in continuation and synchronization tapping. Psychological Research, 63, 137–147.
Sergent, V., Hellige, J.B., and Cherry, B. (1993). Effects of responding hand and concurrent verbal processing on time-keeping and motor-implementation processes. Brain and Cognition, 23, 243–262.
Shaffer, L.H. (1982). Rhythm and timing in skill. Psychological Review, 89, 109–122.
Stevens, L.T. (1886). On the time sense. Mind, 11, 393–404.
Summers, J.J. and Kennedy, T.M. (1992). Strategies in the production of a 5:3 polyrhythm. Human Movement Science, 11, 101–112.
Summers, J.J., Ford, S.K., and Todd, J.A. (1993a). Practice effects on the coordination of the two hands in a bimanual tapping task. Human Movement Science, 12, 111–133.
Summers, J.J., Rosenbaum, D.A., Burns, B.D., and Ford, S.K. (1993b). Production of polyrhythms. Journal of Experimental Psychology: Human Perception and Performance, 19, 416–428.
Treisman, M. (1963). Temporal discrimination and the indifference interval: Implications for a model of the internal clock. Psychological Monographs, 77 (13, Whole No. 576).
Vorberg, D. and Hambuch, R. (1978). On the temporal control of rhythmic performance. In J. Requin (Ed.), Attention and performance VII, pp. 535–555. Hillsdale, NJ: Erlbaum.
Vorberg, D. and Hambuch, R. (1984). Timing of two-handed rhythmic performance. In J. Gibbon and L. Allan (Eds.), Timing and time perception. Annals of the New York Academy of Sciences, 423, 390–406.
Vorberg, D. and Wing, A.M. (1996). Modeling variability and dependence in timing. In H. Heuer and S. Keele (Eds.), Handbook of perception and action, Vol. 2, pp. 181–262. London and New York: Academic Press.
Wing, A.M. (1980). The long and short of timing in response sequences. In G.E. Stelmach and J. Requin (Eds.), Tutorials in motor behaviour, pp. 469–486. Amsterdam: North-Holland.
Wing, A.M. (1982). Timing and coordination of repetitive bimanual movements. Quarterly Journal of Experimental Psychology, 34, 339–348.
Wing, A.M., Church, R.M., and Gentner, D.R. (1989). Variability in the timing of responses during repetitive tapping with alternate hands. Psychological Research, 51, 28–37.
Wing, A.M. and Kristofferson, A.B. (1973). Response delays and the timing of discrete motor responses. Perception and Psychophysics, 14, 5–12.
Yamanishi, T., Kawato, M., and Suzuki, R. (1980). Two coupled oscillators as model for the coordinated finger tapping by both hands. Biological Cybernetics, 37, 221–225.
11 Timing mechanisms in sensorimotor synchronization

Gisa Aschersleben, Prisca Stenneken, Jonathan Cole, and Wolfgang Prinz

Abstract. This study examines the influence of sensory feedback on the timing of simple repetitive movements in a sensorimotor synchronization task. Subjects were instructed to synchronize finger taps with an isochronous sequence of auditory signals. Although this is an easy task, a systematic error is commonly observed: taps precede clicks by several tens of milliseconds. One explanation proposed for this 'negative asynchrony' is based on the idea that synchrony is established at the level of central representations (and not at the level of external events) and that the timing of an action is determined by the (anticipated) action effect. To test this hypothesis, the sensory feedback available from the tap as well as its temporal characteristics were manipulated, and evidence supporting the hypothesis was obtained. After reviewing these findings we report new evidence obtained from a deafferented subject who suffers from a complete loss of tactile and kinesthetic afferences from below the neck. In three experiments we studied his performance (and compared it with that of a group of age-matched control subjects) under conditions with differing amounts of feedback. In the first experiment, in which all information about the tapping movement was excluded, the deafferented subject nevertheless maintained a stable phase relationship between the pacing signal and his movements, but with a large negative asynchrony. In the second experiment, an auditory feedback tone was provided each time the subject touched the key. This manipulation led to a clear improvement in his performance; however, he was only able to tap in exact synchrony in the third experiment, when he was allowed to visually monitor his tapping movements as well. These results demonstrate the important role of sensory feedback in the timing of movements. Furthermore, the findings suggest an internal prediction of the movement's sensory consequences, as expressed in the account of internal forward models.
11.1 Introduction

The importance of sensory feedback in the control of movements has been shown in numerous studies. One important function of feedback is the online correction of movements, for example in pointing. Seeing the hand as it approaches a visible target is an instance of closed-loop control. Under open-loop conditions, when feedback is unavailable, no corrective movements are possible, leading to a degradation in performance. Besides visual feedback, sensory reafferent information from proprioception and touch plays an important role in the temporal and spatial control of movements. The present study is concerned with the role of sensory feedback in the timing of movements. To analyze the timing of movements in the absence of other confounds, it is reasonable to study simple, repetitive tasks in which subjects are required to accompany a predictable stimulus with a simple movement. In the synchronization task, subjects tap with, for example, the right index finger on a key at a given rate (tapping task). The beat is presented by a metronome emitting clicks. The interval between the finger touching the key and the presentation of the pacing signal is calculated as the dependent variable. Though this is an easy task, subjects are generally not able to perform the movement in exact synchrony with the clicks and a systematic error is observed: taps usually precede clicks by several tens of milliseconds, the 'negative asynchrony'. This effect was described
more than a century ago (e.g. Dunlap 1910; Johnson 1898; Miyake 1902) and has been replicated ever since in many studies (e.g. Aschersleben and Prinz 1995, 1997; Fraisse 1980; Kolers and Brewster 1985; Mates, Müller, Radil, and Pöppel 1994; Repp 2000; Thaut, Tian, and Azimi-Sadjadi 1998; Vos, Mates, and van Kruysbergen 1995; Wohlschläger and Koch 2000; for an overview, see Aschersleben, in press). The size of the negative asynchrony depends to a great extent on the experimental conditions, though there are large interindividual differences as well. One important factor influencing the asynchrony is musical experience. Highly trained musicians exhibit a much smaller asynchrony than musically untrained persons. Ludwig (1992) asked 27 students at the Music Academy in Munich to synchronize their taps to an auditory pacing signal under controlled feedback conditions (visual and auditory feedback from the tap was eliminated). Their mean asynchrony was 14 ms (SD between subjects: 13 ms) compared with 50 ms (SD between subjects: 23 ms) in an age-matched cohort of musically untrained persons. However, all subjects revealed a negative asynchrony, so that even the highly musically trained subjects did not tap in exact synchrony with the metronome. One reason for the reduced asynchrony in musicians might be the huge amount of training with different kinds of synchronization tasks during individual and ensemble playing. To analyse the influence of training on the asynchrony, Aschersleben (2000, 2001) asked musically untrained subjects to tap for ten sessions (45 min each; about 10,000 taps in total) with or without knowledge of results about their asynchronies. When subjects tapped without this feedback, no change in asynchrony was observed as a function of practice. Only when knowledge of results was provided did the subjects alter their performance and become able, after ten sessions, to tap in exact physical synchrony. However, in this condition subjects reported that they had to subjectively delay their taps to produce the required objective synchrony.
11.1.1 An explanatory account: the role of sensory feedback

Although the negative asynchrony has been known for more than a hundred years, the underlying mechanisms are still not completely understood. Some recent accounts are based on the assumption that synchrony is established at a central level at which both events and actions are represented in terms of their sensory consequences (Aschersleben, in press; Aschersleben and Prinz 1995, 1997; Aschersleben, Gehrke, and Prinz, in press; Gehrke 1996; Prinz 1990, 1997; for comparable ideas see Fraisse 1980; Mates 1994; Paillard 1949). If so, the temporal delays involved in perceiving the click and the tap become crucial. Because of differences in (central and/or peripheral) processing times, the temporal delay between actual and perceived click is likely to be shorter than the delay between actual and perceived tap, so that the actual tap must precede the actual click to result in synchrony between the perceived events. As a consequence, the negative asynchrony between click onset and overt tap is observed. There are two versions of this hypothesis (the nerve-conduction hypothesis and the sensory accumulator model) that differ in the level at which the asynchrony is held to arise. According to the nerve-conduction hypothesis (also known as the Paillard–Fraisse hypothesis), differences in nerve-conduction time for the sensory information about the click (from ear to brain) and the tap (from finger to brain) are responsible for the negative asynchrony (Aschersleben 1994; Aschersleben and Prinz 1995, 1997; Fraisse 1980; Paillard 1949). Thus it is assumed that the negative asynchrony originates in peripheral processes. In contrast, the sensory accumulator model assumes the crucial factor to be the central processing times involved in generating central representations of the peripheral events. According to that
model, these central representations unfold in time and cannot be considered as punctate events. For both clicks and taps the unfolding of the sensory evidence is captured by an accumulation function. It is assumed that the points at which the accumulation functions reach their respective thresholds determine the times at which the two events are perceived. The sensory accumulator model posits that the sensory evidence for clicks and taps is accumulated in the same functional domain and that the same threshold applies to both of them. It is further assumed that to achieve perceived synchrony, the two accumulation functions need to hit the threshold at the same point. Given these two assumptions, it is the steepness of the two accumulation functions that determines to what extent the onsets of the two physical events must be offset to achieve perceived synchrony of the two mental events. For instance, if the auditory pacing signal used in the synchronization task has a steeper accumulation function than the tap (because the temporal resolution is much higher in the auditory than in the tactile system), the tap onset needs to precede the click onset to make sure that both functions hit the threshold at the same point (Aschersleben et al., in press; Gehrke 1996). Although the two hypotheses differ in the processes assumed to be crucial, both rely on the assumption that synchrony is established at a central level at which both events and actions are represented. Therefore, we will refer to them as representational models (see Aschersleben, in press). This view differs from models assuming that the cognitive system is able to generate coincidence between external events in a veridical manner. These models are based on the idea of an asymmetrical error tolerance; that is, the cognitive system is assumed to tolerate (small) negative errors (taps preceding clicks), whereas positive errors (taps following clicks) are corrected almost immediately. Results from psychophysical experiments support this view by showing that taps following clicks are detected with greater probability than taps preceding clicks (Koch 1999; for overviews see Aschersleben, in press; Müller, Aschersleben, Koch, Freund, and Prinz 1999). We will not elaborate on these models because the empirical part of this chapter aims at further testing the representational models. Another important feature of the two hypotheses described above is their emphasis on the role of sensory (re)afferent signals in the timing of tapping movements. An obvious way to test this assumption is to manipulate the feedback coming from the tap so as to change the temporal delay between actual and perceived tap. The longer this delay, the more the actual tap must precede the actual click to result in coincidence of the two perceived events, and the more pronounced the negative asynchrony should be. This prediction has received empirical support in several experiments, which will now be summarized.
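As a worked illustration of the accumulator logic, suppose both accumulation functions rise linearly to a common threshold; the rates below are assumptions, picked only to show how a shallower tactile function forces the tap to lead the click.

```python
def perceived_time(onset, rate, threshold=1.0):
    """Time at which a linear accumulation starting at `onset` and rising at
    `rate` (evidence units per ms) reaches the common threshold."""
    return onset + threshold / rate

# Assumed rates: steep auditory accumulation (click perceived 20 ms after
# onset), shallower tactile/kinesthetic accumulation (tap perceived 60 ms
# after onset). Both latencies are invented for illustration.
rate_click, rate_tap = 1.0 / 20.0, 1.0 / 60.0

# Perceived synchrony: tap_onset + 60 = click_onset + 20, so the tap must
# lead the click by 40 ms, a negative asynchrony of -40 ms.
asynchrony = (1.0 / rate_click) - (1.0 / rate_tap)
print(asynchrony)  # -40.0
print(perceived_time(-40.0, rate_tap), perceived_time(0.0, rate_click))  # both ~20 ms
```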
11.1.2 Empirical support for the role of sensory feedback

Experiments designed to manipulate the delay between the tap and its central representation have delayed the intrinsic (somatosensory) feedback, added extra, extrinsic feedback components, and intensified or eliminated (part of) the intrinsic feedback. A simple way to manipulate the delay between the tap and its central representation is to make participants tap with different body parts. Experiments in which subjects were asked to tap with the hand or foot (under the assumption that the temporal delay between actual and perceived tap increases with the 'neural distance' between effector and brain) have shown that the negative asynchrony is more pronounced with foot than with hand tapping (Aschersleben and Prinz 1995; Billon, Bard, Fleury, Blouin, and Teasdale 1996a; Fraisse 1980). Corresponding results have been reported
for self-initiated tapping with hand and foot, that is, conditions in which the absolute timing of the (simultaneously performed) hand and foot taps was determined by the subject (Bard, Paillard, Teasdale, Fleury, and Lajoie 1991; Bard et al. 1992; Billon et al. 1996a; Paillard 1949). The increased asynchrony in foot tapping compared with hand tapping was independent of the body side involved and of whether the two effectors performed the tapping movement separately or simultaneously (Aschersleben and Prinz 1995; Billon et al. 1996a). To study the influence of additional auditory feedback on the timing of the tap, some studies presented an auditory signal each time the subject touched the key. Under the assumption that the central representation of the tap is an integrated percept of all feedback components (a 'joint event code'), a reduced negative asynchrony was expected because the additional auditory feedback was thought to arrive prior to the tactile/kinesthetic feedback; any integration of these two components should therefore shift the perceived tap earlier and reduce the negative asynchrony (Aschersleben and Prinz 1995, 1997; for a similar assumption, see Fraisse, Oléron, and Paillard 1958).1 This effect was indeed found in a number of such experiments (e.g. Aschersleben and Prinz 1995, 1997; Mates and Aschersleben 2000; Mates, Radil, and Pöppel 1992; O'Boyle and Clarke 1996; Repp 2001). An extension of this approach can be found in studies on delayed auditory feedback, in which a delay is introduced between the touch of the key and the presentation of the corresponding feedback tone. If the delay is short enough for the subjects not to be aware of this manipulation (less than 100 ms), an increase in the negative asynchrony with increasing delay is observed (Aschersleben and Prinz 1997; Mates and Aschersleben 2000; see also Fraisse et al. 1958). Moreover, if there is a joint event code resulting from all feedback components, then a linear relationship between asynchrony and delay is expected: given a constant weighting between the feedback components, a linear shift in the timing of one component should result in a linear shift in the timing of the joint event code and, thus, a linear shift in the observed asynchrony. Corresponding experiments confirm this prediction (Aschersleben and Prinz 1997). Another way to manipulate the delay between the tap and its central representation is to intensify the tactile/kinesthetic feedback from the finger (Aschersleben et al., in press; Gehrke 1995). According to the threshold model outlined earlier, this should lead to a steeper accumulation function and thus reduce the negative asynchrony. One way to test this prediction is to vary the amplitude of the finger movement. A large amplitude leads to an increase in force and, thus, to an increase in tactile stimulation. In addition, larger finger movements are performed with higher velocity, which increases kinesthetic feedback. Altogether, a larger movement amplitude intensifies sensory stimulation at the finger, which should result in a reduced asynchrony. Indeed, when subjects tapped with large movement amplitudes the negative asynchrony was significantly smaller than in conditions with small movement amplitudes (Aschersleben et al., in press; Gehrke 1995). Finally, another way of testing the hypothesis that sensory feedback is important in determining the timing of movements is to eliminate feedback altogether, either by applying local anesthesia to healthy subjects or by studying subjects with pathological loss of sensory feedback.
In both kinds of studies it is important to make sure that the efferent nerves are unimpaired. Aschersleben, Gehrke, and Prinz (2001) studied the influence of local anesthesia of the index finger on the timing of taps. Local anesthesia suppresses tactile reafferent information without disturbing the reafferent discharge of the joint and muscle receptors. It affected neither the maximum tapping rate nor the timing of synchronized finger movements without a response key (tapping in the air like a conductor), indicating that the efferent nerve fibers and the transmission of kinesthetic feedback information were unimpaired. However, under conditions of standard finger tapping the
negative asynchrony was significantly increased, demonstrating an important influence of tactile feedback on the timing of the taps. Only few pathological cases meet the criterion of complete sensory deafferentation, because complete sensory loss with unimpaired efferent pathways and cortical motor structures is very rare. Bard et al. (1992) studied a woman who is almost completely deafferented (for a description, see Cooke, Brown, Forget, and Lamarre 1985). The task was to initiate, simultaneously, ipsilateral finger extension and heel raising in two conditions: (1) simple reaction time (movements triggered by an external signal), and (2) self-initiated movement. In the reactive condition, the deafferented subject showed behavior similar to that of healthy controls, with finger movement preceding heel raising, but her performance differed in the self-initiated condition. In healthy subjects, a lead of the foot movement is observed, which supports the idea of synchronized afferent feedback. However, the deafferented subject did not show this effect, which suggests that she had to rely on synchronization of her motor commands. Billon, Semjen, Cole, and Gauthier (1996b) studied another deafferented subject, who lacked proprioceptive and tactile sensitivity below the neck. They asked their subject to produce sequences of periodic finger taps involving a complex pattern of accentuation in synchrony with a metronome. They found that, without visual and auditory feedback, the deafferented man did not lose correct phasing between the taps and the clicks of the pacing signal. That is, at least for the required sequence of 35 taps, he was able to produce a regular sequence of synchronized taps without any feedback available. However, his mean synchronization error increased, and so did the force of the taps and the amplitude of the tapping movements. In addition, the deafferented subject did not show an effect observed in the control subjects, namely a delay in the initiation of the accentuated taps. This result indicated that movement-related feedback plays a prominent role in the temporal control of the tapping movements (Billon et al. 1996b). The aim of the present experiments was to investigate the effect of loss of sensory feedback on the timing of tapping movements in greater detail by studying IW, who has a complete loss of tactile and kinesthetic afferent fibers from below the neck. We asked IW to tap with hand and foot in synchrony with an auditory metronome under different feedback conditions. We wondered whether IW would be able to perform the synchronization task without auditory and visual control, that is, without any feedback at all about his own movements (Experiment 1). The report by Billon et al. (1996b), who had studied the same deafferented subject IW, gives a first hint that he can keep his taps in phase with an acoustic metronome simply by listening to the metronome sounds. We also wanted to compare IW's performance to that of healthy control subjects. To study the role of sensory feedback in greater detail, we then manipulated the availability of auditory feedback and visual control in Experiments 2 and 3.
11.2 Experiment 1

In this experiment, we asked IW to tap either with the hand or with the foot. However, we excluded auditory as well as visual information about the tapping movement. IW's performance under these conditions was compared with that of an age-matched group of healthy controls. From the perspective of the representational models, we would expect IW to be unable to perform the synchronization task under conditions in which neither proprioceptive nor auditory or visual feedback about the tapping movement is available. However, the results from the Billon et al. (1996b) study indicate that IW can follow an auditory pacing signal with his taps.
11.2.1 Method

11.2.1.1 Subjects
A control group of 14 healthy subjects (6 female, 8 male, between 42 and 53 years of age, mean age 46.8 years; all right-handed) and a deafferented male subject, IW (age 47 years, left-handed), participated. At the age of 19 years, IW suffered a purely sensory neuronopathy with acute onset. This led to a total loss of kinesthetic and tactile sensitivity for the whole body below the neck. Clinical tests revealed an absence of large myelinated fiber function. Sensory nerve action potentials and cutaneous muscular reflexes are absent. Temperature and deep pressure sensations and perception of muscle fatigue are still present, suggesting a significant sparing of small myelinated and unmyelinated fibers. Motor conduction velocities are normal. Clinical rehabilitation extended over three years. All movements require sustained attention and visual control. IW is able to move by visually monitoring his actions, but he is unable to carry out simultaneous motor tasks, such as maintaining a precision grip while walking, because of a limited span of attention (for details, see Cole 1995; Cole and Paillard 1995; Cole and Sedgwick 1992).

11.2.1.2 Apparatus and stimuli
The subjects were seated at a table and were asked to tap with the dominant index finger or foot on a silent electrical contact switch mounted on a wooden board. To eliminate visual feedback, the key was placed behind a wooden plate that obstructed the subject's view (see Fig. 11.1). The auditory pacing signal (1000 Hz, 80 dB[A], duration 10 ms, interstimulus interval 600 ms) was presented binaurally through headphones (Audio-Technica ATH-A5). To mask other external sounds, continuous white noise (20 dB[A]) was added. The stimuli were produced by a personal computer (Compaq Presario 1630) via a SoundBlaster-compatible sound card. The computer controlled the experimental procedure and registered the onset of keypresses (with a resolution of 1 ms).
Fig. 11.1 Illustration of the experimental set-up in conditions without visual control.
11.2.1.3 Procedure
The hand and foot tapping conditions were presented blockwise, each block consisting of five trials. Each trial presented 50 pacing signals. Instructions required the subjects to start tapping as soon as they picked up the beat (usually within the first three signals) and then to tap along as precisely as possible. At the beginning, the subjects performed some taps with the hand and the foot without any pacing signal to get a 'feeling' for the required movement. We knew from previous studies with IW that in the absence of visual feedback he would lose contact with the response key. Therefore, we fixed his hand (at the wrist and the other fingers) and his foot (at the heel) with Velcro straps to the wooden part of the keyboard. However, this manipulation did not interfere with his natural finger and foot tapping movements. To avoid fatigue and feedback via muscle fatigue, short breaks between trials and longer breaks between blocks were introduced.

11.2.2 Results
Data analysis started with the fourth signal in each trial. The initial taps were not included because about three signals were required for the subject to pick up the beat. Hence, the means reported below always refer to the taps accompanying the remaining 47 signals in each trial. The means of the asynchronies between tap onsets and click onsets were computed for each trial. Negative values indicate that taps preceded clicks. Trials were dropped from the analysis if they contained fewer than 25 taps or if the standard deviation exceeded a pre-set criterion of 100 ms. For the control subjects, 2.1% of the trials had to be rejected; for IW, one single trial.
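For concreteness, the per-trial analysis just described can be expressed as a short routine. The function below is our reconstruction of the procedure, assuming taps have already been paired one-to-one with pacing signals; it is not the original analysis code.

```python
import numpy as np

def trial_mean_asynchrony(tap_times, click_times, skip=3, min_taps=25, max_sd=100.0):
    """Per-trial mean tap-click asynchrony (ms): discard responses to the
    first `skip` signals, require at least `min_taps` taps, and reject the
    trial if the within-trial SD exceeds `max_sd` ms."""
    asyn = np.asarray(tap_times[skip:], dtype=float) - np.asarray(click_times[skip:], dtype=float)
    if len(asyn) < min_taps or asyn.std(ddof=1) > max_sd:
        return None  # trial rejected
    return asyn.mean()  # negative values: taps precede clicks

# Hypothetical trial: 50 pacing signals 600 ms apart, taps leading by ~40 ms.
clicks = np.arange(50) * 600.0
taps = clicks - 40.0 + np.random.default_rng(2).normal(0.0, 25.0, 50)
print(trial_mean_asynchrony(taps, clicks))  # close to -40 ms
```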
Fig. 11.2 Averages and standard errors of the mean asynchronies for the deafferented subject IW and the age-matched control subjects, for hand and foot tapping without auditory feedback or visual control (Exp. 1). The standard errors for control subjects represent between-subject variability in mean asynchrony.

Fig. 11.2 shows the mean asynchronies for hand and foot tapping for IW and for the control subjects. IW had large mean asynchronies for both hand and foot tapping (−103 ms and −146 ms). Although the mean difference between hand and foot tapping amounted to 43 ms, it was not significant, t(8) = 1.80,
p > 0.10. This was due to large variability between trials: the trial mean asynchrony ranged from −145 ms to −56 ms in hand tapping and from −186 ms to −93 ms in foot tapping. The control subjects, who had tactile information about the tap onset and kinesthetic information about the finger movement available, also showed clear negative asynchronies under both effector conditions (hand: −28 ms, foot: −59 ms). However, a one-sample t-test testing whether IW's asynchrony differed from the average asynchronies of the control subjects revealed that their asynchronies were much smaller than IW's, t(13) = 12.10, p < 0.001. In addition, a significant difference between hand and foot was observed in the control subjects (31 ms), as expected, t(13) = 4.96, p < 0.001. The mean standard deviations of the asynchronies within trials for IW (hand: 27 ms; foot: 36 ms) were similar to those of the control subjects (hand: 23 ms; foot: 27 ms).
11.2.3 Discussion

In this first experiment, hand and foot tapping were studied under conditions without any auditory or visual information about the tapping movement. Healthy controls had to rely on their tactile and kinesthetic feedback from the finger movement to control their taps, whereas the deafferented subject IW was deprived of all feedback information about the spatial and temporal characteristics of his movements. Nevertheless, he coped with the task rather well. The results for the control subjects revealed a clearly negative asynchrony in both the hand and the foot-tapping condition, with the asynchrony for the foot being even more pronounced. These results are in accordance with those reported in the literature (Aschersleben and Prinz 1995; Billon et al. 1996a; Fraisse 1980). If we quantitatively compare the results of the age-matched controls in the present study (mean age 47 years) with the results from the usually studied college students (mean age about 27 years), there is no substantial difference (for a similar result, see Stenneken, Aschersleben, Cole, and Prinz, in press). The mean asynchrony for hand tapping in college students is typically between −50 ms and −30 ms (e.g. Aschersleben and Prinz 1995, 1997; Billon et al. 1996a; Fraisse 1980; Mates and Aschersleben 2000; Peters 1989), and that for foot tapping is between −90 ms and −50 ms (Aschersleben and Prinz 1995; Billon et al. 1996a; Fraisse 1980). The mean asynchronies for the present age-matched control group fall well within these ranges. Therefore, we conclude that age was not an important factor. More interestingly, the deafferented subject IW was able to coordinate his taps with the pacing signal (as indicated by the study of Billon et al. 1996b) and, moreover, he showed a negative asynchrony as well. Compared with the control subjects, IW performed the synchronization task with rather large asynchronies. Though interindividual differences are well known, IW's performance was still outside the range of the control subjects. The mean asynchronies for individual subjects in the control group ranged from −65 ms to +1 ms for hand tapping and from −116 ms to 0 ms for foot tapping. How was IW able to perform the synchronization task? Although he cannot rely on any kind of feedback to time and control the finger or foot movements, his performance in the synchronization task was close to normal. The variability of the asynchronies within trials indicates that the timing of the movements was far from random. On the contrary, performance was as stable as in the control subjects. However, variability between trials was rather high. For both hand and foot tapping the range of trial means was about 90 ms, whereas the mean range in the control subjects was about 25 ms. This may suggest that IW made an estimate of the delay between his motor command and the tap at the beginning of each trial and then tried to synchronize the click with this simulation of the tap. To be more precise, we assume that he internally simulated the
central representation of the tap, which is usually based on the sensory feedback, and compared the timing of this internally generated tap representation with the timing of the click representation. We come back to this interpretation in the General Discussion. However, there are at least two possible alternative explanations. One hypothesis is that IW reacted to the previous signal, rather than anticipating the upcoming one. In that case, his mean reaction time would amount to 500 ms in the hand and 450 ms in the foot condition, which is rather long. In a simple reaction time task, Stenneken et al. (in press) observed mean reaction times of less than 300 ms in IW. Another argument against this interpretation is the fact that in reaction time tasks the hand is clearly faster than the foot. This has been reported not only for healthy control subjects (e.g. Bard et al. 1991, 1992; Paillard 1949; Seashore and Seashore 1941; Stenneken et al. in press) but for deafferented subjects as well (Bard et al. 1992; Stenneken et al. in press). The second alternative explanation is that IW simply maintained the initial phase with which he started a given trial. This strategy would imply that IW performs his taps without any error correction. Tapping without error correction, however, would lead to considerable drifts within each trial.2 Moreover, the pattern of asynchronies at the beginning of each trial did not differ systematically from what is usually observed in control subjects; that is, in some trials he kept the initial value, whereas in other trials he started at a larger (or smaller) asynchrony and then 'tuned in'.
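The drift argument can be illustrated with a first-order linear phase-correction model; the correction gain, noise level, and starting value below are arbitrary assumptions chosen only to contrast corrected and uncorrected tapping.

```python
import numpy as np

rng = np.random.default_rng(3)

def asynchronies(alpha, n=47, noise_sd=20.0, a0=-100.0):
    """First-order linear phase correction of the tap-click asynchrony:
    A_{n+1} = (1 - alpha) * A_n + timing noise. alpha = 0 means no error
    correction; all parameter values here are illustrative."""
    a = np.empty(n)
    a[0] = a0
    for i in range(1, n):
        a[i] = (1.0 - alpha) * a[i - 1] + rng.normal(0.0, noise_sd)
    return a

for alpha in (0.0, 0.3):
    series = asynchronies(alpha)
    print(f"alpha = {alpha}: within-trial SD = {series.std(ddof=1):.0f} ms")
# With alpha = 0 the asynchronies random-walk away from their initial value,
# so within-trial variability grows with sequence length (drift); a modest
# correction gain keeps the series stationary, consistent with IW's stable
# within-trial performance.
```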
11.3 Experiment 2

In the second experiment we studied the influence of auditory feedback on synchronization performance. We asked our subjects again to tap either with the hand or with the foot, but each time the subject touched the key an auditory feedback signal was presented via headphones. Based on earlier findings (Aschersleben and Prinz 1995, 1997; Mates and Aschersleben 2000; Mates et al. 1992; O'Boyle and Clarke 1996; Repp 2001), we expected a reduction in the negative asynchrony for both conditions (hand and foot tapping) in the control subjects. However, the hand–foot difference should persist in reduced form (see Aschersleben and Prinz 1995). This pattern of results is consistent with the assumption that, when auditory feedback is available, each tap is represented by a late tactile/kinesthetic and an early auditory feedback code, which are integrated into a joint event code that is 'dated' somewhere between the two codes it consists of. This model also makes a clear prediction for the deafferented subject. As central and peripheral processing times should be the same for both auditory stimuli (pacing signal and auditory feedback signal) and no other source of feedback is available, the mean negative asynchrony should disappear if the corresponding representations (perceived click and perceived tap) are synchronized at a central level.
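The joint event code prediction can be made explicit with a weighted-average toy model; the perceptual latencies and the weight below are illustrative assumptions, not measured values.

```python
def perceived_tap(t_tactile, t_auditory, w_tactile=0.6):
    """Joint event code as a weighted average of the tactile/kinesthetic and
    auditory feedback components (latencies in ms after the physical tap;
    the weight is an assumption)."""
    return w_tactile * t_tactile + (1.0 - w_tactile) * t_auditory

# Control subjects: a late tactile code (+60 ms) is pulled forward by an
# earlier auditory feedback code (+20 ms), so the joint code is dated in
# between and the negative asynchrony shrinks but persists.
print(perceived_tap(60.0, 20.0))                 # 44.0 ms

# Deafferented subject: only the auditory code exists (w_tactile = 0), so the
# perceived tap carries the same delay as the perceived click and the
# predicted asynchrony is zero.
print(perceived_tap(0.0, 20.0, w_tactile=0.0))   # 20.0 ms, same as the click

# Delaying the auditory feedback by d shifts the joint code linearly by
# (1 - w_tactile) * d, the linear asynchrony-delay relation of section 11.1.2.
print(perceived_tap(60.0, 70.0) - perceived_tap(60.0, 20.0))  # 20.0 for d = 50
```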
11.3.1 Method

Subjects, stimuli, and procedure were the same as in Experiment 1, except that an auditory feedback signal (2000 Hz, 60 dB[A], duration 10 ms) was presented binaurally through headphones each time the finger/foot touched the response key. The feedback tone was clearly distinguishable from the 1000-Hz pacing signal.
11.3.2 Results and discussion

Data analysis was identical to that in the first experiment. According to our pre-set criteria, 3.6% of the trials of the control subjects were rejected, whereas for the deafferented subject IW no trial had to be rejected. Figure 11.3 shows the mean asynchronies for hand and foot tapping for IW and for the control subjects. Contrary to our expectations, the asynchrony between tap and click was not reduced in the control subjects, relative to Experiment 1 (Fig. 11.2), but the difference in asynchrony between hand and foot tapping remained (19 ms), as expected, t(13) = 2.97, p = 0.01. A 2 × 2 ANOVA comparing the results of Experiments 1 and 2 revealed only a significant main effect of effector, F(1, 13) = 21.26, p < 0.001. For the control subjects in the present study, synchronizing the auditory feedback tone with the pacing signal seemed a fairly difficult task. First, the number of trials that had to be eliminated (because of large variability) increased compared with Experiment 1. Second, even after the elimination of these trials the mean variability within trials was clearly higher than in Experiment 1 (33 ms vs. 25 ms). Earlier experiments with college-age students did not show such an increase in variability (e.g. Aschersleben and Prinz 1995, 1997). The age of the subjects may have played a role here. The deafferented subject, however, showed clearly reduced asynchronies as a consequence of auditory feedback (hand tapping: −37 ms, foot tapping: −35 ms). Nevertheless, the negative asynchrony did not disappear as expected. A 2 × 2 ANOVA comparing IW's results from Experiments 1 and 2 revealed only a highly significant effect of feedback, F(1, 4) = 181.38, p < 0.001. The main effect of effector and the effector × feedback interaction clearly failed to reach significance (p > 0.18). In addition, the difference in asynchrony between hand and foot tapping disappeared both statistically and numerically (2 ms), t(8) = 0.21, p > 0.20. The results of the deafferented subject indicate that he used the available feedback to time his actions and, moreover, that the auditory feedback clearly improved his performance. The persisting negative asynchronies may derive from the fact that this subject relies very much on visual control of his movements. He told us that performing a movement without visual control is a very hard, attention-demanding
Fig. 11.3 Averages and standard errors of the mean asynchronies for the deafferented subject IW and the age-matched control subjects, for hand and foot tapping with auditory feedback (Exp. 2).
attention-demanding task for him that requires huge effort. Therefore, performing the tapping movement without visual control created a kind of dual-task situation; as a consequence, it may well be that for IW the situation studied in the present experiment was not comparable to the same situation in the control subjects. This idea was tested in Experiment 3.
11.4 Experiment 3
In this experiment, we allowed subjects to monitor their movements visually. If the hypothesis is correct that the remaining asynchrony in the deafferented subject was due to the strong demands of spatially controlling the movement, the asynchrony should disappear in the present experiment. For the control subjects, however, visual monitoring should not make much of a difference, because they mainly rely on proprioceptive feedback for the spatial control of their movements. Support for these predictions can be found in the study by Billon et al. (1996b). They compared a condition without auditory and visual feedback with a condition that offered 'natural' feedback; that is, subjects were allowed to visually monitor their movements and could hear the touch of the key. Billon et al. found no difference between these two conditions in the asynchronies of the control subjects. The deafferented subject, however, showed a negative asynchrony only in the no-feedback condition; in fact, under natural feedback he even showed a small positive asynchrony.
11.4.1 Method
Subjects, stimuli, and procedure were the same as in Experiment 2, except that subjects were able and instructed to visually monitor their finger or foot movements during tapping.
11.4.2 Results and discussion
According to our pre-set criteria, no trial had to be rejected, either for the control subjects or for the deafferented subject IW. Fig. 11.4 shows the mean asynchronies for hand and foot tapping for IW and for
Fig. 11.4 Averages and standard errors of the mean asynchronies for the deafferented subject IW and the age-matched control subjects, for hand and foot tapping with auditory feedback and visual control (Exp. 3).
the control subjects. As expected, the control subjects still showed negative asynchronies similar to those in the previous two experiments. Moreover, the asynchrony between tap and click was not reduced in the control subjects, relative to Experiment 2 (Fig. 11.3), and the difference in asynchrony between hand and foot tapping remained (17 ms), t(13) = 4.82, p < 0.001. A 2 × 2 ANOVA comparing the results of Experiments 2 and 3 revealed only a significant main effect of effector, F(1, 13) = 19.57, p < 0.001. These results clearly indicate that visual monitoring of the finger and foot movements had no influence on the timing of the taps. In the deafferented subject the average asynchrony between click and tap disappeared completely, t(9) = 0.77, p > 0.20, and there was again no significant difference in asynchrony between hand and foot tapping (hand tapping: −11 ms, foot tapping: +4 ms), t(4) = 1.63, p > 0.10. The results of this experiment finally showed the expected pattern. If IW is allowed to monitor his movements visually and receives auditory feedback about the tap onset, he is able to tap in exact synchrony with the auditory pacing signal. This is exactly what the representational models predict: in this situation, the time needed to establish a central representation of the auditory pacing signal should be identical to the time needed to establish a central representation of the tap, which is, under these conditions, represented only by the auditory feedback signal. Although IW probably has some central engrams of tapping, which could interact with the auditory feedback, he apparently did not rely on them when presented with external 'veridical' feedback.
11.5 General discussion
In the present chapter we examined the influence of sensory feedback on the timing of simple repetitive movements in a sensorimotor synchronization task. To account for the usually observed lead of the tap (negative asynchrony), we proposed that it is the central representations of tap and click, derived from sensory feedback, that are synchronized. We then reviewed studies in which the sensory feedback available from the tap, as well as its temporal characteristics, was manipulated. Taking such manipulations to their limit, we presented a study with a completely deafferented subject. We asked this subject, IW, to perform three synchronization tasks that differed in the amount of feedback available from the tapping movement (hand and foot tapping). His performance was compared with the results of an age-matched control group. In principle, inferring the role of sensory feedback by comparing IW and control subjects is problematic, because IW has spent more than 25 years developing compensatory strategies to control his movements. A more appropriate test of deafferentation would have been to observe a deafferented patient soon after the onset of the neuropathy and before the start of rehabilitation. However, IW's remaining deficits do serve to highlight where sensory feedback remains necessary. Contrary to our expectations, no influence of the feedback manipulation was found in the control subjects. A summary 3 × 2 ANOVA comparing the results of the three experiments revealed neither a significant effect of feedback nor a significant feedback × effector interaction. Only the main effect of effector reached significance, F(1, 13) = 27.77, p < 0.001, which replicates the standard finding that foot taps occur earlier than hand taps. Although additional auditory feedback has been found to reduce the amount of asynchrony in previous studies, this finding was not replicated here, perhaps because of the large variability in the age-matched group. In addition, the presence or absence of visual control of the tapping movement also had no effect on the asynchrony in control subjects. This result is crucial for the comparison with the results obtained in the deafferented subject, and it is in line with the findings reported by Billon et al. (1996b). They compared conditions with and without
'natural' feedback (seeing and hearing the tapping finger) and also found no asynchrony difference in the usual tapping task with normal subjects. For the deafferented subject, a corresponding summary ANOVA indicated a highly significant effect of feedback, F(2, 8) = 241.99, p < 0.001, whereas the main effect of effector and the interaction between the two factors were far from significant (p-values > 0.20). With an increasing amount of feedback, the timing of IW's taps became more precise and the negative asynchrony disappeared. This interesting pattern of results leaves us with at least three questions. First, how was the deafferented subject able to tap at all in the absence of any feedback (Experiment 1)? Second, why did he produce (rather large) negative asynchronies in that condition? Third, why did the asynchrony not disappear in the condition with auditory feedback (Experiment 2), as predicted by the representational models? This prediction was only confirmed in Experiment 3, when the deafferented subject was allowed to visually monitor his movements. Thus, we need to discuss the role of visual control in more detail. Let us start with the last question and then proceed to the first two questions, which are, from our theoretical point of view, closely related. According to the representational models, which emphasize differences in peripheral and central processing times between the external events (click and tap) and their corresponding central representations, the asynchrony between click and tap should disappear when the processing times are identical. In normal subjects, this condition is hard to create. A recent study by Müller and colleagues (2001) apparently succeeded by applying a tactile pacing signal to the left index finger, which resulted in a disappearance of the negative asynchrony.3 With the deafferented subject we had the chance to study a situation in which the processing times for click and tap could really be matched, by using comparable auditory signals for both the pacing stimuli and the feedback from the taps. But, as Experiment 2 indicated, the asynchrony did not disappear completely; it did so only when IW was allowed to visually monitor his movements, whereas such visual monitoring had essentially no effect on the timing of the taps in the healthy control subjects. What caused this fundamental difference between a deafferented subject and healthy control subjects in the use of visual control? In deafferented subjects like IW there is a permanent requirement for visual monitoring of their movements. This has also been reported for other deafferented subjects (see, e.g. Cole and Paillard 1995; Sanes, Mauritz, Dalakas, and Evarts 1985). Even in situations in which visual control is possible, each movement requires mental control; that is, visual feedback can only be used with concentration and attentional effort. For example, IW is not able to maintain a precision grip while walking. Studies using the dual-task paradigm have shown that, without proprioception, significant mental resources are necessary to monitor movements (Ingram et al. 2000; Lajoie et al. 1996). Of course, the situation is even worse if visual control is excluded, as in our first two experiments. The presentation of auditory feedback in our second experiment gave only discrete temporal information about the movement at a certain point in time—it just indicated when the finger (or toe) hit the key.
In contrast, the control subjects received continuous feedback about the timing and the spatial parameters of their finger and toe movements via proprioception. This information was missing in the deafferented subject. We suggest that IW requires visual feedback to control the spatial component of movement; that is, visual feedback replaces the missing proprioceptive information. When IW could not visually monitor his actions, he had to produce and time the motor command from memory, without the possibility of correcting it on the basis of proprioception, which led to a decrease in performance. Interestingly, however, IW was able to produce stable performance even in the absence of any sensory feedback (Experiment 1). As already discussed, the high variability between trials and the
relatively low variability within trials suggested that for each single sequence of taps IW determined the movement parameters in advance and then reproduced these movements rather consistently (though still with some variability due to variability in the production of the motor command). This interpretation is supported by the fact that, before a new trial could be started, he first made some finger taps under visual control, then put his head behind the screen (to eliminate visual feedback); only then could the trial be started. This suggests that during the following sequence he reproduced from memory the motor command he had just determined under visual control. Although this strategy operated without any peripheral feedback (which was excluded), IW's ability to produce a sequence of 50 taps without considerable drift indicates the presence of an error-correction process. But what was this process based on if no feedback was available? One possible interpretation is that the deafferented subject internally predicted the time of his tap and then compared the time of the click with the time of the predicted tap. Or, to be more precise, he internally simulated the central representation of the tap, which is usually based on sensory feedback, and compared the timing of this internally generated tap representation with the timing of the click representation. The idea of internal generation of the sensory consequences of actions is clearly in line with so-called forward models (e.g. Wolpert 1997; Wolpert, Ghahramani, and Jordan 1995). These models assume that efficient motor control requires the representation of different states of our cognitive system, such as the current, the desired, and the predicted states. Inverse modeling is used to derive the actions required to move from the current state to the desired state, while forward modeling is used to derive the state predicted if these actions were to be performed. The forward model predicts the sensory consequences of an action; that is, it allows reafferences (sensory effects) to be anticipated on the basis of the efference copy (motor outflow). Discrepancies between the desired and the actual outcome of motor commands are then the basis for error correction. Forward models have been used, for example, to explain abnormalities in the perception and control of actions in schizophrenia (e.g. Frith 1992), but the concept can easily be applied to the control of actions in deafferented subjects. In our deafferented subject, the forward model would predict the sensory consequences of the tap, which can then be used to control and time the action even under conditions in which no actual feedback is available. At this point, it is important to recall that IW has long experience in controlling movements via visual feedback only and is very used to his condition, unlike a control subject or someone who has just lost feedback in an accident. Thus, it might be that during rehabilitation the original representation of the proprioceptive feedback was replaced by visual feedback. As a consequence, IW developed a way to internally generate tap-related information in situations in which no feedback is available. The basic idea would then be that he times his motor command such that the internally simulated feedback from the tap coincides with the perceived click.
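As a concrete, deliberately simplified illustration of this idea, the following sketch simulates a tapper whose only error signal is the forward model's predicted asynchrony. The first-order correction scheme is a generic stand-in, not the authors' or Wolpert's formal model, and all parameter values are assumptions.

import random

INTERVAL = 600  # pacing interval (ms)
ALPHA = 0.5     # assumed error-correction gain
NOISE = 20      # SD of timekeeper/motor noise (ms), assumed

def simulate(n_taps=50):
    # Each cycle, the forward model predicts the tap's central code time;
    # the discrepancy from the click code drives a partial correction of the
    # next interval, so the series remains stable with no sensory feedback.
    asynchronies, tap = [], 0.0
    for n in range(n_taps):
        click = n * INTERVAL
        predicted_asyn = tap - click  # internally simulated, not sensed
        asynchronies.append(predicted_asyn)
        tap += INTERVAL - ALPHA * predicted_asyn + random.gauss(0, NOISE)
    return asynchronies  # stationary: no cumulative drift across 50 taps

Because the correction acts on a simulated rather than a sensed asynchrony, such a model drifts only if the internal prediction itself is biased—which is where a miscalibrated 'dating' of the simulated tap, and hence a residual negative asynchrony, could enter.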
Both the relevance of visual information and the internal generation of movement consequences are supported by brain-imaging data from subject IW (Athwal, Cole, Wolpert, Frith, and Frackowiak 1999). During PET scanning, a sequential finger–thumb opposition task was performed with one hand. The finger-opposition movement and a resting state were each combined with two types of visual feedback (vision of the moving finger on a video screen vs. vision of the resting finger). IW showed a pattern of activation that differed from that obtained in control subjects. In the movement condition with veridical visual feedback, contralateral visual areas were activated more than in controls, indicating the requirement for visual control. Moreover, contralateral activation of the inferior parietal cortex in conditions with a mismatch between the produced and the observed state, and bilateral
cerebellar activation in all experimental conditions, are in line with feedforward control of movements (see also Blakemore, Wolpert, and Frith 1998). Such an interpretation can explain IW's relatively stable performance within trials, because internally generated feedback may be less variable than sensory feedback (additional 'noise' may arise in peripheral processing). In addition, the interpretation offers an explanation for IW's negative asynchrony. If the original representation of the proprioceptive feedback was replaced by visual feedback, which is then internally generated in situations in which no feedback is available, the timing of this generated feedback should be comparable to that of the original feedback. At the same time, however, the withdrawal of visual control created a kind of dual-task situation for IW (because visual monitoring of the spatial components of the movement was no longer possible), resulting in rather large asynchronies.
11.6 Conclusions
In demonstrating the important contribution of sensory feedback to the timing of actions, we have gained further support for the general idea that actions are timed and controlled by their anticipated effects (e.g. their sensory consequences). This idea, according to which actions are more effective if they are planned in terms of their intended outcomes rather than in terms of their proximal effects, has been proposed by several authors (e.g. Greenwald 1970; Hommel, Müsseler, Aschersleben, and Prinz, in press; James 1890; Prinz 1990, 1997). Empirical support for such an approach comes not only from our studies of sensorimotor synchronization but also from studies on bimanual coordination (Drewing and Aschersleben 2001; Drewing, Hennings, and Aschersleben, in press), on compatibility effects, and on sequence learning (see, e.g. Elsner and Hommel 2001; Kunde 2001; the contributions by Hazeltine, this volume, Chapter 33; Müsseler and Wühr, this volume, Chapter 25; and Ziessler and Nattkemper, this volume, Chapter 32).
Acknowledgments
The authors express their gratitude to IW for his participation in the experiments and his great patience in this matter. We wish to thank Bruno Repp, an anonymous reviewer, and Bernhard Hommel for their helpful criticism, suggestions, and comments on an earlier draft. We also wish to thank Frank Miedreich for programming and Renate Tschakert for her support in data collection. This research was partially supported by a grant from the Deutsche Forschungsgemeinschaft to the first author. Requests for reprints should be sent to Gisa Aschersleben, Max-Planck-Institut für Psychologische Forschung, Postfach 34 01 21, D-80098 München, Germany.
Notes
1. This idea is closely related to an assumption proposed by Rieser and Pick (this volume, Chapter 8). They propose that sensory information from different sensory channels is integrated into a unitary representation of space that is used for locomotion.
2. Even if only a small deviation from the target interval is introduced (e.g. the intertap interval is set at 599 ms instead of 600 ms), the error in the asynchrony would accumulate (in our example, to a sum of 50 ms after 50 taps), resulting in a clear drift (see the numerical sketch following these notes).
3. When the tactile pacing signal was applied to the big toe, no asynchrony in finger tapping was observed either, but this condition led to large intra- and interindividual variabilities, showing that the task was rather difficult for the subjects to perform. This finding, as well as others revealed in experiments with manipulations on the stimulus side, cannot easily be explained by the representational models. They suggest that factors in addition to the feedback from the tap are involved in the origin of the negative asynchrony (for overviews, see Aschersleben, in press; O'Boyle 1997).
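The accumulation described in note 2 can be verified in a few lines (a trivial sketch using the note's own example values):

T_CLICK = 600  # pacing interval (ms)
T_TAP = 599    # reproduced intertap interval, off by 1 ms (note 2's example)

# The asynchrony of tap k relative to click k grows linearly with k.
print([(k * T_TAP) - (k * T_CLICK) for k in (1, 10, 50)])  # [-1, -10, -50]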
References
Aschersleben, G. (1994). Afferente Informationen und die Synchronisation von Ereignissen. Frankfurt: Lang.
Aschersleben, G. (2000). Knowledge of results and the timing of actions (Paper No. 1/2000). Munich: Max Planck Institute for Psychological Research.
Aschersleben, G. (2001). Effects of training on the timing of simple repetitive movements. Manuscript submitted for publication.
Aschersleben, G. (in press). Temporal control of movements in sensorimotor synchronization. Brain and Cognition.
Aschersleben, G. and Prinz, W. (1995). Synchronizing actions with events: The role of sensory information. Perception and Psychophysics, 57, 305–317.
Aschersleben, G. and Prinz, W. (1997). Delayed auditory feedback in synchronization. Journal of Motor Behavior, 29, 35–46.
Aschersleben, G., Gehrke, J., and Prinz, W. (2001). Tapping with peripheral nerve block: A role for tactile feedback in the timing of movements. Experimental Brain Research, 136, 331–339.
Aschersleben, G., Gehrke, J., and Prinz, W. (in press). A psychophysical approach to action timing. In C. Kaernbach, E. Schröger, and H. Müller (Eds.), Psychophysics beyond sensation: Laws and invariants in human cognition. Hillsdale, NJ: Erlbaum.
Athwal, B., Cole, J.D., Wolpert, D.M., Frith, C., and Frackowiak, R.S.J. (1999). A PET study of motor control in a 'deafferented' subject. Journal of Physiology, 518P, 65P.
Bard, C., Paillard, J., Teasdale, N., Fleury, M., and Lajoie, Y. (1991). Self-induced versus reactive triggering of synchronous hand and heel movement in young and old subjects. In J. Requin and G.E. Stelmach (Eds.), Tutorials in motor neuroscience, pp. 189–196. Amsterdam: Kluwer.
Bard, C., Paillard, J., Lajoie, Y., Fleury, M., Teasdale, N., Forget, R., and Lamarre, Y. (1992). Role of the afferent information in the timing of motor commands: A comparative study with a deafferented patient. Neuropsychologia, 30, 201–206.
Billon, M., Bard, C., Fleury, M., Blouin, J., and Teasdale, N. (1996a). Simultaneity of two effectors in synchronization with a periodic external signal. Human Movement Science, 15, 25–38.
Billon, M., Semjen, A., Cole, J., and Gauthier, G. (1996b). The role of sensory information in the production of periodic finger-tapping sequences. Experimental Brain Research, 110, 117–130.
Blakemore, S.J., Wolpert, D.M., and Frith, C. (1998). Central cancellation of self-produced tickle sensation. Nature Neuroscience, 1, 635–640.
Cole, J. (1995). Pride and a daily marathon. Cambridge, MA: MIT Press.
Cole, J. and Paillard, J. (1995). Living without touch and peripheral information about body position and movement: Studies with deafferented subjects. In J.L. Bermudez and A.J. Marcel (Eds.), The body and the self, pp. 245–266. Cambridge, MA: MIT Press.
Cole, J.D. and Sedgwick, E.M. (1992). The perceptions of force and of movement in a man without large myelinated sensory afferents below the neck. Journal of Physiology, 449, 503–515.
Cooke, J.D., Brown, S., Forget, R., and Lamarre, Y. (1985). Initial agonist burst duration changes with movement amplitude in a deafferented patient. Experimental Brain Research, 60, 184–187.
Drewing, K. and Aschersleben, G. (2001). Reduced timing variability during bimanual coupling: A role for sensory information. Manuscript submitted for publication.
Drewing, K., Hennings, M., and Aschersleben, G. (in press). The contribution of tactile reafferences to increased temporal regularity during simple bimanual finger tapping. Psychological Research.
Dunlap, K. (1910).
Reactions on rhythmic stimuli, with attempt to synchronize. Psychological Review, 17, 399–416.
Elsner, B. and Hommel, B. (2001). Effect anticipation and action control. Journal of Experimental Psychology: Human Perception and Performance, 27, 229–240.
Fraisse, P. (1980). Les synchronisations sensori-motrices aux rythmes. In J. Requin (Ed.), Anticipation et comportement, pp. 233–257. Paris: Centre National.
Fraisse, P., Oléron, G., and Paillard, J. (1958). Sur les repères sensoriels qui permettent de contrôler les mouvements d'accompagnement de stimuli périodiques. [On the sensory reference points that allow for controlling movements accompanying periodic stimuli]. L'Année Psychologique, 58, 322–338.
Frith, C.D. (1992). The cognitive neuropsychology of schizophrenia. Hove, UK: Erlbaum.
Gehrke, J. (1995). Sensorimotor synchronization: The intensity of afferent feedback affects the timing of movements (Paper 15/1995). Munich: Max Planck Institute for Psychological Research.
Gehrke, J. (1996). Afferente Informationsverarbeitung und die Synchronisation von Ereignissen. Dissertation, Ludwig Maximilians University, Munich, Germany.
Greenwald, A. (1970). Sensory feedback mechanisms in performance control: With special reference to the ideomotor mechanism. Psychological Review, 77, 73–99.
Hazeltine, E. This volume, Chapter 33.
Hommel, B., Müsseler, J., Aschersleben, G., and Prinz, W. (in press). The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences.
Ingram, H.A., van Donkelaar, P., Cole, J., Vercher, J.L., Gauthier, G.M., and Miall, R.C. (2000). The role of proprioception and attention in a visuomotor adaptation task. Experimental Brain Research, 132, 114–126.
James, W. (1890). The principles of psychology. New York: Macmillan.
Johnson, W.S. (1898). Researches in practice and habit. Studies from the Yale Psychology Laboratory, 6, 51–105.
Koch, R. (1999). Detection of asynchrony between click and tap (Paper 1/1999). Munich: Max Planck Institute for Psychological Research.
Kolers, P.A. and Brewster, J.M. (1985). Rhythms and responses. Journal of Experimental Psychology: Human Perception and Performance, 11, 150–167.
Kunde, W. (2001). Response-effect compatibility in manual choice reaction tasks. Journal of Experimental Psychology: Human Perception and Performance, 27, 387–394.
Lajoie, Y., Teasdale, N., Cole, J.D., Burnett, M., Bard, C., Fleury, M., Forget, R., Paillard, J., and Lamarre, Y. (1996). Gait of a deafferented subject without large myelinated sensory fibers below the neck. Neurology, 47, 109–115.
Ludwig, C. (1992). Experiment zur Synchronisation akustischer Führungssignale. Seminararbeit, Ludwig Maximilians University, Munich, Germany.
Mates, J. (1994). A model of synchronization of motor acts to a stimulus sequence. I. Timing and error corrections. Biological Cybernetics, 70, 463–473.
Mates, J. and Aschersleben, G. (2000). Sensorimotor synchronization: The influence of temporally displaced auditory feedback. Acta Psychologica, 104, 29–44.
Mates, J., Müller, U., Radil, T., and Pöppel, E. (1994). Temporal integration in sensorimotor synchronization. Journal of Cognitive Neuroscience, 6, 332–340.
Mates, J., Radil, T., and Pöppel, E. (1992). Cooperative tapping: Time control under different feedback conditions. Perception and Psychophysics, 52, 691–704.
Miedreich, F. (2000). Zeitliche Steuerung von Handlungen. Empirischer Test des Wing–Kristofferson-Modells. Aachen: Shaker Verlag.
Miyake, I. (1902). Researches on rhythmic action. Studies from the Yale Psychology Laboratory, 10, 1–48.
Müller, K., Aschersleben, G., Koch, R., Freund, H.-J., and Prinz, W. (1999). Action timing in an isochronous tapping task: Evidence from behavioral studies and neuroimaging. In G. Aschersleben, T.
Bachmann, and J. Müsseler (Eds.), Cognitive contributions to the perception of spatial and temporal events, pp. 233–250. Amsterdam: Elsevier.
Müller, K., Aschersleben, G., Schmitz, F., Schnitzler, A., Freund, H.-J., and Prinz, W. (2001). Modality-specific central control units in sensorimotor synchronization. Manuscript submitted for publication.
Müsseler, J. and Wühr, P. This volume, Chapter 25.
O'Boyle, D.J. (1997). On the human neuropsychology of timing of simple, repetitive movements. In C.M. Bradshaw and E. Szabadi (Eds.), Time and behaviour: Psychological and neuro-behavioural analyses, pp. 459–515. Amsterdam: Elsevier.
O'Boyle, D.J. and Clarke, V.L. (1996). On the source of the negative synchronization error during temporal-tracking performance. Brain Research Association Abstracts, 13, 40.
Paillard, J. (1949). Quelques données psychophysiologiques relatives au déclenchement de la commande motrice. [Some psychophysiological data relating to the triggering of motor commands]. L'Année Psychologique, 48, 28–47.
Peters, M. (1989). The relationship between variability of intertap intervals and interval duration. Psychological Research, 51, 38–42.
Prinz, W. (1990). A common-coding approach to perception and action. In O. Neumann and W. Prinz (Eds.), Relationships between perception and action: Current approaches, pp. 167–201. Berlin: Springer-Verlag.
Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9, 129–154.
Repp, B.H. (2000). Compensation for subliminal timing perturbations in perceptual-motor synchronization. Psychological Research, 63, 106–128.
Repp, B.H. (2001). Phase correction, phase resetting, and phase shifts after subliminal timing perturbations in sensorimotor synchronization. Journal of Experimental Psychology: Human Perception and Performance, 27, 600–621.
Rieser, J. and Pick, H.L. This volume, Chapter 8.
Sanes, J.N., Mauritz, K.-H., Dalakas, M.C., and Evarts, E.V. (1985). Motor control in humans with large-fiber sensory neuropathy. Human Neurobiology, 4, 101–114.
Seashore, S.H. and Seashore, R.H. (1941). Individual differences in simple auditory reaction times of hands, feet and jaws. Journal of Experimental Psychology, 29, 342–345.
Stenneken, P., Aschersleben, G., Cole, J., and Prinz, W. (in press). Self-induced versus reactive triggering of synchronous movements: A comparative study with a deafferented patient. Psychological Research.
Thaut, M.H., Tian, B., and Azimi-Sadjadi, M.R. (1998). Rhythmic finger tapping to cosine-wave modulated metronome sequences: Evidence of subliminal entrainment. Human Movement Science, 17, 839–863.
Vos, P.G., Mates, J., and van Kruysbergen, N.W. (1995). The perceptual centre of a stimulus as the cue for synchronization to a metronome. Quarterly Journal of Experimental Psychology, 48, 1024–1040.
Wohlschläger, A. and Koch, R. (2000). Synchronization error: An error in time perception. In P. Desain and L. Windsor (Eds.), Rhythm perception and production, pp. 115–127. Lisse: Swets.
Wolpert, D.M. (1997). Computational approaches to motor control. Trends in Cognitive Sciences, 1, 209–216.
Wolpert, D.M., Ghahramani, Z., and Jordan, M.I. (1995). An internal model for sensorimotor integration. Science, 269, 1880–1882.
Zießler, M. and Nattkemper, D. This volume, Chapter 32.
12 The embodiment of musical structure: effects of musical context on sensorimotor synchronization with complex timing patterns
Bruno H. Repp
Abstract. Two experiments demonstrate that musical context facilitates sensorimotor synchronization with complex timing patterns that are compatible with the musical structure. Several very different timing patterns were derived from an analysis of expressive performances of a musical excerpt. A random pattern (Exp. 1) or phase-shifted versions of the musical patterns (Exp. 2) served as comparisons, and an isochronous pattern served as practice. Musically trained participants first attempted repeatedly to synchronize their finger taps with click sequences instantiating these timing patterns. Subsequent repetitions of the click sequences were accompanied by the identically timed music, and finally, the music disappeared and was only to be imagined in synchrony with the clicks. Compared with the random or phase-shifted patterns, synchronization accuracy for the musical patterns improved as soon as the music was introduced, especially when the pattern was highly typical. This relative improvement was reduced or absent when the music was merely imagined. Nevertheless, both musical context and imagery systematically modulated the timing of finger taps in synchronization with strictly isochronous click sequences. Thus perception or imagination of musical structure can involuntarily affect the timing of concurrent action, presumably by modulating the timekeeping processes that pace the motor behavior. This study also demonstrates that radically different timing patterns are compatible with the same musical structure, as they seem to be in expert artistic performance.
12.1 Introduction
There is an intimate relationship between music and the human body (see, e.g. Clarke 1993a; Iyer 1998; Pierce and Pierce 1989; Repp 1993; Shove and Repp 1995). Music is produced by moving various extremities across musical instruments, or by engaging the mouth, lungs, and vocal tract. These moving parts of the body are attached to (or embedded in) the trunk, which provides structural support and often participates by swaying along. In most cultures, listeners participate in music by dancing, clapping, tapping, or rocking in synchrony with its rhythm. Only in the Western tradition of serious art music is overt movement proscribed for audiences in concert halls, but listeners still feel a readiness to move, imagine themselves moving along with the music, or speak of being moved by the music. Thus there is a very close relation between music perception and action, particularly with regard to rhythm and timing. Human music performances are distinguished from machine renditions (unless these successfully simulate human performance) by the presence of many subtle features that originate in the musicians' movements. Clynes (1983) has referred to these features as 'expressive microstructure' which conveys 'living qualities'. One of these features is expressive timing. It consists of systematic deviations
from temporal regularity which signify to a listener that the music was produced not by a machine but by a thinking, feeling, and moving being. Expressive timing originates from three sources (cf. Penel and Drake 1998, 1999): (1) biomechanical constraints in technically difficult passages; (2) obligatory perceptual–motor patterns related primarily to rhythm and rhythmic grouping; and (3) intentional communication of structural or emotional aspects of the music. The present study is mainly concerned with the second of these factors, only incidentally with the third, and not at all with the first. Recent research has produced considerable evidence that a particular musical structure is often associated with a particular expressive timing pattern. This most typical pattern corresponds to the average timing pattern of a large sample of human performances. It is representative of many individual performances (Repp 1998a) and is judged to be aesthetically pleasing (Repp 1997). When musicians are requested to play with perfectly regular timing (as specified in a musical score) or in synchrony with a metronome (Repp 1999c), or when they try to create a perceptually regular performance on a computer by adjusting successive temporal intervals (Penel 2000), they nevertheless produce small but systematic timing variations whose pattern resembles the typical expressive timing pattern (Behne and Wetekam 1993; Drake and Palmer 1993; Palmer 1989; Penel and Drake 1998; Repp 1999a,c). A complementary pattern of perceptual biases is observed when listeners are asked to detect local deviations from perfect regularity in a musical passage (Repp 1992b, 1998b,c,d, 1999b,c). These findings suggest that there is a level of subconscious and obligatory timing variation upon which larger intentional expressive variations are superimposed. The obligatory variations seem to be linked to the lowest level of rhythmic grouping in the music, whereas intentional expressive timing reflects several hierarchical levels of grouping (Penel and Drake 1998) as well as possibly other factors (meter, melodic contour, harmony, etc.). The similarity of the pattern of obligatory variations to the typical expressive timing pattern may be explained by the fact that they share the lowest level of grouping, which accounts for much of the timing variation (Penel 2000). However, intentional expressive timing does not always follow the most typical pattern. The timing patterns produced by experienced concert artists sometimes represent quite radical departures from the norm (Repp 1992a, 1998a). While such highly individual timing patterns may sound strange on first hearing, the fact that they were produced by outstanding musicians indicates that they are not arbitrary or inappropriate to the musical structure. Nevertheless, it seems that these patterns are not strongly implied by the musical structure, if at all. It appears that creative performers must overcome a natural tendency to produce the most typical timing pattern (Repp 2000b). Penel and Drake (1998) have argued that typical timing is a form of motor compensation for perceptual timing distortions caused by rhythmic grouping. If so, then the typical timing pattern must always be present underlyingly, even when it is overridden by different intentions.
Alternatively, the typical timing pattern may be regarded as a natural strategy for representing rhythmic groups in action, a strategy that in turn causes perceptual biases via a motor–perceptual interaction (Repp 1998d; Viviani and Stucchi 1992). Perhaps, then, the typical (obligatory) timing pattern is a consequence of carrying out grouped actions on a musical instrument. However, Repp (1999a,b,c) eliminated this factor by asking participants (including non-pianists and even non-musicians) to tap with the index finger in synchrony with piano music that was reproduced under computer control in a perfectly regular fashion. The tap-tone asynchronies and inter-tap intervals were still found to exhibit systematic deviations from regularity that tended to be positively correlated with the typical expressive timing profile. Thus, perception of musical structure exerted an influence even on a concomitant action pattern that had
no structure of its own. The correlation between the obtained timing pattern and the typical expressive timing pattern was relatively small; this may have been due in part to an additional process of automatic error correction in synchronization (Mates 1994; Pressing 1998; Repp 2000a; Vorberg and Wing 1996), which counteracted the emergence of the typical timing pattern. The tentative conclusion from these results, therefore, was that a musical structure tends to induce a tendency towards the typical timing pattern in concurrent motor activity. It may be predicted, then, that this tendency to move expressively should facilitate the synchronization of movements with music that exhibits the typical (intentional) expressive timing pattern, even though that pattern shows much larger deviations from regularity than the obligatory timing variations induced by the music, which are generally below the perceptual detection threshold. This prediction has been investigated previously by asking pianists to tap their index finger in synchrony with (1) one of their own previously recorded expressive performances, (2) a computer-generated version that exhibited the typical timing pattern (the average timing pattern of a large number of human performances), and (3) a sequence of clicks that instantiated the typical timing pattern, while participants imagined the music in synchrony with the clicks (Repp 1999a).1 The pianists were quite successful in all three tasks (though not as accurate as in tapping to an isochronous sequence). Moreover, their synchronization was as accurate with the clicks as with the music itself, which suggested that musical imagery could effectively substitute for the musical sound. However, one shortcoming of that study was that it included no other conditions with which the pianists' synchronization accuracy could be compared. For example, it was not determined how well they could synchronize with the clicks without imagining the music, or with music having expressive timing patterns other than the most typical one, or with non-musical timing patterns of comparable average tempo and variability. Thus it was not clear whether synchronization with the most typical timing pattern in music was better than with other possible timing patterns, or indeed whether the relatively good synchronization performance had anything to do with music at all. It was the purpose of the present study to make these additional comparisons. Two similar experiments were conducted to address five hypotheses or predictions. One hypothesis was that synchronization with music exhibiting a typical expressive timing pattern would be more accurate than synchronization with music exhibiting a less typical (but still structurally appropriate) timing pattern, because the former pattern is more strongly implied by the musical structure than the latter. To that end, several timing patterns of different typicality, derived from an extensive performance analysis (Repp 1998a), were used. Another hypothesis was that synchronization with even the less typical musical timing patterns would be more accurate than synchronization with an arbitrary or structurally inappropriate timing pattern imposed on the same music. To test this prediction, synchronization with the musical patterns was compared to synchronization with a random pattern (Exp. 1) or with phase-shifted versions of the musical patterns (Exp. 2).
A third hypothesis was that the differences just predicted would also emerge, though perhaps smaller in magnitude, when the music was merely imagined in synchrony with a click sequence instantiating the timing patterns. (This click sequence also accompanied the music when music was present.) A fourth hypothesis was that timing patterns derived from expressive music performance might be easier to synchronize with than arbitrary timing patterns even in the absence of real or imagined music, simply because musical patterns are more regular. Moreover, musical timing patterns may differ from each other in their degree of regularity (i.e. periodicity or predictability), and hence in how difficult they are to learn and predict over repeated presentations.2 Therefore, synchronization accuracy was also assessed in a condition in which music was neither present nor imagined (i.e. where
the timing pattern was carried only by a click sequence). This condition provided a crucial baseline for interpreting the findings in the music and imagery conditions, and it necessitates an important qualification of the first three hypotheses. Specifically, their predictions are that synchronization with musical timing patterns should be selectively facilitated when music is present or imagined, compared with a condition in which music is neither present nor imagined. This selective facilitation should be largest for the most typical timing pattern and smaller for the less typical musical patterns. There should be no facilitation, and possibly even interference, for arbitrary or structurally inappropriate timing patterns. Viewed from an ANOVA perspective, the effects of primary interest in this study thus were interactions between condition and pattern type, not main effects. Finally, a fifth hypothesis was that synchronization accuracy would improve as a function of repeated presentation of the same timing pattern, but more so for musical patterns than for structurally inappropriate patterns (and most clearly for the most typical pattern) when music is present or imagined. Thus, an interaction between pattern type and trial number was also predicted. To get used to the synchronization task and the three experimental conditions (clicks only, clicks plus music, clicks plus imagined music), participants first tapped in time with an isochronous pattern. This made it possible to address another interesting issue in passing, as it were. As mentioned earlier, tapping in synchrony with isochronous music leads to systematic deviations from regularity in the timing of the taps (Repp 1999a,b,c). One question was whether that finding would be replicated when the music merely accompanies an isochronous click sequence that participants try to synchronize with. Even more interesting, however, was the question of whether similar systematic deviations from regularity would be evident when the music was merely imagined in synchrony with the isochronous click sequence. A previous attempt to determine this (Repp 1999a) led to unclear results, perhaps because the instructions had not sufficiently emphasized musical imagery. If a significant effect of musical imagery were found in this very simple synchronization task, this would constitute convincing evidence of the reality of musical imagery and provide further proof of a close connection between music perception and action.
12.2 Experiment 1
12.2.1 Methods
12.2.1.1 Materials
The timing patterns were derived from an analysis of 115 expert performances of the opening (bars 1–5) of Frédéric Chopin's Etude in E major, op. 10, No. 3 (Repp 1998a). A computer-generated score of this music is shown at the top of Fig. 12.1. The second half of the original bar 5 was condensed into a chord to give maximal closure to the excerpt, as heard in the experiment. Below the musical score and vertically aligned with it, Fig. 12.1(a) shows the most typical expressive timing pattern (or timing profile) for this excerpt (T0). This is the average profile of the 115 performances, whose timing was measured from digitized acoustic recordings. It is equivalent to the first unrotated principal component obtained in a principal components analysis of the performance timing profiles, a component which accounted for 61% of the variance. The graph depicts tone interonset intervals (IOIs) as a function of metrical (score) position, with 8 sixteenth-note subdivisions per bar. The initial upbeat IOI, corresponding to an eighth note in the score, has been excluded from all graphs and statistics; its average duration was 1122 ms. All other IOIs represent nominal sixteenth-note intervals.
Fig. 12.1 (top) A computer-generated score of the opening of Etude in E major, op. 10, No. 3, by Frédéric Chopin. (a) The most typical expressive timing pro1le (T0) for this music. (b), (c), (d) Mutually uncorrelated timing pro1les (T1, T2, T4) representing principal components of the timing patterns observed in expert performances. (e) An arbitrary timing pattern (R1) obtained by randomizing the inter-onset intervals (IOIs) of T1. Solid circles indicate IOIs initiated by melody notes, open circles those initiated by accompaniment notes only.
IOIs initiated by melody tones (among other tones) are shown as filled circles, those initiated only by accompaniment tones as open circles. The melody, in the highest voice, is divided into six rhythmic groups (runs of filled circles in the graph), each ending with a sustained tone during which the accompaniment in the other voices continues. It can be seen that the T0 pattern
includes ritardandi (final slowings) within each of the melodic segments, as well as a lengthening of the final IOI in bar 3 (which is the initial IOI of the longest melodic group) and sometimes of the final IOI of an accompaniment passage immediately preceding a melodic group (the initial IOIs in bars 2, 3, and 5). The T0 pattern was not used in Experiment 1 because of a concern that its correlation with the other patterns, especially T1, might lead to carry-over effects of pattern learning. However, it was used in Experiment 2. Three additional musical timing profiles (T1, T2, T4) were used in Experiment 1 and are shown in Fig. 12.1(b), (c), and (d). They represent the first, second, and fourth Varimax-rotated principal components of the timing patterns of the 115 expert performances (Repp 1998a) and accounted for 31%, 17%, and 11% of the variance, respectively.3 Thus, T1 was more typical of expert performance than were T2 or T4, and this was also reflected in their respective correlations with T0 (see Table 12.1), which may serve as indices of typicality. Being principal components, these three profiles were mutually uncorrelated. Originally vectors of standard scores, they were converted into IOIs by multiplying them by the average within-performance standard deviation (80 ms) and adding them to the grand average IOI duration of the 115 performances (533 ms). Thus they all had the same basic tempo and degree of timing modulation. A fourth pattern, R1, was generated by randomly scrambling the IOI durations of the T1 pattern (Fig. 12.1(e)). As can be seen in Table 12.1, the typicality of R1 was even lower than that of T4. The R1 profile correlated with the three musical profiles 0.21, −0.18, and −0.04, respectively (all n.s.). The duration of the initial upbeat IOI (not shown) was 1000 ms in all four patterns used in Experiment 1. The four timing patterns also differed in complexity or regularity. For example, T1 is characterized by strong ritardandi within all melodic groups, but it lacks the other timing features seen in T0, which results in a very clear periodicity. By contrast, T2 shows a striking accelerando in the melodic group of bar 2 and, to a lesser degree, also in bars 1 and 5, but not at all in bars 3 and 4, which makes this pattern more complex than T1. T4 shows pronounced between-group ritardandi that exceed the within-group ritardandi, as well as a lengthening of the final IOI in bar 3; it seems to be of intermediate complexity. The random pattern, of course, is the most complex. To quantify these intuitions, an index of the degree of pattern periodicity was computed in the form of the lag-8 autocorrelation (ac8), which assesses the average similarity of timing from one bar to the next. A measure of relative pattern complexity was then obtained by subtracting ac8 from 1. These complexity indices are shown in Table 12.1, which also includes the lag-1 autocorrelations (ac1) of the four patterns; these will be referred to later. In addition to the four timing patterns, an isochronous sequence with constant IOIs of 500 ms (except for an initial 1000-ms IOI) was presented.
Table 12.1 Typicality indices (i.e. correlations with T0), complexity indices (i.e. 1 − ac8; see text for explanation), and lag-1 autocorrelations (ac1) for the four timing patterns used in Exp. 1

Pattern    Typicality    Complexity    ac1
T1         0.67          0.31          0.29
T2         0.46          0.73          0.59
T4         0.36          0.58          0.15
R1         0.20          0.90          −0.25
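To illustrate how entries of this kind are computed, the sketch below (Python) rescales a component profile into IOIs as described above and derives ac1, ac8, and the complexity index. The profile values are invented stand-ins, not the actual component scores.

def autocorr(x, lag):
    # Lag-k autocorrelation of a sequence of IOIs.
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag))
    den = sum((v - m) ** 2 for v in x)
    return num / den

# Hypothetical profile of standard scores: four bars of eight positions,
# each bar ending in a ritardando (loosely T1-like; not the real values).
bar = [-0.9, -0.7, -0.5, -0.2, 0.0, 0.3, 0.8, 1.2]
z_profile = bar * 4

iois = [533 + 80 * z for z in z_profile]  # grand mean 533 ms, SD 80 ms
ac1 = autocorr(iois, 1)
complexity = 1 - autocorr(iois, 8)        # low for this bar-periodic pattern

A strictly bar-periodic profile yields a high ac8 and thus a low complexity index; scrambling the same IOIs, as was done to create R1, drives ac8 toward zero and the index toward 1.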
Each of these five timing patterns was imposed on a series of what will informally be called 'clicks'. Each click was in fact a high-pitched tone (C8, MIDI pitch 108, fundamental frequency 4186 Hz) produced on a Roland RD-250s digital piano, with a nominal duration of 20 ms. The tones had sharp onsets followed by a rapid decay and a longer soft ringing. Each click sequence comprised 38 identical sounds. When the click sequence was accompanied by the music, the music had exactly the same timing pattern in terms of its top-line tones (the highest tones in all sixteenth-note positions). The clicks coincided (within 1 ms) with the onsets of these top-line tones and were clearly audible above the music. The precise methods for synthesizing the music performances are described in Repp (2000b).
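The click's nominal pitch and frequency are related by the standard equal-temperament MIDI convention, which can be checked in a few lines:

def midi_to_hz(midi_pitch, a4=440.0):
    # Equal temperament with A4 (MIDI 69) tuned to 440 Hz.
    return a4 * 2 ** ((midi_pitch - 69) / 12)

print(round(midi_to_hz(108), 1))  # 4186.0 -> C8, the click tone used here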
12.2.1.2 Participants
Twelve undergraduate students from Yale University were paid to participate. All had advanced musical training, which was a necessary requirement in a study of expressive timing and musical imagery. Three of them were pianists; the others, several of whom also played the piano, were players of string instruments in the Yale Symphony Orchestra.
12.2.1.3 Procedure
Participants were first instructed in the use of the response device, a Fatar Studio 37 MIDI controller (a silent three-octave piano keyboard). They were instructed to hold the controller on their lap, to keep their index finger in contact with a self-chosen white key, to release the key fully before pressing it again, to start tapping with the second click in each sequence, to stay in synchrony with the clicks at all times, and not to count the clicks. The response key moved about 10 mm from its resting position to the (cushioned) bottom position, but the electronic contact occurred before the lowest position was reached, which added a small negative constant to the tap-tone asynchronies. The response key did not make any audible sound unless it was struck very hard, so participants generally had to gauge their synchronization errors cross-modally. The keypresses were registered by a MAX patch running on a Macintosh Quadra 660AV computer, which also controlled the playback of the sequences on a Roland RD-250s digital piano ('Piano 1' sound).4 Participants sat in front of the computer monitor, which displayed the trial number, and listened binaurally over Sennheiser HD540 II earphones. The three conditions (clicks only, clicks with accompanying music, and clicks with imagined music—referred to in the following as 'clicks', 'music', and 'imagery') were presented in the same order to all participants, constituting three successive parts of the experimental session. Within each condition, all timing patterns were presented in the same order. Each condition started with the isochronous sequence, but the order of the other four sequences was varied across participants according to three different 4 × 4 Latin squares. Each timing pattern was presented 10 times in succession, without any preceding practice trials. The participants' task was to tap in synchrony with each pattern to the best of their ability, and to try to predict the pattern with their taps from the second trial on. In the music condition, the instruction was to tap in synchrony with the clicks and not to pay any special attention to the music. In the imagery task, participants were told to imagine the music in synchrony with the clicks and to be sure not to make an extra tap at the end, since an extra tap would indicate that they had not imagined the music correctly. A copy of the musical score (Fig. 12.1, top) was in view throughout the music and imagery conditions, propped up below the computer monitor. There were 3 seconds of silence between trials, short breaks between timing patterns, and longer breaks between conditions.
12.2.1.4 Analysis
Three different measures of synchronization accuracy were used. One was the standard deviation of the asynchronies (sda). This measure was useful because all sequences used had the same average tempo (i.e. mean IOI duration) and the same average timing modulation (i.e. standard deviation of IOIs). If participants were able to predict a timing pattern perfectly with their taps, the standard deviation of the asynchronies would be equal to that found with an isochronous sequence. Of course, in view of the complexity of the patterns, prediction was not expected to be perfect in any condition. The other two measures of synchronization accuracy were correlational. One was the lag-0 cross-correlation (r0) between the inter-tap intervals (ITIs) and the click IOIs. If the taps predict the sequence timing pattern accurately, then r0 will be high. The other measure was in a way the converse of r0. Michon (1967) first demonstrated that attempts to synchronize with an auditory sequence whose temporal intervals vary unpredictably result in ITIs that echo the sequence IOIs at a lag of one (see also Hary and Moore 1985, 1987; Schulze 1992). This temporal tracking behavior seems to be the consequence of an automatic error-correction process that tries (unsuccessfully) to minimize the synchronization error. It results in a high lag-1 cross-correlation (r1) between ITIs and IOIs, which thus is a measure of the participant's inability to predict the temporal pattern. Thaut, Tian, and Azimi-Sadjadi (1998) found that tracking occurred even with sequences that were modulated in a regular, periodic fashion, but this may have been due to the small size of the modulations. When larger modulations of a regular, meaningful, or familiar nature are imposed on a stimulus sequence, the participant's taps will tend to predict the sequence timing, which reduces r1 and increases r0 (Michon 1967). However, it is problematic to rely on the raw values of r0 and r1. Each of these correlations has a theoretical lower limit that depends on the temporal structure of the sequence. In fact, it seems that both correlations have the same lower limit, namely the lag-1 autocorrelation (ac1) of the sequence timing pattern: when prediction (r0) is optimal, r1 will approach ac1 because the sequence of ITIs is similar to the sequence of IOIs; when tracking (r1) is maximal, r0 will approach ac1 because the ITIs echo the IOIs at a lag of one. Therefore, a correction was applied to both r0 and r1 to take into account the fact that different timing patterns have different ac1 values (see Table 12.1). The prediction index (r0*) thus was computed as (r0 − ac1)/(1 − ac1), and the tracking index (r1*) was computed as (r1 − ac1)/(1 − ac1). Both indices had a theoretical range from near zero to 1.
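As a worked example of these measures (a sketch under simplifying assumptions, not the analysis code used in the study), the following simulates a pure 'tracker' of a random timing pattern—whose ITIs echo the IOIs at lag one—and computes the corrected indices; the prediction index comes out near 0 and the tracking index near 1:

import random

def corr(x, y):
    # Pearson correlation between two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

iois = [random.gauss(533, 80) for _ in range(37)]  # unpredictable pattern

# Pure tracking: each inter-tap interval reproduces the preceding IOI
# (plus motor noise), as in Michon's temporal tracking.
itis = [ioi + random.gauss(0, 10) for ioi in iois[:-1]]

ac1 = corr(iois[:-1], iois[1:])   # lag-1 autocorrelation of the IOIs
r0 = corr(itis, iois[1:])         # ITI_n vs. IOI_n (lag 0)
r1 = corr(itis, iois[:-1])        # ITI_n vs. IOI_(n-1) (lag 1)
r0_star = (r0 - ac1) / (1 - ac1)  # prediction index, near 0 here
r1_star = (r1 - ac1) / (1 - ac1)  # tracking index, near 1 here

Replacing the tracker with taps that reproduce the current IOI would instead drive r0_star toward 1, which is the sense in which the two corrected indices separate prediction from tracking.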
The fixed variables in each ANOVA were condition (2), pattern (4), and trial (10). Within each ANOVA, separate comparisons were carried out between each musical pattern and the R1 pattern, which served as the baseline. Additional two-way ANOVAs were conducted on each individual condition. The main effect of pattern was highly significant in all three two-way ANOVAs, F(3, 33) > 7.8, p < 0.0005. Overall, performance tended to be best for the T1 pattern, followed by T4, R1, and T2.
Fig. 12.2 Average prediction indices (r0*) as a function of trial number for four timing patterns in the three conditions of Experiment 1. (a) Click condition. (b) Music condition. (c) Imagery condition.
It was surprising that T2 yielded poorer performance than R1, but T2 happened to be the musical pattern with the highest ac1 coefficient (Table 12.1), so that its r0 coefficient was most affected by the correction that turned it into r0*. It is possible that this correction was too extreme, as it did not take into account automatic error correction in tracking. The better performance with T1 and T4 is consistent with the lower complexity of these patterns. The main effect of trial was also highly significant in all analyses, F(9, 99) > 8.7, p < 0.0001, due to gradual improvement within conditions. The Pattern × Trial interaction reached significance in the music condition, F(27, 297) = 1.6, p < 0.03, and in the imagery condition, F(27, 297) = 1.9, p < 0.008, but not in the click condition, F(27, 297) = 0.6. These interactions are difficult to interpret, however.
Fig. 12.3 Prediction indices (r0*), averaged across trials within conditions, as a function of condition in Experiment 1. (a) Click and music conditions. (b) Click and imagery conditions.

More rapid improvement for the more typical patterns, as hypothesized in the Introduction, was not evident. Rather, all patterns seemed to improve at about the same rate. The Condition × Trial interaction was significant when comparing the music and imagery conditions, F(9, 99) = 3.0, p < 0.004, and also for clicks vs. music, F(9, 99) = 2.2, p < 0.04, but not for clicks vs. imagery, F(9, 99) = 1.7, p < 0.10. These interactions were due to somewhat greater improvement for all patterns within the music condition than within the other two conditions. The main effect of condition was highly significant in the two ANOVAs involving the click condition, F(1, 11) > 43, p < 0.0001, but not in the music vs. imagery comparison. Performance in these latter two conditions did not differ, but was substantially better than in the click condition. This suggests that pattern prediction was improved both by the presence and by the imagery of music, but the improvement could also have been due to general pattern learning, as observed within conditions. Therefore, the Pattern × Condition interaction was the crucial statistic. That interaction was significant in all three ANOVAs: clicks vs. music, F(3, 33) = 3.2, p < 0.04; clicks vs. imagery, F(3, 33) = 4.1, p < 0.02; and music vs. imagery, F(3, 33) = 4.9, p < 0.007. The corresponding data, averaged over trials, are shown in Fig. 12.3, which focuses on the two interactions of primary interest. Individual comparisons of each musical pattern with R1 in the clicks vs. music analysis (Fig. 12.3(a)) confirmed that the presence of music selectively improved prediction performance for T1, F(1, 11) = 6.6, p < 0.03, and for T4, F(1, 11) = 12.3, p < 0.005, but not significantly for T2, F(1, 11) = 3.6, p < 0.09. An interesting aspect of these data is that the selective advantage for the musical patterns was already present in the first trial of the music condition (see Fig. 12.2). In the comparison of the click and imagery conditions (Fig. 12.3(b)), a selective facilitation relative to R1 was evident only for T4, F(1, 11) = 7.5, p < 0.02. The significant Pattern × Condition interaction in the music vs. imagery ANOVA was mainly due to T2, for which prediction performance was worse in the imagery than in the music condition.
12.2.2.1 Average timing profiles
Figure 12.4 shows the results for the isochronous sequence, which is represented by the horizontal dotted line (IOI = 500 ms). The ITIs are shown as data points with double standard errors (roughly, 95% confidence intervals). In the click condition (Fig. 12.4(a)), the ITIs closely matched the
sequence IOIs from the fourth IOI on. The initial three ITIs reflect a 'tuning in' to the sequence (see also Fraisse 1966; Repp 1999b; Semjen, Vorberg, and Schulze 1998): despite the constant sequence tempo from trial to trial, the first tap tended to occur too late, so that the following ITIs had to be shortened to achieve synchrony; however, there were also substantial individual differences in that respect, as reflected in the large standard errors. The pattern of tap timing from the fourth ITI on did not show any significant deviation from uniformity in a one-way ANOVA with the independent variable of position (33), F(32, 352) = 1.1. By contrast, the tap timing profile in the music condition (Fig. 12.4(b)) did show significant variation from the fourth ITI on, F(32, 352) = 14.7, p < 0.0001, and also showed a different pattern of the initial three ITIs. Moreover, the pattern of systematic deviations from regularity was quite similar to that obtained in a previous study (Repp 1999b: Exp. 3) with the same musical excerpt, but without superimposed clicks: the correlation was 0.76 (p < 0.001), or 0.86 if the initial three ITIs are included. Thus the earlier results for synchronization with music alone were replicated, even though the present task required synchronization with clicks that were merely accompanied by music. It appears that the effect of the musical structure on tap timing is unavoidable (see also Repp 1998b: Exp. 3). The most interesting and novel finding, however, is that this systematic tap timing pattern persisted in attenuated form in the imagery condition (Fig. 12.4(c)). Here there was again a significant deviation from uniformity from the fourth ITI on, F(32, 352) = 9.2, p < 0.0001, and the pattern correlated 0.84 with that in the music condition (Fig. 12.4(b)), or 0.91 if the initial three ITIs are included. Thus musical imagery had a significant effect on motor timing in synchronization with a perfectly isochronous click sequence.5

Fig. 12.4 Average inter-tap interval (ITI) profiles (with double standard errors) in the three conditions of Experiment 1. (a) Click condition. (b) Music condition. (c) Imagery condition.
12.2.2.2 Summary
In terms of the five hypotheses outlined in the Introduction, the results may be summarized as follows. The first hypothesis was that more typical musical timing patterns would be synchronized with more accurately than less typical timing patterns when music is actually present. The predicted rank order of T1 > T2 > T4 was only partially confirmed, due to an unexpectedly (perhaps artifactually) low prediction index for T2. The second hypothesis was that all three musical patterns would be synchronized with more accurately than the R1 pattern when music was present. This was true for T1 and T4 but not for T2, for the same reason as before. The third hypothesis was that the first two predictions would also hold in the imagery condition, though perhaps less clearly. Indeed, the results in the imagery condition were similar to those in the music condition, only less pronounced. The fourth hypothesis was that there would be significant differences among the patterns already in the click condition, due to differences in pattern complexity. Significant differences were indeed obtained, but they did not reflect differences in pattern complexity in a straightforward way. Consideration of these differences led to qualified predictions with respect to the first three hypotheses. One prediction was that synchronization with typical musical patterns should be selectively facilitated compared to less typical patterns in the music and imagery conditions. This prediction received little support. The second and most important prediction was that, in comparing the click and music conditions, synchronization with musical patterns should be selectively facilitated compared with the random pattern in the music condition. This prediction received substantial support. The third prediction, that the same would be true in the comparison of the click and imagery conditions, received only weak support. Finally, the fifth hypothesis, that pattern learning would be faster for musical than for random patterns when music is present or imagined, was not supported. Instead, it appeared that music facilitated the learning of all patterns to some extent.
12.3 Experiment 2
Experiment 1 provided reasonable evidence that synchronization with complex timing patterns derived from music performance is facilitated when the appropriate music is heard or imagined, relative to a condition in which the music is neither heard nor imagined. The results were not as strong as expected, however, and this may be attributed to a methodological weakness having to do with the R1 pattern. In hindsight, it was not a good idea to employ only a single random pattern for comparison; it would have been better to use a different random pattern for each participant. Coincidentally, R1 had some features in common with T1, namely long IOIs at the ends of several melodic groups
(see Fig. 12.1). Thus, this pattern was not as inappropriate to the music as it could have been, and may actually have received some slight facilitation from the musical context. Experiment 2 took a different approach. Instead of constructing arbitrary timing patterns for comparison with the musical patterns, a phase-shifted version of each musical timing pattern (a method employed previously by Clarke 1993b in an imitation study) was constructed to serve as its specific comparison. Without a musical context, the phase shift had little significance, but once the music was present or imagined, the original patterns were properly aligned with the musical structure whereas the phase-shifted patterns were not. Thus the prediction was that synchronization with each musical pattern would be selectively facilitated relative to its phase-shifted version in both the music and imagery conditions, but not in the click condition. Indeed, it was considered possible that musical context would even impair synchronization with phase-shifted patterns, relative to the click condition.

Experiment 1 provided only limited support for the hypothesis that the degree of facilitation of synchronization with musical patterns in musical contexts would be positively related to the typicality of these patterns in music performance, in the form of an advantage of T1 over T2 and T4. However, the experiment did not include the most typical musical timing pattern, T0, for which the greatest amount of facilitation should be expected. Experiment 2 included this pattern as well, at the risk of some carry-over of learning between it and the fairly similar T1 pattern (see Fig. 12.1).

Another methodological change concerned the arrangement of the three experimental conditions. In Experiment 1, all timing patterns were presented in one condition before being presented in the next one. The main advantage of this design was that participants did not hear the music until after the click condition. A possible disadvantage was the temporal separation of the music and imagery conditions, which may have weakened the strength of the musical imagery. In Experiment 2, the design was blocked by timing pattern instead. For each timing pattern, an unbroken series of trials was presented, in the course of which the three conditions followed each other in the same fixed order as previously. This design had the advantage of revealing the transitions between the three conditions more clearly, but the disadvantage that participants might feel tempted to imagine the music during the click condition, despite instructions that discouraged this strategy. The new design was motivated by the intriguing observation in Experiment 1 that the selective advantage for the musical patterns seemed to be present on the very first trial in the music condition. In Experiment 2, the immediacy of such contextual effects could be observed more directly, without any intervening breaks.
12.3.1 Methods

12.3.1.1 Materials
The materials were the same as in Experiment 1, except for the following differences. The R1 pattern was no longer employed. Instead, there were four musical patterns (T0, T1, T2, T4) and a phase-shifted version of each (T0′, T1′, T2′, T4′). The phase-shifted patterns were obtained by moving the first two IOIs (following the initial 1000-ms 'upbeat' IOI) to the end of the pattern. Thus the phase shift amounted to one-eighth note, or −90 degrees relative to the metrical cycle defined by the musical bars.6 Table 12.2 shows that the phase-shifted versions were all atypical of expressive performance, with one (T4′) actually contradicting the most typical pattern. However, the complexity and ac1 indices were only slightly affected by the IOI manipulation.

Table 12.2 Typicality indices (i.e. correlations with T0), complexity indices (i.e. 1 − ac8; see text for explanation), and lag-1 autocorrelations (ac1) for the eight timing patterns used in Exp. 2

Pattern   Typicality   Complexity   ac1
T0           1.00         0.53      0.19
T0′         −0.17         0.31      0.29
T1           0.67         0.31      0.29
T1′         −0.03         0.14      0.35
T2           0.46         0.73      0.59
T2′         −0.15         0.62      0.59
T4           0.36         0.58      0.15
T4′         −0.54         0.60      0.15

When the music accompanied the clicks, it started and stopped with the click sequence and followed the same timing pattern. An isochronous pattern was also included, mainly for practice but also to replicate the intriguing effect of musical imagery on tap timing (Fig. 12.4(c)).
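To make the construction of the phase-shifted versions concrete, here is a minimal sketch of the transform described above. The function name and the representation of a pattern as a Python list of IOIs in milliseconds (with the 1000-ms upbeat IOI first) are assumptions for illustration.

```python
def phase_shift(iois):
    """Move the first two IOIs after the initial upbeat IOI to the end of
    the pattern (the transform that turns T0 into T0', T1 into T1', etc.)."""
    upbeat, rest = iois[:1], iois[1:]
    return upbeat + rest[2:] + rest[:2]

# For example, phase_shift([1000, a, b, c, d]) yields [1000, c, d, a, b].
```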
12.3.1.2 Participants
Twelve musically trained Yale undergraduates were paid for their participation. Nine of them were players of string instruments in the Yale Symphony Orchestra. Five of them had participated in Experiment 1, but since one year had elapsed between experiments, no carry-over of learning was expected. The remaining three participants had less advanced musical training but instead had considerable practice in synchronization tasks.

12.3.1.3 Procedure
Each temporal pattern was repeated 20 times, with 3 s of silence between repetitions. Trials 1–8 constituted the click condition; the first two of these trials were considered practice and were not analyzed. Trials 9–14 constituted the music condition, and trials 15–20 the imagery condition. Participants were urged not to imagine the music during the initial 8 trials; otherwise, the instructions were the same as in Experiment 1. The isochronous pattern was always presented first, and the remaining 8 patterns were presented in an order that was counterbalanced across participants according to 1.5 Latin squares, constructed so that original and phase-shifted patterns alternated and the other three patterns intervened between the original and phase-shifted versions of the same pattern. The musical score was in view throughout the experiment.

12.3.2 Results and discussion
The data were again analyzed in terms of the three indices of synchronization accuracy, but only the results for r0* are reported here. The results in terms of sda were similar, whereas r1* again yielded a less clear picture. The ANOVAs were largely analogous to those in Experiment 1 but included the variable of version (2) in addition to pattern, condition, and trial. Figure 12.5 shows that the difference between the original and phase-shifted versions of each timing pattern increased substantially in favor of the original version in the music condition relative to the click condition, and decreased again in the imagery condition.
Fig. 12.5 Average prediction indices (r0*) as a function of trial number for eight timing patterns in Experiment 2. (a) T0 and T0′. (b) T1 and T1′. (c) T2 and T2′. (d) T4 and T4′.
The Condition × Version interaction was highly significant when comparing the click and music conditions, F(1, 11) = 24.0, p < 0.0006, and also when comparing the music and imagery conditions, F(1, 11) = 36.6, p < 0.0002, but not when comparing the click and imagery conditions, F(1, 11) = 1.5, p < 0.25. The triple interaction with pattern was not significant, indicating that the Condition × Version interaction was similar for all four patterns. The main effect of version in favor of the original patterns was significant not only in the comparisons involving the music condition but also in the comparison of the click and imagery conditions, F(1, 11) = 5.0, p < 0.05; however, it did not change significantly between these two conditions, nor did it interact significantly with pattern in any condition. Prediction performance increased across trials in the music condition, F(5, 55) = 7.9, p < 0.0001, but not in the click and imagery conditions, where the main effect of trials was nonsignificant. This was also reflected in a significant Condition × Trials interaction for clicks vs. music, F(5, 55) = 2.9, p < 0.03, and for music vs. imagery, F(5, 55) = 5.8, p < 0.0003, but not for clicks vs. imagery. In the music condition, there was also a Pattern × Trials interaction, F(15, 165) = 2.4, p < 0.004. The largest improvement over trials was shown by T0/T0′ and the smallest by T2/T2′. Note that original and phase-shifted versions improved at the same rate; the Version × Trials interaction was nonsignificant.
12.3.2.1 Average timing profiles
The average ITI profiles for the isochronous sequence were extremely similar to those of Experiment 1 (Fig. 12.4) and therefore are not shown separately. Apart from the initial three ITIs (which were omitted in the statistical analyses), there was no significant deviation from uniformity in the click condition, F(32, 352) = 1.2, p < 0.23, whereas there were highly significant deviations in both the music condition, F(32, 352) = 9.9, p < 0.0001, and the imagery condition, F(32, 352) = 4.8, p < 0.0001. The pattern of the deviations was highly similar in these two conditions, r(31) = 0.81, p < 0.0001, although it was less pronounced in the imagery condition, and the ITI profiles also correlated highly with those obtained in Experiment 1, r(31) = 0.93 and 0.80, p < 0.0001, for the music and imagery conditions, respectively. These results indicate that participants were imagining the music correctly in the imagery condition, even though they had heard it only 6 times at that point in the experiment. One curious result worth noting is that the (nonsignificant) pattern of deviations from regularity in the click condition exhibited some resemblance to the (significant) patterns obtained in the music and imagery conditions, r(31) = 0.52 and 0.50, p < 0.01, respectively. This had not been the case in Experiment 1, and in neither experiment had the participants yet heard the music at that point. However, the participants in Experiment 2 knew which music they would be hearing subsequently and had the musical score in front of them. Thus it is possible that some participants imagined the music spontaneously from the notation (cf. Brodsky, Henik, Rubinstein, and Zorman 1998), especially since emphatic instructions not to imagine the music during the click condition were given only after the isochronous practice sequence.7

12.3.2.2 Summary
The results of Experiment 2 largely confirm those of Experiment 1. The first hypothesis, predicting that more typical original timing patterns would be synchronized with more accurately than less typical original timing patterns when music was actually present, received some support in that performance for T0 and T1 was better than for T2 and T4. The second hypothesis, that all original patterns would be synchronized with more accurately than their phase-shifted versions when music was present, was strongly supported. The third hypothesis, that the first two predictions would also hold in the imagery condition, was supported in the case of the second prediction only. The fourth hypothesis, that there would be significant differences among the patterns in the click condition, was supported, but not for the reason originally envisioned: differences in pattern complexity did not seem to play an important role. The qualified predictions of the first two hypotheses, which take differences among patterns in the click condition into account, were strongly confirmed in that more typical patterns benefited more from musical context than less typical (original) patterns, and especially in that original patterns benefited more than phase-shifted patterns. However, the qualified third hypothesis, concerning imagery, was not supported by the results of Experiment 2, which did not demonstrate a selective benefit of musical imagery for musically appropriate patterns. The fifth hypothesis, that pattern learning would be faster for original than for phase-shifted patterns when music is present or imagined, was not supported.
Instead, in agreement with Experiment 1, the results suggested that audible music facilitates the learning of timing patterns regardless of their appropriateness.
12.4 General discussion
The present study investigated the ability of musically trained participants to synchronize a simple motor activity with complex timing patterns derived from expressive timing in music performance.
These patterns were of a kind not previously investigated in pattern learning or synchronization tasks: they were neither isochronous nor rhythmic nor random (except in one case), but were best described as semi-regular or quasi-periodic in various degrees. Their regularities derived from their original association with a musical structure. In the click condition, especially in Experiment 1 (where the participants were unaware that music would be introduced later) but also in Experiment 2 (to the extent that the participants followed instructions to refrain from musical imagery), the question of interest was whether the regularities inherent in the timing patterns would help participants to learn and predict the timing variations to some extent. The participants' success in this task was limited, which is not surprising in view of the small number of trials (10 in Exp. 1, 8 in Exp. 2). Their synchronization performance was characterized primarily by tracking, which, as Michon (1967) and others have shown, is the characteristic response to unpredictable temporal patterns in a synchronization task. Only in Experiment 1 was there evidence for improvement across trials within the click condition. However, this improvement did not differ among timing patterns, which suggests a general learning effect that was independent of pattern structure. Nevertheless, there were significant differences among patterns from the very beginning in the click condition. For example, the T1 and T4 patterns exhibited larger r0* indices than T2 in Experiment 1 (Fig. 12.2(a)), and T4 was more predictable than all other patterns in Experiment 2 (Fig. 12.5). The reasons for these differences are not well understood at present. Differences in pattern complexity, defined here in terms of the degree of periodicity, did not seem to be the only cause.

The music condition was the primary focus of interest here. The main hypothesis was that complex timing patterns derived from expressive music performance, which are quite meaningless when carried by a click sequence, would suddenly gain meaning and structural support when they are appropriately instantiated by the accompanying music, and that this would automatically facilitate pattern prediction in synchronization. Synchronization with random or phase-shifted timing patterns, by contrast, was not expected to benefit from the musical context. These predictions received strong confirmation in both experiments. In Experiment 2, all four original patterns were shown to benefit much more from the musical context than their phase-shifted versions. In fact, three of the phase-shifted patterns seemed to suffer interference from the music, at least on the first music trial (see Fig. 12.5). These effects evidently derive from the relative compatibility of the timing patterns with the musical structure, particularly with the rhythmic grouping in the melody (cf. Clarke 1985, 1993b). Auditory perception of musical structure primes certain action patterns that are expressive of that structure, and timing is the most important characteristic of these action patterns. Shin and Ivry (1999), in a recent study, proposed a similar explanation for their finding that incidental learning of arbitrary temporal patterns occurred only when these patterns were systematically associated with a constant action pattern, in their case spatial hand movements in response to visual stimuli. (One of their manipulations also involved a phase shift of the temporal pattern relative to the spatial pattern.)
Timing is a property of actions or events. In the case of music, appropriate actions are implied by the sound structure, which defines compatibility with regard to timing. Previous research has demonstrated that the most typical timing pattern, T0, has a privileged relation to the musical structure: it is representative of many individual performances (Repp 1998a); it is observed when pianists try to play in strict time (Repp 1999a,c); it is aesthetically appealing (Repp 1997); it biases imitation of expressive timing (Repp 2000b); and perception of timing in music exhibits strong biases whose pattern closely resembles T0 (Repp 1998b,c,d) and which have
been attributed to basic auditory and/or motor grouping processes (Penel and Drake 1998; Repp 1998d). Therefore, it was of special interest to see whether timing patterns other than T0 would be selectively facilitated by the music in the present task. There was clear evidence for facilitation of T1 in both experiments, but that pattern is moderately correlated with T0 and hence fairly typical as well. By contrast, the T2 and T4 patterns are of low typicality, although they do resemble the expressive timing patterns of some outstanding pianists. Nevertheless, selective facilitation of these patterns by the music did occur. This seems to be the first demonstration, other than by the original pianists' performances themselves, that radically different timing patterns can be compatible with the same musical structure, as hypothesized by Repp (1998a). The differences between the original timing patterns and their phase-shifted versions can also be viewed in terms of relative typicality. The seemingly equal benefit bestowed by musical context on the four original patterns relative to their phase-shifted versions may be a consequence of the fact that the lower relative typicality of the phase-shifted versions varied in parallel with the higher relative typicality of the original patterns (see Table 12.2).

An interesting and somewhat unexpected finding was that synchronization performance improved more during the music condition than during the preceding click condition or the following imagery condition, and that this improvement occurred regardless of pattern typicality. It appears that the musical context provided a structural framework that facilitated pattern learning, regardless of the appropriateness of the pattern. In other words, the temporal pattern could be 'pegged to' the musical structure, which served as a memory aid. This process presumably also accounts for musicians' ability to reproduce structurally inappropriate timing patterns reasonably well in an imitation task (Clarke 1993b; Repp 2000b).

The present study also addressed the question of whether musical imagery can have behavioral effects similar to those of music actually heard. In the present context, musical imagery refers to the generation of auditory and/or motor images from a memory representation of recently heard music. Basically, this amounts to an 'inner singing' of the melody, perhaps with accompaniment notes filled in where there are no melody note onsets. How vivid or detailed the participants' imagery was is not known. What is clear from the results, however, is that imagery was not an effective substitute for hearing the music. Evidence for a benefit due to imagined music was weak in Experiment 1 and effectively absent in Experiment 2. This is a somewhat disappointing result, but it may simply indicate that the participants' imagery was not strong enough. In Repp's (1999a) study, skilled pianists who had played the Chopin Etude excerpt earlier in the same experimental session were capable of equally accurate synchronization performance in music and imagery conditions; their synchronization performance was also much better overall than that of the present participants. Thus, more experienced or more practiced individuals may well show a clearer benefit of musical imagery. In Experiment 2, a tendency of some participants to imagine the music during the click condition may have worked against finding a relative benefit in the imagery condition.
Indeed, 45% of the trials in the click condition did not exhibit an extra tap at the end, which indicates that the end of the sequence had not come as a surprise. However, strategies other than outright imagery (e.g. counting, grouping, or memory for local temporal pattern features near the end) could also have been responsible. By contrast, 96% of the imagery trials ended without an extra (or missing) tap, which suggests that the music was imagined correctly, though perhaps not vividly enough, in synchrony with the clicks. Evidence that musical imagery occurred also comes from the results with isochronous sequences. Here imagery induced systematic deviations from regularity in the finger taps, similar to those that
are observed in tapping to isochronous music (Repp 1999a,b) or, as also demonstrated here, to isochronous clicks accompanied by isochronous music. The deviations induced by imagery were smaller than those induced by real music, which again shows that imagery was less effective than hearing the actual sound. However, the finding that musical imagery can have involuntary effects on motor timing is theoretically interesting. It suggests a close connection between musical imagery and movement timing, just as there is a close connection between music perception and movement timing. The pattern of systematic deviations from regularity in tapping may represent a combination of expressive tendencies and automatic error correction, which is required to maintain synchronization. This issue is in need of further research, however.

Automatic error correction is also responsible for the tracking tendency that dominated synchronization performance, especially in Experiment 2. Tracking is the consequence of unsuccessful synchronization: each large asynchrony is partially corrected on the next tap, while a new large asynchrony may simultaneously arise from the unpredicted time of occurrence of the next tone. The underlying mechanism is likely to be phase correction (Mates 1994; Pressing 1998; Vorberg and Wing 1996), which is an obligatory process that commonly occurs without awareness (Repp 2000a). A second error-correction mechanism hypothesized to underlie synchronization performance, timekeeper period correction (Mates 1994), probably does not play any important role in tracking as long as the average tempo of the sequence is constant, as it was in the present experiments. However, the period correction mechanism may well be responsible for the prediction of a learned pattern. In other words, remembered aspects of timing patterns as well as perceived or imagined musical structure may influence tap timing via intentional or unintentional modulations of the timekeeper period. Period correction may in part be a top-down mechanism which mediates temporal expectations and governs intentional temporal control, whereas phase correction is largely bottom-up and input-driven (Repp 2001). If this interpretation is correct, then the error correction mechanisms that have been identified in simple synchronization tasks may have broader implications for temporal pattern learning, motor control, and perception. Indeed, Large and Jones (1999) have proposed a perceptual model of beat tracking that incorporates analogous mechanisms. The possible parallels between error correction processes in perception and production of timing warrant further study.
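To illustrate how an obligatory phase-correction process by itself yields the tracking behavior described above, the following toy simulation applies the standard linear first-order correction discussed in this literature (cf. Mates 1994; Vorberg and Wing 1996). It is a sketch under stated assumptions, not a published model implementation: the correction gain, the constant timekeeper period, and the initial tap in synchrony are all illustrative choices.

```python
import numpy as np

def simulate_phase_correction(onsets, period, alpha=0.8):
    """Each inter-tap interval equals the timekeeper period minus a
    fraction alpha of the last tap-click asynchrony. With unpredictably
    timed onsets this reproduces tracking: the ITIs echo the IOIs at a
    lag of one, yielding a high r1."""
    taps = [onsets[0]]  # assume the first tap coincides with the first onset
    for onset in onsets[:-1]:
        asyn = taps[-1] - onset                   # tap minus click onset
        taps.append(taps[-1] + period - alpha * asyn)
    return np.array(taps)
```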
Acknowledgments
This research was supported by NIH grant MH-51230. I am grateful to Paul Buechler and Steve Garrett for assistance. Address correspondence to Bruno H. Repp, Haskins Laboratories, 270 Crown Street, New Haven, CT 06511–6695 (e-mail:
[email protected]).
Notes
1. Throughout this article, the term 'clicks' is used to refer to what was actually a series of very high-pitched digital piano tones (see Methods).
2. The term prediction rather than anticipation is used to avoid confusion with the anticipation tendency (i.e. taps precede sequence events) commonly observed in synchronization tasks (see, e.g. Aschersleben and Prinz 1995). As the terms are used here, prediction is pattern-specific whereas anticipation is not.
3. The third principal component was of little interest because it mainly consisted of a greatly lengthened IOI following the initial downbeat (cf. Fig. 12.1(a)).
4. A MAX patch is a program written in the graphic MAX programming environment. Due to a peculiarity of that software, real-time durations were 2.4% shorter than specified or recorded, and than reported here.
5. Figures comparing the IOIs and average ITIs for the four modulated timing patterns may be found in an electronic appendix to this article on the author's web page: <www.haskins.yale.edu/haskins/STAFF/repp.html>
6. A graphic example of T0 and T0′ is included in the electronic appendix.
7. For a figure illustrating the average ITI profiles for the T0 and T0′ timing patterns, see the electronic appendix.
References

Aschersleben, G. and Prinz, W. (1995). Synchronizing actions with events: The role of sensory information. Perception and Psychophysics, 57, 305–317.
Behne, K.-E. and Wetekam, B. (1993). Musikpsychologische Interpretationsforschung: Individualität und Intention. Musikpsychologie, 10, 24–37.
Brodsky, W., Henik, A., Rubinstein, B., and Zorman, M. (1998). Demonstrating inner hearing among highly trained expert musicians. In S.W. Yi (Ed.), Proceedings of the Fifth International Conference on Music Perception and Cognition, pp. 237–242. Seoul, Korea: Western Music Research Institute.
Clarke, E.F. (1985). Structure and expression in rhythmic performance. In P. Howell, I. Cross, and R. West (Eds.), Musical structure and cognition, pp. 209–236. London: Academic Press.
Clarke, E.F. (1993a). Generativity, mimesis and the human body in music performance. Contemporary Music Review, 9, 207–219.
Clarke, E.F. (1993b). Imitating and evaluating real and transformed musical performances. Music Perception, 10, 317–342.
Clynes, M. (1983). Expressive microstructure in music, linked to living qualities. In J. Sundberg (Ed.), Studies of music performance, pp. 76–181. Stockholm: Royal Swedish Academy of Music.
Drake, C. and Palmer, C. (1993). Accent structures in music performance. Music Perception, 10, 343–378.
Fraisse, P. (1966). L'anticipation de stimulus rythmiques: Vitesse d'établissement et précision de la synchronisation. L'Année Psychologique, 66, 15–36.
Hary, D. and Moore, G.P. (1985). Temporal tracking and synchronization strategies. Human Neurobiology, 4, 73–77.
Hary, D. and Moore, G.P. (1987). Synchronizing human movement with an external clock source. Biological Cybernetics, 56, 305–311.
Iyer, V.S. (1998). Microstructures of feel, macrostructures of sound: Embodied cognition in West African and African-American musics. Unpublished doctoral dissertation, University of California, Berkeley.
Large, E.W. and Jones, M.R. (1999). The dynamics of attending: How we track time-varying events. Psychological Review, 106, 119–159.
Mates, J. (1994). A model of synchronization of motor acts to a stimulus sequence. I. Timing and error corrections. Biological Cybernetics, 70, 463–473.
Michon, J.A. (1967). Timing in temporal tracking. Assen, NL: van Gorcum.
Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331–346.
Penel, A. (2000). Variations temporelles dans l'interpretation musicale: Processus perceptifs et cognitifs. Unpublished Ph.D. dissertation, University of Paris 6.
Penel, A. and Drake, C. (1998). Sources of timing variations in music performance: A psychological segmentation model. Psychological Research, 61, 12–32.
Penel, A. and Drake, C. (1999). Seeking 'one' explanation for expressive timing. In S.W. Yi (Ed.), Music, mind, and science, pp. 271–297. Seoul, Korea: Seoul National University Press.
Pierce, A. and Pierce, R. (1989). Expressive movement: Posture and action in daily life, sports, and the performing arts. New York: Plenum Press.
Pressing, J. (1998). Error correction processes in temporal pattern production. Journal of Mathematical Psychology, 42, 63–101.
Repp, B.H. (1992a). Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's 'Träumerei'. Journal of the Acoustical Society of America, 92, 2546–2568.
Repp, B.H. (1992b). Probing the cognitive representation of musical time: Structural constraints on the perception of timing perturbations. Cognition, 44, 241–281.
Repp, B.H. (1993). Music as motion: A synopsis of Alexander Truslit's 'Gestaltung und Bewegung in der Musik' (1938). Psychology of Music, 21, 48–72.
Repp, B.H. (1997). The aesthetic quality of a quantitatively average music performance: Two preliminary experiments. Music Perception, 14, 419–444.
Repp, B.H. (1998a). A microcosm of musical expression: I. Quantitative analysis of pianists' timing in the initial measures of Chopin's Etude in E major. Journal of the Acoustical Society of America, 104, 1085–1100.
Repp, B.H. (1998b). Obligatory 'expectations' of expressive timing induced by perception of musical structure. Psychological Research, 61, 33–43.
Repp, B.H. (1998c). The detectability of local deviations from a typical expressive timing pattern. Music Perception, 15, 265–290.
Repp, B.H. (1998d). Variations on a theme by Chopin: Relations between perception and production of deviations from isochrony in music. Journal of Experimental Psychology: Human Perception and Performance, 24, 791–811.
Repp, B.H. (1999a). Control of expressive and metronomic timing in pianists. Journal of Motor Behavior, 31, 145–164.
Repp, B.H. (1999b). Detecting deviations from metronomic timing in music: Effects of perceptual structure on the mental timekeeper. Perception and Psychophysics, 61, 529–548.
Repp, B.H. (1999c). Relationships between performance timing, perception of timing perturbations, and perceptual–motor synchronization in two Chopin preludes. Australian Journal of Psychology, 51, 188–203.
Repp, B.H. (2000a). Compensation for subliminal timing perturbations in perceptual–motor synchronization. Psychological Research, 63, 106–128.
Repp, B.H. (2000b). Pattern typicality and dimensional interactions in pianists' imitation of expressive timing and dynamics. Music Perception, 18, 173–211.
Repp, B.H. (2001). Processes underlying adaptation to tempo changes in sensorimotor synchronization. Human Movement Science, 20, 277–312.
Schulze, H.-H. (1992). The error correction model for the tracking of a random metronome: Statistical properties and an empirical test. In F. Macar, V. Pouthas, and W.J. Friedman (Eds.), Time, action, and cognition, pp. 275–286. Dordrecht, The Netherlands: Kluwer.
Semjen, A., Vorberg, D., and Schulze, H.-H. (1998). Getting synchronized with the metronome: Comparisons between phase and period correction. Psychological Research, 61, 44–55.
Shin, J.C. and Ivry, R.B. (1999). Concurrent temporal and spatial learning in a serial reaction time task. Poster presented at the Annual Meeting of the Psychonomic Society, Los Angeles, CA.
Shove, P. and Repp, B.H. (1995). Musical motion and performance: Theoretical and empirical perspectives. In J. Rink (Ed.), The practice of performance, pp. 55–83. Cambridge, UK: Cambridge University Press.
Thaut, M.H., Tian, B., and Azimi-Sadjadi, M.R. (1998). Rhythmic finger tapping to cosine-wave modulated metronome sequences: Evidence of subliminal entrainment. Human Movement Science, 17, 839–863.
Viviani, P. and Stucchi, N. (1992). Motor–perceptual interactions. In G.E. Stelmach and J. Requin (Eds.), Tutorials in motor behavior II, pp. 229–248. Amsterdam: Elsevier.
Vorberg, D. and Wing, A. (1996). Modeling variability and dependence in timing. In H. Heuer and S.W. Keele (Eds.), Handbook of perception and action, Vol. 2, pp. 181–262. London: Academic Press.
13 Action, binding, and awareness

Patrick Haggard, Gisa Aschersleben, Jörg Gehrke, and Wolfgang Prinz

Abstract. This chapter focuses on the process of intentional action, on our conscious awareness of some events occurring during that process and on the chronometry of these conscious states. We first introduce the concept of efferent binding: a hypothesized neural process which links representations of intentions to act to representations of the action itself, and finally to representations of the external consequences of action. We then describe two experiments investigating the perceived times of actions and of associated stimulus events. Our results provide evidence for an efferent binding process which influences conscious awareness, and which amounts to a common principle for conscious coding of perception and action.
13.1 Introduction
Intentional action is fundamental to human existence. We all believe we have the ability to do what we want. We safeguard this ability by quite careful cultural formulations within our legal system, and we consider highly disabling any pathological condition in which people cannot act to realize their wishes and desires. At these more social levels, intentional action typically implies some very complex behaviour such as publishing a newspaper, walking in the mountains, or crossing a geopolitical frontier. However, at its most basic level, the problem of intentional action can be reduced to the problem of how 'I' generate movements of my muscles. The key components of intentional action are as follows. First, a movement must occur. For example, simply thinking about doing something could not constitute an intentional action, whereas doing the same thing might. Second, the behaviour must be generated, in the sense that it must be produced by a goal-directed thought. There must be an identifiable link between a mental state (i.e. the intention) and the behaviour subsequently performed to achieve that goal. Third, the generation of behaviour must somehow come from within me. The mental state that ultimately gives rise to the behaviour must be my mental state. This connection between intentional action and 'I' was expressed very definitively in the earliest psychological studies of volition:

La conscience de l'action doit donc être considerée comme . . . la forme d'intervention du moi phénomenal dans la vie psychique. [The consciousness of action must thus be considered as . . . the form of intervention of the phenomenal self in mental life.] (Michotte and Prum, 1910)
Intentional actions, then, are things that I do. Thus reflex behaviours, such as those the doctor produces by tapping on my tendons, are not intentional actions, because I have not caused them. We are left then with the key problem of intentional action: how do 'I' manage to make my body move? This question has traditionally been the province of philosophers, and only recently has experimental evidence about neural and mental function been seen as relevant to answering it. The brain processes
underlying intentional action have been reviewed elsewhere (Frith et al. 2000) and will not be discussed further here. Instead, this paper addresses a more psychological question about the relationship between intentional action and conscious awareness. As stated above, the position that the self or 'I' must be the generator of intentional actions immediately raises the questions of whether intentions are conscious, and of how the primary consciousness of intentions is related to the secondary self-consciousness of 'I'. Perhaps the best-known position, and certainly the one that most clearly underlies lay belief in Western cultures, is the Cartesian position of the conscious thinking self generating actions from her or his conscious free will. On this model, I consciously decide to do something, and that conscious state is sufficient to lead to appropriate muscular movement to realize my goal.

The key role of consciousness in the question of intentional action arises through the necessary involvement of the self or 'I'. The portion of the chain leading from intention to action could in principle operate without any of the participating representations reaching conscious awareness, in the sense of being accessible to verbal report. Indeed, this appears to be the case in Alien or Anarchic Hand Syndrome. Della Sala, Marchetti, and Spinnler (1991) described a patient who exhibited a right anarchic hand following a combined callosal and mesial frontal lesion. On being given a cup of hot tea, the patient announced that she would not drink it yet, but would wait for it to cool. Nevertheless, the right anarchic hand reached out for the hot tea in an apparently well-formed goal-directed action, which the patient had to resist by restraining her right hand with her left. In this case, it seems that the action of the right hand may well be intentional, because the right hand's movement was caused by a mental representation of the action to pick up the cup. On the other hand, the intention was not conscious: the patient's decision and verbal report were not to pick up the cup. It seems possible, then, at least in pathology, that unconscious intentions exist.

Della Sala et al.'s case demonstrates the possibility of unconscious intention, but does not meet the full conditions for intentional action described above. This is because the unconscious intention that moves the right hand does not belong to the patient's 'I'. The right hand's action is owned, but the source of the action (the intention) is not owned by the 'I'. (We are grateful to Tony Marcel for this form of words.) That is why this particular movement has such a strange phenomenology for the patient. These cases are, of course, the exception. Intentions need not always be conscious, but normal intentional action requires both first-order consciousness (consciousness of intention) and second-order consciousness (self-consciousness, 'I'). The dominant neuroscientific account of consciousness in intentional action is the central monitoring account of Frith (1992). This account neatly links first- and second-order consciousness through the process of attribution. When an event occurs in the environment, it is important to know whether I caused it, or whether it is an external event. By monitoring my intentions, and comparing the predicted consequences of my intentions to the stream of perceptual events, I can distinguish internally generated events from external events.
The central monitoring process therefore plays a crucial role in separating perceptual awareness from the self-conscious states surrounding willed action.
13.2 The contribution of Libet
Given that intentional actions can involve both conscious and nonconscious states, we can ask: what does consciousness add to them? Any discussion of this question must begin with the work of Libet. The most influential study in this area has been that of Libet, Gleason, Wright, and Pearl (1983). Their study combined the electrophysiological measurement of neural precursors of intentional
action with a psychophysical method for measuring subjects' conscious awareness of intention. A schematic of the arrangement used in our replication of Libet et al.'s study (Haggard and Eimer 1999) is shown in Fig. 13.1, but the description below refers to the original experiment of Libet et al.

Fig. 13.1 Set-up for a typical experiment based on the method of Libet et al. (1983). Figure taken from Haggard and Eimer (1999).

Briefly, subjects viewed a slowly rotating clock hand on a screen. The clock hand rotated once every 2560 ms. Subjects were instructed to make 'freely voluntary acts', that is, discrete movements of the right hand, at a time of their own choosing. Some random interval after the subject performed each intentional action, the clock hand ceased to rotate, and the subject was asked to report the position of the clock hand at which they had first become consciously aware of the intention to produce the action. At the same time, Libet and colleagues measured the readiness potential over the motor cortical areas as an index of neural initiation of these intentional actions. The experimenter can then compare the subjective time of events (based on reports of clock positions) with the objective times of the corresponding events (based on recordings of EEG and muscle activity) to calculate a judgement error. A number of methodological points require special mention. First, since the intertrial variability of such judgements is high, a mean of several trials is typically used. Second, the subjects are given no feedback about their judgement error at any point in the task, since to do so would presumably rapidly reduce judgement errors to zero. Third, and most importantly, Libet and colleagues asked their subjects to use the rotating clock as an external metric for judging the time of their conscious intentions without letting the clock itself control the generation of their actions. Libet et al.'s results describe the temporal relations between preparation of intentional action and conscious awareness of intentions. Briefly, readiness potential recordings showed that subjects began the neural preparation of their action at least 700 ms before the first onset of muscle activity. In contrast, subjects' verbal reports using the clock hand position suggested that they became aware of their
intentions to act only some 200 ms before the muscle became active. Therefore, during a period of at least 500 ms, the brain is processing the generation of an action of which 'I' am as yet unaware. Since backwards causation is impossible, it follows that neither 'I' nor my conscious intentions can be the cause of my actions. For Libet, the causation is in fact the other way round: my readiness potential causes my conscious intention (but see Haggard and Eimer 1999, for a revision of this view). Libet et al.'s results appear to represent a major difficulty for the traditional Cartesian concept of conscious free will, and pose a real problem for the generative role of consciousness in intentional action. While Libet et al.'s result has been criticized on several counts (see the replies to Libet's (1985) target article in Behavioral and Brain Sciences for a selection, including Breitmeyer 1985; Bridgeman 1985; Rugg 1985), the basic result showing the temporal order of neural and conscious events appears to hold (Haggard and Eimer 1999).
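To make the chronometry concrete, here is a minimal sketch of how a judgement error of the kind described above might be computed. Everything in it is an assumption for illustration (the representation of reports as clock-hand angles within a single rotation, EMG onset as the objective event, and the function names); it is not Libet et al.'s actual procedure.

```python
CLOCK_PERIOD_MS = 2560.0  # one full rotation of the clock hand

def judgement_error_ms(reported_angle_deg, rotation_start_ms, objective_ms):
    """Convert a reported clock-hand angle to a subjective event time and
    subtract the objective event time (e.g. EMG onset). Negative values
    mean awareness was reported earlier than the objective event."""
    subjective_ms = rotation_start_ms + (reported_angle_deg / 360.0) * CLOCK_PERIOD_MS
    return subjective_ms - objective_ms

def mean_judgement_error(errors_ms):
    """Average over many trials, since intertrial variability is high."""
    return sum(errors_ms) / len(errors_ms)
```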
13.3 Generative versus constructive accounts of conscious intention
What then is the role of consciousness in the process of generating action? We suggest that consciousness makes at least two qualitatively distinct contributions to the process of intentional action. The first, which has dominated almost all previous enquiries since Descartes, is the generative process whereby conscious states activate the sequence of events that ultimately generates muscular contraction. Libet et al.'s account, for example, can be seen as an experimental disproof of this generative role of consciousness. This problem is essentially a philosophical one, and it is not clear how a reductive neuroscientist (i.e. someone who ultimately believes that mental states are brain states) can usefully engage with the question.

The second role that consciousness may play in intentional action has received far less scientific attention. We will call it the constructive role of consciousness in intentional action. On the constructive view, consciousness provides the background set of conditions against which intentional action can take place. Without the experience of conscious intention, 'I' would never acquire the concept of intentional action. Therefore, although consciousness would not be necessary to make any particular intention give rise to its associated action, consciousness may be necessary for us to acquire and retain the ability to make the process of intentional action work. In particular, consciousness may be crucially related to a sense of agency: if we did not formulate conscious intentions, and did not consciously represent the actions that those intentions produce, and the consequences of those actions for ourselves, then we would not produce any intentional actions. Conscious awareness of our intentions, and of the bodily and environmental consequences they cause, is required to construct the possibility of intentional action.

The constructive view of consciousness implies a difference between goal-directed behaviours and true intentional actions on the basis of this background mental history. A nonconscious organism may be capable of goal-directed action, but only if the organism produced the behaviour from a mental background of conscious representation of its own internal and external states would we admit that the goal-directed behaviour also satisfied the criteria for an intentional action. In summary, the generative and constructive views of conscious intention differ in the roles of consciousness that they emphasize. The generative role emphasizes how conscious states can have causal power over the material body, while the constructive role emphasizes how conscious representations contribute
to the sense of 'I' as an agent. This sense of agency is in turn required for us to bother making any intentional actions in the first place.
13.4 Efferent binding
How then does consciousness construct this relation between intention and action which is required to make agency possible? One simple possibility is that consciousness participates in a process which we will call efferent binding. This is a hypothetical brain process akin to the perceptual binding process that occurs in visual object perception (Engel et al. 1999). Efferent binding would associate intentions with representations of the actions that they produce, and with perceptual representations of the environmental consequences of these actions. Efferent binding is the way that we learn the relationship between our intentions and their results. If an intention does not produce the desired result, then the relationship between intention and action needs to be adjusted on future attempts. Motor learning therefore requires the efferent binding process.

Efferent binding between intention, action, and consequence can take place entirely unconsciously. We have already seen that an unconscious intention may exist. Similarly, actions, in the sense of body movements, and sensory representations of their consequences can certainly occur without conscious awareness. An unconscious process such as Pavlovian conditioning could produce an association between these representations. Therefore efferent binding, in the sense of some association between representations of actions and their consequences, might occur without any conscious awareness either of the binding process or of any of the bound events.

We suggest that efferent binding has both nonconscious and conscious elements. Nonconscious association between action and consequence representations occurs in many learning and performance situations. Several human behavioural studies show that representations of actions and of effects are integrated (for a selection, see Stoet and Hommel 2001; Ziessler and Nattkemper 2001; Hazeltine 2001; Elsner and Hommel, in press). Further, operant learning in animals constitutes a whole field of psychology in which animals associate effects with their own actions (Dickinson 1980). At a more physiological level, the widely accepted reafference principle of action control (von Holst and Mittelstaedt 1950) requires matching afferent information about the effects of action with efferent information about motor commands.

However, we here focus specifically on the conscious consequences of the hypothesized efferent binding process. In many circumstances the binding between action and effect seems to be highly relevant to consciousness. First, we are often conscious of events when efferent binding fails: the moment we realize that we meant to press one button but actually pressed another can be phenomenologically very vivid. Moreover, we are often conscious of intentional actions and external consequence representations which are successfully bound. An example occurs in motor learning: sometimes we suddenly 'feel' the relation between an action and an effect, for example when learning to use the clutch in a new car. Consciousness and efferent binding are therefore intimately related, even if the former is not necessary for the latter. This chapter concerns solely the conscious aspects of the binding process: thus from now on we will use the term efferent binding to refer solely to the associations between conscious representations of actions and consequences.
We believe that a principled account of when associations between actions and effects influence awareness, and when they do not, is a priority for future research, as this important question has generally been neglected in associationist psychology.
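The associationist mechanism appealed to above can be made concrete with a small worked sketch. The fragment below is purely illustrative and is not a model proposed in this chapter: it treats the unconscious strengthening of an action–effect link as a Rescorla–Wagner-style update, with learning-rate and asymptote values chosen arbitrarily.

    # Illustrative sketch only: unconscious action-effect binding treated as
    # Rescorla-Wagner-style associative learning (parameters are invented).
    def update_association(strength, effect_occurred, alpha=0.3, lam=1.0):
        """One learning episode: the action is performed and its effect is
        (or is not) observed; return the updated associative strength."""
        target = lam if effect_occurred else 0.0
        return strength + alpha * (target - strength)

    strength = 0.0
    for trial in range(10):
        strength = update_association(strength, effect_occurred=True)
    print(f"association after 10 pairings: {strength:.2f}")  # approaches 1.0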
In addition, the efferent binding process is important in making sense of our relation to the world around us. Specifically, efferent binding is required to answer the attribution question ‘did I do that?’ As outlined above, when an event occurs in the outside world, it is important to distinguish whether it is a consequence of my own action, or an unexpected external event which I need to process and possibly to respond to. That is, the mind must be able to attribute actions to agents, and particularly to the self.

Most previous research in this area has focused on the concept of an internal forward model (Kawato and Wolpert 1998), which predicts the consequences of our intentions, and attempts to cancel these predicted consequences against perceptions of actual external events. The best-known example is the classic outflow theory of the stability of the visual world (Sherrington 1898). In this theory, the visual world remains stable even when we move our eyes, because the actual retinal slip induced by an eye movement can be cancelled against the retinal slip predicted from the motor command or intention to produce a specific eye movement. More recently, Blakemore et al. (1999) have shown a similar efferent binding process to be important in cutaneous sensation. They have argued that one cannot tickle oneself because the proprioceptive afference is cancelled by the predicted sensory consequences of the efference associated with the tickling movement. When somebody else tickles one, in contrast, the proprioceptive input is not cancelled in this way.

The above example is important because it demonstrates that the efferent binding process has important implications for our conscious awareness of physical stimuli. We know from everyday experience that the phenomenology of tickling is very strong indeed. Therefore, the success or otherwise of efferent binding may be very important for the conscious awareness we have of our own actions and of external events. More speculatively, it seems likely that efferent binding may play a key role in constructing self-consciousness. To represent ‘I’ as a conscious agent may depend on binding my conscious intentions to my actions and to their effects.

There is a clear relation between the efferent binding process and Frith’s central monitor (see above, and Frith 1992). We suggest that efferent binding is a specific mental process with conscious consequences that occurs when the monitor detects a match between an external event and an intention. The two representations are then bound together, and tagged as an intentional action belonging to ‘I’.
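The cancellation logic just described can be summarized in a few lines. This is a minimal sketch under our own simplifying assumptions (a single scalar sensory channel and simple subtraction), not the model of Blakemore et al. (1999):

    # Minimal sketch of forward-model cancellation: the predicted sensory
    # consequence of one's own motor command is subtracted from the incoming
    # afferent signal, so self-produced stimulation is attenuated.
    def perceived_intensity(afferent_signal, predicted_consequence):
        return max(0.0, afferent_signal - predicted_consequence)

    touch = 1.0
    print(round(perceived_intensity(touch, predicted_consequence=0.9), 2))  # self-tickle: 0.1
    print(round(perceived_intensity(touch, predicted_consequence=0.0), 2))  # external touch: 1.0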
13.5 Experimental measures of efferent binding

The conscious aspects of efferent binding have rarely been studied experimentally. In this paper we use the perceived time of events as an experimental method for studying binding of conscious representations. Specifically, where the representation of an intention, of an action, and of the environmental sensory consequences of the action are bound together, we would expect the perceived times of those three separate events to be attracted towards each other.

Several strands of logic underlie this claim. First, as Hume (1750) originally demonstrated, association of percepts across time is a crucial element in providing the idea of causation in general, and of free will in particular. Perceptual attraction seems a natural consequence of such association: certainly more natural than perceptual repulsion. Second, just as reductions in reaction time with learning are often attributed to strengthening of associations between stimulus and response representations, by the same argument a reduction in the interval between percepts is evidence for association between the conscious elements of those representations. Third, perceptual attraction has some clear similarities to learning to predict. When an animal learns an association between two events, the response to the unconditioned stimulus is temporally shifted back towards the predictive conditioned stimulus (e.g. Yeo, Lobo, and Baum 1997). We suggest that temporal shifts in percepts may also reflect association or binding between conscious states. Fourth,
perceptual attraction fits well with the function of efferent binding of matching conscious states across time into a coherent representation of intentional action. While perceptual attraction does not imply complete fusion of the several percepts involved, it would help to provide a unified conscious experience of intentional action. Such a unified experience would be required to develop a coherent sense of agency. Finally, we add that temporal attraction effects may be merely one class of evidence for efferent binding. Other classes of evidence may exist; investigation of attraction between spatial representations in willed action would be a particularly fruitful area for future research. However, any such other evidence would be characterized by attraction effects rather than repulsion effects. Moreover, temporal attraction effects between percepts are highly consistent with the function of consciousness of unifying experience across space and time (Kant 1781/1963).

We can therefore compare the perceived times of events which occur within an intentional action context with the perceived times of the same events occurring alone. If binding occurs, the bound representations should be attracted towards each other. Attraction effects on the perceived time of occurrence of events are thus a useful experimental sign of the efferent binding process. Note that efferent binding does not imply that all of the bound components (intention, action, and consequence) come to form a single indivisible perceptual unit. They may indeed remain perceptually quite distinct. The process of binding will, however, strengthen associations between these representations so that they will influence each other. This strengthening of association might bring the events closer together in consciousness but should not push them further apart.

We now report two experiments investigating attraction effects in simple manual actions. These experiments suggest that: (1) an efferent binding process exists which integrates the conscious representations of actions and of events; (2) this efferent binding process is specific to actions, rather than being an instance of general perceptual attraction; (3) efferent binding forms part of a wider process integrating our actions with the contexts of cause and effect in which they occur.
13.6 Experiment 1: perceived time of stimuli and of actions

Our first experiment focused on attraction effects between stimuli and actions. We reasoned that a binding process should produce attraction between the perceived times of these events. We wanted to measure the direction of these attraction effects: do actions tend to be attracted towards their consequences or, alternatively, do consequences tend to be attracted towards the actions that cause them? Second, we wished to ask whether the efferent binding process was comparable between intentional actions and other forms of sensorimotor association, such as simple reactions to imperative stimuli.

We therefore used the method of Libet et al. (1983), in which subjects judge the perceived time of sensory or motor events. In our experiment, subjects judged the perceived time of onset of any of six events, according to condition. The first two conditions were control conditions, in which only one event occurred per trial, and subjects judged the position of the clock at which they perceived the event onset to occur. In the first condition the event was a 1 kHz pure tone of 100 ms duration. In the second condition, the event was an intentional keypress, which subjects made with the index finger of their right hand at a time of their own free choice. The remaining conditions required them to judge the perceived time of either a similar pure tone stimulus, or of a similar keypress action, when these events occurred in a sensorimotor context. The sensorimotor contexts were of two kinds: either simple reactions, or intentional operant actions.

The simple reaction time (SRT) task was studied in conditions 3 and 4. In condition 3, the 1 kHz pure tone occurred at a random latency from the start of the trial, and subjects responded to it as quickly as possible with a right manual keypress. They then judged the time
of occurrence of the pure tone. In condition 4, subjects again responded to the auditory tone, but this time they used the clock hand to judge the onset of their keypress response. The 5th and 6th conditions used an intentional operant task, again combining the same auditory stimulus and manual keypress action. In condition 5, subjects made an intentional right index finger keypress at a time of their own choice. This was followed after 200 ms by a 1 kHz pure tone. The operant interval was chosen to be similar to subjects’ predicted reaction times in the SRT task. Subjects used the clock hand to judge the perceived time of the manual action. In condition 6, subjects again made intentional operant actions, which again elicited the auditory stimulus at a latency of 200 ms. In this condition, however, they used the clock hand to judge the perceived time of the auditory stimulus.

In summary, the conditions differed according to the sensorimotor context in which the events occurred (single, reactive, or operant), and the event judged (stimulus or action). In each condition, the subject’s judgement allowed us to calculate a judgement error, defined as the difference between the perceived time of occurrence of the judged event (stimulus or action, according to condition) and its actual time of occurrence. Following convention, a negative judgement error denotes anticipatory awareness of events (the subject thought the event happened before it really did), and a positive judgement error denotes delayed awareness (the subject thought the event happened after it really did). Table 13.1 shows the sequence of events: S denotes the stimulus, A denotes the manual action, and subscript j denotes the event judged.

In other respects, the experiments were similar to others reported previously (Haggard and Eimer 1999; Haggard and Magno 1999; Haggard, Newman, and Magno 1999). Briefly, subjects sat comfortably facing a computer screen on which a small clock face was shown. The clock had a single hand which rotated at a speed of 2560 ms per revolution. The clock was marked with a conventional 5-minute visual scale, though subjects were encouraged to report intermediate values to the maximum precision possible. Subjects began each trial at a time of their own choosing by pressing a key. This began the revolution of the clock. In conditions requiring intentional action (i.e. conditions 2, 5, and 6) subjects made an intentional keypress with their right index finger at a time of their own choosing. Subjects were instructed not to act in a stereotyped way, to avoid acting at a fixed latency after the start of the trial, to avoid choosing to act at specific predecided positions of the clock hand, and to ensure that the clock rotated at least once prior to their action. In the conditions where the auditory stimulus was the first event (i.e. conditions 1, 3, and 4) the auditory stimulus occurred at a random latency from the start of each trial.
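To make the measure concrete, the sketch below shows one way a reported clock position could be converted into a judgement error under the stated sign convention. This is our illustration, not the authors’ analysis code; it assumes the reported position is resolved to the hand revolution nearest the actual event.

    # Our illustration of the judgement-error computation. The clock hand
    # completes one revolution every 2560 ms; positions are reported in
    # conventional "clock minutes" (0-60 per revolution). A negative error
    # means anticipatory awareness (perceived before actual).
    REVOLUTION_MS = 2560.0

    def judgement_error(reported_minutes, actual_time_ms):
        """Map the reported hand position to the candidate time closest to
        the actual event time, then return perceived minus actual (ms)."""
        phase_ms = (reported_minutes / 60.0) * REVOLUTION_MS
        n = round((actual_time_ms - phase_ms) / REVOLUTION_MS)
        perceived_ms = phase_ms + n * REVOLUTION_MS
        return perceived_ms - actual_time_ms

    # Event at 4000 ms, hand reported at about 33 clock minutes:
    print(round(judgement_error(33, 4000)))  # -32: anticipatory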
Table 13.1 Experiment 1: design and results

Condition   Notation   Task                                                    Mean judgement error (ms)
1           Sj         Judge time of a beep                                    −30
2           Aj         Judge time of a willed action                            −9
3           SjA        Judge time of a beep to which subject responds          −19
4           SAj        Judge time of response to a beep                        −57
5           AjS        Judge time of a willed operant action                     1
6           ASj        Judge time of beep elicited by willed operant action    −71
Twenty-five subjects from the subject pool at the Max Planck Institute for Psychological Research participated in this experiment. Each condition was performed in a separate block of the experiment. Each subject performed the blocks in a different random order. Each block contained 70 trials. A small number of trials (never more than 6%) had to be discarded on some blocks due to technical failures or subjects’ failure to follow instructions.

Inspection of the raw judgement error data showed a number of subjects who produced exceptionally high standard deviations of judgement error in most conditions. These standard deviations were as high as 1 s in some instances, implying that subjects were sometimes not even sure of which side of the clock face the clock hand was on when the judged event occurred! This group of subjects was readily identifiable both in the data table and, more objectively, by a cluster analysis. Both methods identified the same group of 7 subjects as having unusually high standard deviations. The standard deviation of repeated perceptual estimates is related to the perceptual salience of the judged event: events with a distinct and vivid phenomenology should produce low trial-to-trial variability in perceived time judgements, whereas events with indistinct phenomenology should produce larger trial-to-trial variability. It seems, then, that for some people the phenomenology of the sensorimotor events in this experiment was too vague for genuine judgement. Alternatively, these subjects could simply have been very poor at temporal judgement. We concluded that these subjects could not perform the task, and we therefore excluded their data. Importantly, this decision was made on the basis of the standard deviation of judgement errors across trials, rather than the mean judgement errors.
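A sketch of this screening step follows. The chapter does not report which clustering algorithm was used, so a simple cutoff stands in for it here; the simulated data and the 300 ms threshold are our own inventions.

    import numpy as np

    # Illustrative screening on the trial-to-trial standard deviation of
    # judgement errors (not on the means). Data and cutoff are invented.
    rng = np.random.default_rng(0)
    errors = rng.normal(loc=-20, scale=80, size=(25, 6, 70))      # 25 subjects x 6 conditions x 70 trials
    errors[:7] = rng.normal(loc=-20, scale=600, size=(7, 6, 70))  # seven unreliable judges

    per_subject_sd = errors.std(axis=2).mean(axis=1)  # mean trialwise SD across conditions
    keep = per_subject_sd < 300.0                     # cutoff between the two clusters
    print(f"retained {keep.sum()} of {len(keep)} subjects")       # retains 18 here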
13.6.1 Results and discussion

The mean judgement errors from the 18 remaining subjects in each condition are shown in Table 13.1. The mean estimates for the action-only and stimulus-only conditions are comparable with previous reports. In particular, actions are perceived to occur rather earlier than they actually do (cf. Libet et al. 1983; Haggard 1999).

More importantly, the conjunction of stimuli with actions in the operant and reactive conditions produced clear evidence of attraction effects. In condition 4 (SAj) an action performed in response to a stimulus was perceived to occur earlier than an action performed by itself. This suggests that the action percept is attracted by, or bound to, the stimulus percept. Similarly, in condition 6 (ASj), a stimulus which follows an operant action is perceived to happen earlier than a stimulus which occurs alone. The stimulus percept appears to be attracted by the operant action which caused it.

Even more importantly, these attraction effects appear to be asymmetric. In SRT condition 3 (SjA), the perceived time of the imperative stimulus is only very slightly later than the perceived time of a stimulus occurring alone. That is, the stimulus percept is only minimally attracted by the action which follows it. In operance (condition 5, AjS) the action is perceived to occur only slightly later than the same action occurring alone without a subsequent stimulus. Therefore, the action percept is only minimally attracted by the subsequent stimulus.

To study these attraction effects statistically, we treated each subject’s mean estimate in the stimulus-only (Sj) and action-only (Aj) conditions as baseline control values. We assumed that these estimates would vary with the sensory transmission of each subject, with the salience of the sensory or motor phenomenology for each subject, and with the particular division of attention between clock and sensorimotor event generated by each subject. Because of such factors, the specific numerical estimates obtained for the perceived times of events in the Libet task are not informative by themselves (Haggard and Eimer 1999). However, these sensory and attentional factors can be assumed constant across conditions within a single subject. Therefore, the difference between two
Libet estimates of the same physical event obtained under different task circumstances may be informative. We therefore calculated the change in judgement error, or perceptual shift, for stimuli and actions in the operant and reactive conditions, by subtracting the appropriate baseline values obtained from each subject in the stimulus-only and action-only conditions. That is, we subtracted each subject’s judgement error for a stimulus presented alone (condition 1: Sj) from the judgement error for the physically identical stimulus presented in a simple reaction context (condition 3: SjA), and from the judgement error for the physically identical stimulus presented in an operant context (condition 6: ASj). Likewise, we subtracted each subject’s judgement error for an action occurring alone (condition 2: Aj) from the judgement error for a physically similar action occurring in a simple reaction context (condition 4: SAj) or in an operant context (condition 5: AjS). These perceptual shifts are shown in Fig. 13.2.
Fig. 13.2 Perceptual shifts in Experiment 1 categorized by task.
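The shift computation itself is a simple per-subject subtraction. In the sketch below (our illustration), the group means from Table 13.1 stand in for a single subject’s condition means; in the actual analysis the subtraction is performed separately for every subject.

    # Perceptual shift = judgement error in context minus baseline error.
    # Condition labels follow Table 13.1; values are the group means (ms).
    je = {"Sj": -30, "Aj": -9, "SjA": -19, "SAj": -57, "AjS": 1, "ASj": -71}

    shifts = {
        "stimulus, reactive": je["SjA"] - je["Sj"],  # +11: slightly later
        "stimulus, operant":  je["ASj"] - je["Sj"],  # -41: strongly earlier
        "action, reactive":   je["SAj"] - je["Aj"],  # -48: strongly earlier
        "action, operant":    je["AjS"] - je["Aj"],  # +10: slightly later
    }
    print(shifts)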
The data of Fig. 13.2 were subjected to a factorial ANOVA with two within-subject factors: the type of event judged (stimulus or action) and the task context (reactive or operant). This analysis showed no significant effect of event type (F(1, 17) = 0.174, NS), no significant effect of task context (F(1, 17) = 0.208, NS), but a highly significant interaction between the two factors (F(1, 17) = 11.238, p = 0.004). This interaction arose because SRT tasks produced large attraction effects on actions, but small attraction effects on stimuli, whereas operance produced large attraction effects on stimuli and small attraction effects on actions. Put another way, the cause, or first event to occur in each sequence, was not substantially shifted towards the effect or second event, but the second event was substantially shifted towards the first. To emphasize the importance of causal status, we factorialized the same data in another way, using a factor of cause or effect, and a second factor of judged event type. This analysis is shown in Fig. 13.3, which is informationally equivalent to Fig. 13.2.
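For readers who wish to reproduce this style of analysis, the following sketch runs a 2 × 2 repeated-measures ANOVA on simulated perceptual shifts using the statsmodels package. It is our reconstruction of the analysis layout, not the authors’ code; the cell means follow Table 13.1 and the noise level is invented.

    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # One row per subject x cell; shifts simulated around the cell means
    # implied by Table 13.1 (noise level invented).
    rng = np.random.default_rng(1)
    cells = [("stimulus", "reactive", 11), ("stimulus", "operant", -41),
             ("action", "reactive", -48), ("action", "operant", 10)]
    rows = [{"subject": s, "event": ev, "context": cx,
             "shift": mu + rng.normal(scale=20.0)}
            for s in range(18) for ev, cx, mu in cells]
    df = pd.DataFrame(rows)

    res = AnovaRM(df, depvar="shift", subject="subject",
                  within=["event", "context"]).fit()
    print(res)  # the event x context interaction is the effect of interest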
Fig. 13.3 Perceptual shifts in Experiment 1 categorized by causal status. Data are the same as in Fig. 13.2.

This analysis showed only a significant main effect of cause versus effect. Therefore, we conclude that the causal status of an event is highly relevant to the time at which it is perceived to occur. A process with the characteristics of efferent binding attracts sensory and motor percepts in action contexts.

The main effect of causation applied to the data in Fig. 13.3 does not strictly test the hypothesis that effects are attracted more than causes. This is because efferent binding would produce attraction effects in opposite directions for causes and for effects: causes should be attracted towards effects, whereas effects should be attracted towards causes. To test for differences in attraction effects, rather than perceptual shifts in general, it is therefore necessary to invert the sign of the shift for the causes, to accommodate the fact that attraction effects operate in different directions on the two classes of causal event. Statistical testing of the data after this transformation showed only a main effect of causation (F(1, 17) = 4.957, p = 0.04); therefore the stronger hypothesis, that effects are attracted more than causes, is sustained. Thus, the efferent binding process apparent in our data appears to be asymmetric.

Finally, we calculated the mean reaction time in the SRT task (conditions 3 and 4). This was 208 ms. Recall that the action–stimulus interval in the operant conditions was fixed at 200 ms in this experiment. Thus, the operant and SRT tasks had very similar temporal extents.
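The sign transformation described above can be written out explicitly. The sketch is our illustration of the logic: a cause attracted towards its effect shifts later (positive), while an effect attracted towards its cause shifts earlier (negative), so inverting the sign for causes places both on a common scale on which more negative values mean stronger attraction.

    # Express attraction on a common scale: invert the shift sign for causes.
    def attraction_score(shift_ms, is_cause):
        return -shift_ms if is_cause else shift_ms

    # e.g. a cause shifted 11 ms later, and an effect shifted 41 ms earlier:
    print(attraction_score(11, is_cause=True))    # -11
    print(attraction_score(-41, is_cause=False))  # -41: stronger attraction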
13.7 Experiment 2: attraction effects in causation and in sequences

In our discussion of the first experiment we suggested that attraction effects are a sign of a process of efferent binding. We also claimed that effects are attracted towards causes more than causes are attracted towards effects. However, our first experiment confounded the causal relation between
stimulus and action with the temporal relation between stimulus and action. On the basis of our first experiment alone, we cannot distinguish whether effects are bound to causes, or whether the first percept of a sequence is simply bound to the second. The effects seen in Experiment 1 could also reflect recency effects in memory for event times.

Experiment 2 aimed to resolve this ambiguity by extending the design of Experiment 1 to include sequences of two stimuli, or of two actions. When two external stimuli occur with a fixed interstimulus interval, there is no reason why they should be represented as causally related, and no reason why they should be subject to any efferent binding process in conscious awareness. The same applies to two independent yet successive actions. We call such arrangements sequence contexts, as opposed to causal contexts. Note that this argument does not deny the possibility of some perceptual attraction between sequences of actions; it merely denies that any attraction effect between sequences could arise from the efferent binding process. Experiment 2 therefore compared attraction effects in sequence contexts and in causal contexts.

There were ten experimental conditions, shown in Table 13.2 using the same notation as Table 13.1. The first six conditions were identical to those in Experiment 1. In conditions 7 and 8 the subject judged the onset time of the first or second of two successive auditory stimuli, respectively. In conditions 9 and 10 subjects judged the time of the first or second of a pair of successive keypresses, respectively.

In most respects, the apparatus, design, and analysis of the data resembled those of Experiment 1, so we mention here only the specific respects in which Experiment 2 differed. Experiment 2 was based on a desired interval between the two events of 250 ms, rather than the 200 ms used in Experiment 1. Thus, for example, the operant delay used in conditions 5 and 6 was 250 ms, as was the inter-onset interval between the two auditory stimuli in conditions 7 and 8. In conditions 9 and 10, we trained subjects to produce keypresses separated by an interval of 250 ms by giving them eight practice trials before each of the appropriate blocks of the experiment. In the practice trials, subjects heard pairs of tones separated by a 250 ms interval. They were asked to tap in synchrony with these tones, and to attend precisely to the interval between them. No feedback about their performance was given.

In view of the increased number of judgement conditions in Experiment 2, we presented a prompt on the screen at the end of each trial. When the Libet clock finally ceased to rotate at the end of each trial, a single line of text appeared on the screen asking the subject to judge the clock position at the onset of ‘the first beep’, ‘the second keypress’, or whatever was appropriate for the condition. During the trials, all keypress responses were made with the right index finger on the F9 key of the keyboard. The number of trials performed in each condition was reduced to 40 to ensure the experiment was not excessively long. Each condition was studied in a separate block of the experiment.

Sixteen subjects participated in the experiment. All were right-handed, with normal or corrected-to-normal vision, and with no history of neurological impairment. Most were students at University College London.
Eight random orders of the ten different conditions were used, with one subject experiencing each random order in the forward direction, and a second subject experiencing the same random order in the reverse direction. The mean judgement error and its standard deviation across trials were computed as for Experiment 1. No evidence was found in this experiment of a subset of subjects with egregiously high standard deviations.

The mean judgement errors for the ten conditions of Experiment 2 are shown in Table 13.2, together with the standard deviation across subjects.

Table 13.2 Experiment 2: design and results

Condition   Notation   Task                                                    Event judged   Position   Context      Mean judgement error (ms)   Standard deviation (ms)
1           Sj         Judge time of a beep                                    S              1          Single         −4                          46
2           Aj         Judge time of a willed action                           A              1          Single          4                          48
3           SjA        Judge time of a beep to which subject responds          S              1          Causal         25                          52
4           SAj        Judge time of response to a beep                        A              2          Causal        −49                         124
5           AjS        Judge time of a willed operant action                   A              1          Causal         23                          59
6           ASj        Judge time of beep elicited by willed operant action    S              2          Causal        −46                          67
7           SjS        Judge time of first of two beeps                        S              1          Sequential      8                          54
8           SSj        Judge time of second of two beeps                       S              2          Sequential     17                          78
9           AjA        Judge time of first of two keypress actions             A              1          Sequential    −14                          53
10          AAj        Judge time of second of two keypress actions            A              2          Sequential     22                          46

As in Experiment 1, we subtracted the mean judgement error for each subject in the control conditions Sj and Aj from their judgement errors in the other conditions. The resulting data set represents the perceptual shift in the perceived time of stimulus and of action in the causal and sequential conditions. These shifts can be arranged as a 2 × 2 × 2 factorial with the following three factors: (1) the event judged in each condition (stimulus or action); (2) the position of the judged event within the trial (first or second); and (3) the type of context (causal or sequential). The mapping of the various conditions onto the cells of this factorial design is given in Table 13.2. For example, the judgement of a stimulus in an SRT task, as previously performed in Experiment 1, is now represented as judgement of a stimulus in the first position of a causal context. Judging the time of the second of two successive actions is now represented as judging an action in position 2 of a sequential context.
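The condition-to-cell mapping can be read directly off Table 13.2; the sketch below simply records it in code (the two single-event baseline conditions do not enter the factorial analysis).

    # Mapping of Experiment 2 conditions onto the 2 x 2 x 2 factorial cells
    # (event judged x position x context), following Table 13.2. The Sj and
    # Aj baselines are excluded.
    factorial_cells = {
        "SjA": ("stimulus", 1, "causal"),
        "SAj": ("action",   2, "causal"),
        "AjS": ("action",   1, "causal"),
        "ASj": ("stimulus", 2, "causal"),
        "SjS": ("stimulus", 1, "sequential"),
        "SSj": ("stimulus", 2, "sequential"),
        "AjA": ("action",   1, "sequential"),
        "AAj": ("action",   2, "sequential"),
    }

    for cond, (event, position, context) in factorial_cells.items():
        print(f"{cond}: judge the {event} in position {position} of a {context} context")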
13.7.1 Results and discussion

The mean perceptual shifts were subjected to a repeated-measures ANOVA using the factorial arrangement described above. This revealed a nearly significant main effect of context (F(1, 15) = 4.214, p = 0.058), due to perceptual shifts being more anticipatory for causal contexts (mean = −12 ms) than for sequential contexts (mean = 8 ms). The main effect of position also showed a trend towards significance (F(1, 15) = 3.575, p = 0.078), due to an overall attraction effect: the first event in each pairing showed a perceptual shift towards delay (mean = 10 ms), while the second event in each pairing showed a perceptual shift towards anticipation (mean = −15 ms). These results are consistent with a weak overall tendency towards mutual attraction between the events studied.

Most importantly, however, a highly significant interaction between the type of context and the position of the judged event was found (F(1, 15) = 18.330, p = 0.001). This interaction can be seen clearly in Fig. 13.4, which shows that the attraction effect is marked for the causal contexts, but is absent, and indeed replaced by repulsion, in the sequential contexts. Finally, this effect did not interact with the event judged: both stimuli and actions showed attraction effects in causal contexts, and neither stimuli nor actions showed attraction effects in sequential contexts.
Fig. 13.4 Perceptual shifts in Experiment 2.
A planned comparison between the unsigned attraction effects for causes and effects showed the same direction as in Experiment 1, but failed to achieve statistical significance in Experiment 2. Thus, while Experiment 2 replicates the finding of significant attraction effects, the stronger assertion that these apply asymmetrically to causes and effects was not supported, and requires further investigation.

We performed an additional analysis of the reaction times in the SRT conditions (3 and 4) of Experiment 2, and of the inter-keypress interval in the action sequences (conditions 9 and 10). The mean interval was 254 ms, very close to the operant and interstimulus intervals of 250 ms, and there were no significant differences in the value of this interval across conditions. Therefore differences between conditions in the perceived time of events cannot arise from differences in the actual time of events.
13.8 General discussion

We begin by briefly restating our results. When a stimulus elicits an action (SRT), or when an action produces a stimulus (operance), the perceived times of the events shift as a result of the sensorimotor context in which they occur. These shifts represent attraction effects between the percepts of stimuli and of movements. They are consistent with an efferent binding process linking conscious representations of stimulus events and of actions. The effects were comparable across stimulus and action percepts, and across SRT and operant tasks. Therefore, the underlying binding process appears to be quite general. Experiment 2 further showed that the effects occur only when stimuli and actions are linked in a causal context, and do not occur in mere repetitive sequences of stimuli and actions. Thus our effects can be attributed to a specific process governing conscious representation of interactions between the subject and the external world, rather than a general process of time or sequence perception.

Second, we take our results as an interesting validation of the method of Libet et al. (1983). This method has previously been criticized quite harshly, as discussed in the Introduction. Our Experiment 2 replicated the effects seen in Experiment 1 in a different laboratory and a different population of subjects. Moreover, the numerical values obtained for perceptual shifts in the two experiments are quite close. We have emphasized above that single numerical estimates in the Libet task are not informative. However, the present study suggests that differences between Libet estimates may be replicable, and may provide a useful method for studying the perceived time of events.

The Libet method has a strong advantage over other psychophysical timing methods such as the Temporal Order Judgement (Sternberg and Knoll 1973). Temporal Order Judgement requires presenting two events, and varying the interstimulus interval to find the value at which the two onsets appear simultaneous. In intentional action, the event to be judged is internally generated by the subject. Therefore, the experimenter would not reliably be able to time a reference stimulus to occur just before it (though just after is of course possible). In the Libet task, in contrast, the timing reference is constantly present, in the form of the rotating clock.
13.9 Artefactual explanations of attraction effects

The Libet method has been criticized many times. In particular, it involves cross-modal matching, and the numerical estimates it yields will depend on the division of attention between the clock and
the judged event, due to the prior-entry phenomenon (Sternberg and Knoll 1973). These criticisms have been considered in detail elsewhere (Haggard 1999). Such general criticisms are mitigated by using differences between subjects’ estimates across conditions, as in the present experiments, and are not discussed further here. We now deal with specific criticisms of the method as applied to the context conditions studied here.

Experiment 2 showed that attraction effects exist between stimuli and actions in causal contexts, such as SRT and operance, but are absent or replaced by repulsion when stimuli or actions merely form a sequence. This contrast rules out several possible artefactual interpretations of our results.

First, our attraction effects cannot merely be examples of a general process of perceptual conflation or perceptual centres. Morton, Marcus, and Frankish (1976) coined the term ‘perceptual centre’ or ‘P-centre’ to refer to the fact that the perceived onset of a speech sound is typically rather later than its physical onset. This suggests that judgements about discrete properties of a temporally extended stimulus are made with reference to an abstract point equivalent to the stimulus’ centre of perceptual gravity, at least in the case of speech stimuli. A P-centre hypothesis would predict equal attraction in sequence contexts, yet this was not observed in Experiment 2.

Second, comparison with the sequence conditions shows that attraction effects are not simply due to predictability or expectancy. In the SS sequence conditions, for example, the first beep perfectly predicted the occurrence and time of the second beep, yet the percept of the second beep was not attracted towards the first.

Third, the absence of attraction effects in the sequence conditions rules out any explanation based on refractory periods. Perceptual refractoriness might in principle mean that the percept of the second event was adjusted while the first event was processed. However, classical accounts of refractoriness (see Jolicoeur, Oriet, and Tombu 2001) would predict a delay rather than an advance in the processing of the second event. This should produce a delay rather than an advance in the perceived time of the second event. In addition, any effect of refractoriness on the conscious awareness of the second event should apply equally to both causal and sequential contexts.

Finally, there is one possible feature of our study which could produce an artefactual explanation of the attraction of an effect towards its cause. In causal contexts, the first and second events occur in different perceptual streams, whereas in sequential contexts they occur in the same stream. Thus, in the ASj and SAj conditions, subjects switch attention to a new perceptual stream for the purposes of judgement. A sceptic could argue that this switch is responsible for the attraction effect.

We believe this sceptical explanation can be resisted for a number of reasons. First, the switching effect could simulate attraction of the second event by the first, but cannot explain the attraction of the first event by the second observed in our data for Experiment 2. Thus, the overall conclusion of efferent binding is not undermined. Second, switching attention is a time-consuming process, and should therefore produce a delay in the second percept, whereas our data show an advance. To explain our data, the sceptic would need to posit that subjective time is shortened so as to compensate for attention-switching delays.
Third, in recent preliminary results (Haggard and Clark, unpublished data) we observed that the perceptual attraction characteristic of operant action is replaced by a repulsion effect if the ‘action’ is an involuntary muscle twitch produced by transcranial magnetic stimulation, rather than a voluntary contraction of the same muscles. In this last case, the modalities of the physical events are identical in both conditions, yet the percepts differ dramatically. Simple temporal effects of shifting attention cannot account for this pattern of results.
13.10 Generality of the present results

Efferent binding should occur whenever we interact with our environment. Therefore, if the present results truly reflect a conscious consequence of the binding process, they should apply rather generally to a range of situations. First, there is a clear generality internal to the results presented here: the binding process applies equally to stimulus and action percepts, and applies equally to SRT and to operance. Therefore, we speculate that the effects reported here reflect the operation of a basic process constructing our awareness of interactions between the self and the world.

Do the effects reported here also generalize to other tasks? While few studies have considered implications of efferent binding for awareness, we believe one study in particular provides evidence which converges with our own. Deubel, Irwin, and Schneider (1999) asked subjects to make either voluntary or reflexive saccades, and to estimate their direction of gaze at the time of a brief test stimulus. They found that the perceived direction of gaze shifts up to 250 ms before the saccade itself. This single numerical value recalls Libet et al.’s (1983) observation that the perceived time of a manual action precedes the onset of muscle activity. More importantly, this anticipatory awareness is greatest for test stimuli presented at the location of the saccade target. We believe their results can be interpreted as an oculomotor efferent binding effect. In this case, the conscious representation of the saccade is bound with the visual consequence of the saccade, i.e. the visual stimulation at the target location. The association between a saccade and what is seen when the saccade finishes is much more direct than the arbitrary associations between keypresses and auditory signals we have studied here. The target-specificity of Deubel et al.’s effect is consistent with our view (see above) that conscious awareness reflects efferent binding of specific causal associations between our actions and their consequences.

This study has reported effects of efferent binding on conscious representations, but it seems these effects may also generalize to performance. Ziessler and Nattkemper (2001) provide an elegant series of RT experiments, from which they conclude that ‘the planning of goal-directed actions always includes the anticipation of possible action effects’. They speculate that this could occur either by standard rules of associationist learning, or by a ‘presetting’ of the cognitive system. This ‘presetting’, in our view, is remarkably similar to the classical concepts of conscious intention and conscious volition. Ziessler and Nattkemper’s results can be interpreted as evidence for an efferent binding between nonconscious representations of action plans, of actions, and of their effects. We suggest that comparison of efferent binding processes in performance and in conscious awareness of the same events may prove a fruitful vein for future research.

Finally, two directions of research are still required to establish the generality of these results. First, we have studied a narrow range of inter-event intervals in largely arbitrary stimulus–action pairings. We plan to investigate in future research how the compatibility between events, and the time interval between them, influence efferent binding. A second interesting aspect of generalization involves the extension of efferent binding to awareness of other people’s actions.
Do actions of others show the same binding effects as my own, due to an inference about others’ intentions? If they do, this would suggest that both action production and action understanding activate a common conscious mechanism, perhaps paralleling similar nonconscious mechanisms underlying imitation performance (see Bekkering and Wohlschläger 2001).
13.11 Common coding

Here we relate our results to the common-coding view (Prinz 1992). This asserts that a common form of mental representation exists for both external stimuli and our responses to them. The
common-coding view was developed to account for stimulus–response mapping tasks (see, e.g. Stoet and Hommel 2001; Ziessler and Nattkemper 2001), and has not been as extensively applied to intentional actions. However, we believe one aspect of our results is highly consistent with a common-coding view, while another is clearly not.

First, we found that attraction effects apply interchangeably to stimulus and action representations. That is, both stimulus and motor representations are subject to a single efferent binding process. This represents a strong sense in which the brain operations underlying conscious awareness are no different for stimulus and for action codes.

On the other hand, a central claim of the common-coding hypothesis has been that actions are represented in terms of their environmental effects (Prinz 1992). This is not borne out in our data. In Experiment 1, we found that operant actions were perceptually stable, whereas the percept of their auditory effects was labile. We believe this result is in the opposite direction from that predicted by the common-coding hypothesis.
13.12 Summary and conclusions

Our work can be summarized with a model of efferent binding (see Fig. 13.5). The neural events underlying action are shown as a causal chain on the left-hand side of the model. Our interest has been in relating them to the conscious awareness of intention, action, and effect: these conscious representations are shown on the right-hand side. Our key point is that the neural events do not map one-to-one onto conscious states. That is, the arrows connecting neural events to conscious events in the model are not all horizontal, but form a web. For example, previous studies have shown that our awareness of intention is not a simple consequence of neural preparation, as Libet et al. (1983) proposed. Rather, information about the specific movement to be performed contributes significantly to the content of our conscious intentions (Haggard and Eimer 1999; Haggard and Magno 1999). Similarly, information about preparation of movement contributes to the percept of the movement itself (Haggard et al. 1999). Thus, the mappings between neural and conscious events are many-to-many, rather than one-to-one. We suggest that the efferent binding of conscious representations is an expression of this many-to-many mapping. It generates a strong association between representations of intention and representations of the actions they produce: this binding is depicted by the thin solid boxes within the conscious-events section of the framework.
Fig. 13.5 A model of efferent binding. See text for discussion.
In this paper, we have studied the conscious representation of action and effect, rather than of intention and action. We have shown evidence for a second aspect of efferent binding, which associates actions with the stimuli that cause them, or that they cause. This further form of efferent binding between conscious representations equates to the lower solid box in the model of Fig. 13.5. In the context of the experimental paradigms used here, ‘causation’ involves a relation between one stimulus and one action. In future research we will test the specificity of these binding effects, by investigating whether, for example, a stimulus which is not caused by the subject’s operant action is less subject to efferent binding than one that is so caused.

In our introduction, we distinguished between a generative and a constructive role of consciousness in voluntary action. We have shown that a specific mental operation occurs during voluntary action, pulling together in time the conscious representations of the physical events that occur. That is, the conscious events are compressed in time relative to the neural events with which they are linked. We suggest that this temporal unification of voluntary actions within a relatively restricted zone of conscious experience forms part of how the human mind constructs the strong association between intentions, actions, and consequences that underlies the self, and the sense of self-agency.
Acknowledgements

This research was supported by MRC, BBSRC, DAAD, and MPG. We are grateful to Marisa Taylor-Clarke, Sam Clark, Rob van Beers, and Andy Wohlschläger for help and comments.
References

Bekkering, H. and Wohlschläger, A. (2001). Action perception and imitation. This volume, Chapter 15.
Blakemore, S.J., Frith, C.D., and Wolpert, D.M. (1999). Spatio-temporal prediction modulates the perception of self-produced stimuli. Journal of Cognitive Neuroscience, 11, 551–559.
Breitmeyer, B. (1985). Problems with the psychophysics of intention. Behavioral and Brain Sciences, 8, 539.
Bridgeman, B. (1985). Free will and the functions of consciousness. Behavioral and Brain Sciences, 8, 540.
Della Sala, S., Marchetti, C., and Spinnler, H. (1991). Right-sided anarchic (alien) hand: a longitudinal study. Neuropsychologia, 29, 1113–1127.
Deubel, H., Irwin, D.E., and Schneider, W.X. (1999). The subjective direction of gaze shifts long before the saccade. In W. Becker, H. Deubel, and T. Mergner (Eds.), Current oculomotor research: Physiological and psychological aspects. New York: Kluwer.
Dickinson, A. (1980). Contemporary animal learning theory. Cambridge: Cambridge University Press.
Elsner, B. and Hommel, B. (in press). Effect anticipation and action control. Journal of Experimental Psychology: Human Perception and Performance.
Engel, A.K., Fries, P., König, P., Brecht, M., and Singer, W. (1999). Temporal binding, binocular rivalry, and consciousness. Consciousness and Cognition, 8, 128–151.
Frith, C. (1992). The cognitive neuropsychology of schizophrenia. Hove/Hillsdale: Erlbaum.
Haggard, P. (1999). Perceived timing of self-initiated actions. In G. Aschersleben, T. Bachmann, and J. Müsseler (Eds.), Cognitive contributions to the perception of spatial and temporal events. Amsterdam: North-Holland.
Haggard, P. (in press). Conscious awareness of intention and of action. To appear in J. Roessler and N. Eilan (Eds.), Agency and self-awareness: Issues in philosophy and psychology. Oxford: Oxford University Press.
Haggard, P. and Clark, S. (unpublished data). Voluntary action and conscious awareness. Manuscript in preparation.
Haggard, P. and Eimer, M. (1999). On the relation between brain potentials and the awareness of voluntary movements. Experimental Brain Research, 126, 128–133.
Haggard, P. and Magno, E. (1999). Localising awareness of action with transcranial magnetic stimulation. Experimental Brain Research, 127, 102–107.
Haggard, P., Newman, C., and Magno, E. (1999). On the perceived time of voluntary action. British Journal of Psychology, 90, 291–303.
Hazeltine, E. (2001). The representational nature of sequence learning: Evidence for goal-based codes. This volume, Chapter 33.
Hume, D. (1750). Philosophical essays concerning human understanding. London: Millar.
Jolicoeur, P., Oriet, C., and Tombu, M. (2001). From perception to action: Making the connection. This volume, Chapter 28.
Kant, I. (1781/1963). Critique of pure reason. London: Macmillan.
Kawato, M. and Wolpert, D.M. (1998). Internal models for motor control. Novartis Foundation Symposium, 218, 291–304.
Libet, B. (1985). Unconscious cerebral initiative and the role of conscious will in voluntary action. Behavioral and Brain Sciences, 8, 529–566.
Libet, B., Gleason, C.A., Wright, E.W., and Pearl, D.K. (1983). Time of conscious intention to act in relation to onset of cerebral activity (readiness-potential): The unconscious initiation of a freely voluntary act. Brain, 106, 623–642.
Michotte, A. and Prum, E. (1910). Étude expérimentale sur le choix volontaire. Annales de Psychologie, 10, 194–279.
Morton, J., Marcus, S.M., and Frankish, C.R. (1976). Perceptual centres (P-centres). Psychological Review, 83, 405–408.
Prinz, W. (1992). Why don’t we perceive our brain states? European Journal of Cognitive Psychology, 4, 1–20.
Rugg, M.D. (1985). Are the origins of any mental process available to introspection? Behavioral and Brain Sciences, 8, 552.
Sherrington, C.S. (1898). Further note on the sensory nerves of muscles. Proceedings of the Royal Society, B62, 120–121.
Sternberg, S. and Knoll, R.L. (1973). The perception of temporal order: Fundamental issues and a general model. In S. Kornblum (Ed.), Attention and Performance IV. New York: Academic Press.
Stoet, G. and Hommel, B. (2001). Interaction between feature binding in perception and action. This volume, Chapter 26.
von Holst, E. and Mittelstaedt, H. (1950). Das Reafferenz-Prinzip. Naturwissenschaften, 37, 464–476.
Yeo, C.H., Lobo, D.H., and Baum, A. (1997). Acquisition of a new-latency conditioned nictitating membrane response—major, but not complete, dependence on the ipsilateral cerebellum. Learning and Memory, 3, 557–577.
Ziessler, M. and Nattkemper, D. (2001). Effect anticipation in action planning. This volume, Chapter 32.
III Action perception and imitation
14 Processing mechanisms and neural structures involved in the recognition and production of actions
Introduction to Section III
Raffaella Ida Rumiati
14.1 Introduction

Interest in the recognition and imitation of actions has grown considerably in the last ten years among neuroscientists, as testified by the very ample review of related theories and empirical studies made available by Bekkering and Wohlschläger (this volume, Chapter 15). First, the authors stress the importance of imitation in the field of social and developmental psychology. Second, they provide the reader with several conceptualizations of imitation and, by comparing research on humans with that on non-human primates and other species, they try to answer the question of who can and who cannot imitate. Finally, after discussing the most influential theories of imitation, Bekkering and Wohlschläger illustrate their own view of the mechanisms involved in imitation. In short, their theory of goal-directed imitation holds that, when humans imitate an action, for instance touching an ear, they map the goal of the action irrespective of the effector used for performing the movement (e.g. the left or right hand), and of the movement path (e.g. ipsi- or contralateral).

In addition to that tutorial, the papers included in this section represent a substantial contribution to the understanding both of the neural structures (Gallese, Fadiga, Fogassi, and Rizzolatti; Jellema and Perrett) and of the processes underlying the ability to recognize and to reproduce actions (Castiello, Lusher, Mari, Edwards, and Humphreys; Shiffrar and Pinto).
14.2 Is there a common system for recognition and production of actions, or are there two representations underlying these two processes?

With regard to the studies on the neurophysiology of recognition and action, two views seem to be dominant in Gallese et al.’s and in Jellema and Perrett’s chapters. The one favored by the authors of the first chapter holds that perception and action rely on a common system (Di Pellegrino, Fadiga, Fogassi, Gallese, and Rizzolatti 1992; Gallese, Fadiga, Fogassi, and Rizzolatti 1996). Jellema and Perrett’s findings, that most neurons in the upper bank of the superior temporal sulcus (STSa) of the monkey’s brain code both form and space, do not fit the original theory of the visual system
proposed by Ungerleider and Mishkin (1982). These authors proposed that the visual system is organized in two parallel processing pathways—the ventral stream, dealing with the shape and identity of the stimulus (the ‘what’ stream), and the dorsal stream, coding the spatial characteristics of the stimulus (the ‘where’ stream). This dichotomy has been reformulated by Milner and Goodale (1995). In their revised model, the identity and the location of an object are processed by both pathways, but for different purposes: the ventral stream serves visual recognition (‘what’), while the dorsal (‘how’) stream sustains the visual control of action. We shall now briefly consider the chapter by Gallese et al. and that by Jellema and Perrett in turn.

Rizzolatti and colleagues found that in the premotor area F5 of the monkey’s brain there are many bimodal neurons that discharge during a monkey’s goal-directed movements, and also when a monkey observes similar movements executed by another monkey (Di Pellegrino et al. 1992) or by an experimenter (Gallese et al. 1996). Given such properties, these cells were called ‘mirror neurons’. By means of a single-unit recording technique, Gallese et al. (this volume, Chapter 17) discovered that a considerable percentage of neurons in the anterior part of the inferior parietal lobule (area PF) of one macaque monkey’s brain had visual and motor properties. This means that, like the neurons previously described in F5, several PF neurons fired when the monkey observed other individuals performing actions such as grasping and reaching, as well as when the monkey itself performed the same movements.

Besides the discovery of mirror neurons in area PF, there are two more findings in Gallese et al.’s study that deserve attention. First, a subset of the mirror neurons in area PF matched observed hand actions (input) to mouth responses of the monkey (output). As an alternative to the ontogenetic interpretation provided by the authors, I propose that this particular type of response may be explained in terms of the goal-directed theory of imitation proposed by Bekkering, Wohlschläger, and Gattis (2000; for an extensive presentation of this theory see also Bekkering and Wohlschläger, this volume, Chapter 15). According to Bekkering and colleagues’ view, when a subject imitates an observed action s/he reproduces the goal of the action (e.g. reaching for the ear/object), whereas the means are ignored (e.g. the effector employed or the trajectory of the reaching movement). The second interesting finding is that the hand performing the observed action influenced the discharge intensity. In particular, the monkey studied by Gallese et al. showed a preference for the experimenter’s left hand. Although this observation was made on a single animal, recording only from its left hemisphere, there is some ground for speculating that the mirror system may be lateralized.

The mirror system observed in monkeys seems to exist in humans too, as argued both in a study by Fadiga, Fogassi, Pavesi, and Rizzolatti (1995), who used transcranial magnetic stimulation (see Gallese et al., this volume, Chapter 17), and in several brain-imaging studies. The tasks employed in these studies were quite different from each other: participants were required to observe a model grasping three-dimensional objects (Grafton, Arbib, Fadiga, and Rizzolatti 1996; Rizzolatti, Fadiga, Matelli, Bettinardi, Perani, and Fazio 1996b), or to manipulate simple objects (Binkofski et al. 1999).
In the study carried out by Iacoboni et al. (1999), the observation and the imitation of finger movements were employed, and brain activation patterns similar to those observed in other studies were obtained. The common neural network activated both during action recognition and during action production includes sectors of Broca’s area in the premotor cortex (Brodmann areas 44–45, corresponding to area F5 in monkeys) and the inferior parietal lobule (Brodmann area 40, corresponding to area PF in monkeys).

In a PET study, Decety et al. (1997) contrasted two cognitive strategies—the observation of actions for later recognition vs. for later imitation—and two types of stimuli, meaningful vs. meaningless
actions. Irrespective of the type of actions, the observation of actions for later recognition enhanced the activity in the ventral pathway bilaterally, whereas the observation for later imitation led to the activation of the dorsal pathway, bilaterally as well. The authors proposed that while the ventral structures sustain semantic object processing and the recognition of actions (see also Rizzolatti et al. 1996b), the dorsal pathway could be necessary for generating visuomotor transformations. In sum, the neural circuit supporting both recognition and production of actions documented in the studies reviewed above consists of premotor, parietal, and temporal structures. Finally, the mirror system has also been attributed a social function, in that it allows an individual to understand the actions of other individuals (Gallese and Goldman 1998).

Similar to the PF mirror-like neurons (i.e. neurons with visual properties but devoid of motor properties) are those found in the STSa of the macaque monkey, where Jellema and Perrett (this volume, Chapter 18) recorded. The most interesting information reviewed in their paper is that STSa neurons enable the viewer to understand social signals. One kind of social signal coded by STSa cells is where another animal is directing its attention. This information is extracted from the visual cues characterizing the face and the body of the agent, as well as from their movements. In other words, the neurons in this region of the monkey’s brain seem to signal where in the environment someone else is looking. How can this be achieved? To start with, the cells build up discrete descriptions of seen eyes, head, and body; subsequently the outputs of these lower-level descriptions are integrated into a hierarchical scheme to form a somewhat more conceptually abstract tuning.

Interestingly, preliminary data reported by the authors suggest that spatial coding may be widespread in the temporal lobe, a suggestion which is at variance with what was originally predicted by the strict ‘what–where’ dichotomy put forward by Ungerleider and Mishkin (1982). According to these authors, spatial information should be processed by the dorsal stream, whereas the region where Perrett and colleagues record, the STS, lies between the dorsal and ventral streams, with its anterior sections belonging anatomically to the ventral stream. Therefore Jellema and Perrett (this volume, Chapter 18) argue that the assignment of the STS to either the dorsal or the ventral stream is pointless, and suggest considering the functions of the cells instead.
14.3 Psychophysical evidence supporting the view of common structures subserving action recognition and production
A variety of experimental paradigms have been employed to investigate the mechanisms underlying the recognition and imitation of actions. An extensive review can be found in the chapters by Shiffrar and Pinto and by Castiello et al. in this volume. In particular, Shiffrar and Pinto provide us with psychophysical evidence supporting Rizzolatti and colleagues' view (Di Pellegrino et al. 1992; Rizzolatti et al. 1996a), according to which both motor responses and recognition processes rely upon the same neural circuit. Shiffrar and Pinto argue that their findings do not fit the theory of Milner and Goodale (1995), whereby motor responses and recognition processes require different visual representations. In distinguishing human from non-human movements, Shiffrar and Pinto propose that human movements are processed on the basis of an internal model while non-human ones are not. Thus, movements that are consistent with an observer's internal model of possible movements are analyzed by mechanisms underlying action perception, whereas inconsistent movements may be analyzed by mechanisms underlying the perception of physical events. In addition, a PET study (Stevens, Fonlupt, Shiffrar, and Decety 2000) indicated significant bilateral activity in the
premotor cortex when observers saw two-frame apparent-motion sequences depicting possible human movement paths. However, when these same picture pairs were presented more rapidly, participants perceived impossible paths of human movement, and the selective activity in the premotor cortex was no longer observed. Related to Shiffrar and Pinto's approach is that followed by Castiello et al. (this volume, Chapter 16). In four experiments the authors tested whether the observation of grasping movements performed by a human actor or by a robot equally primed the motor responses of normal participants. They found that motor priming occurred only when participants were exposed to human grasping. Castiello et al. also studied three levels of priming. The first simply reflects whether the participant observed a human or a robot arm. The remaining two levels reveal that priming effects depend on the model's kinematics. Overall, these findings are in agreement with what is known from neurophysiology and developmental psychology. Gallese et al. (1996) have clearly shown that mirror neurons were specifically activated when the actions observed by the monkey involved an interaction between the agent's hand and an object. However, they remained silent when, for instance, the agent used a tool (e.g. pliers) in order to grasp the object. Woodward (1998, 1999) demonstrated that small children attend preferentially to an object that is grasped by a human, but not when the object is grasped by a mechanical device. Overall, this section of the book offers a competent, exhaustive overview of the theoretical and empirical issues related to the recognition and the production of actions in the context of imitation. The predictions and the discussions of the findings contained in these papers are based on a shared psychophysical and neurophysiological knowledge of research on actions. This is undoubtedly a successful example of an interdisciplinary approach to the study of a cognitive domain.
Acknowledgements
I would like to thank Alessio Toraldo and Wolfgang Prinz for their useful comments. This article was supported by a Cofinanziamento MURST (2000–02) awarded to Tim Shallice and to the author.
References
Bekkering, H., Wohlschläger, A., and Gattis, M. (2000). Imitation of gestures in children is goal-directed. Quarterly Journal of Experimental Psychology, 53A, 153–164.
Binkofski, F., Buccino, G., Stephan, K.M., Rizzolatti, G., Seitz, R.J., and Freund, H.J. (1999). A parieto-premotor network for object manipulation: Evidence from neuroimaging. Experimental Brain Research, 128, 210–213.
Craighero, L., Fadiga, L., Rizzolatti, G., and Umiltà, C.A. (1998). Visuomotor priming. Visual Cognition, 5, 109–125.
Decety, J., Grèzes, J., Costes, N., Perani, D., Jeannerod, M., Procyk, E., Grassi, F., and Fazio, F. (1997). Brain activity during observation of actions: Influence of action content and subject's strategy. Brain, 120, 1763–1777.
Di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., and Rizzolatti, G. (1992). Understanding motor events: A neurophysiological study. Experimental Brain Research, 91, 176–180.
Fadiga, L., Fogassi, L., Pavesi, G., and Rizzolatti, G. (1995). Motor facilitation during action observation: A magnetic stimulation study. Journal of Neurophysiology, 73, 2608–2611.
Gallese, V. and Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2, 32–36.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593–609.
Grafton, S.T., Arbib, M.A., Fadiga, L., and Rizzolatti, G. (1996). Localization of grasp representations in humans by positron emission tomography. Experimental Brain Research, 112, 103–111.
Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J.C., and Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286, 2526–2528.
Milner, A.D. and Goodale, M.A. (1995). The visual brain in action. Oxford: Oxford University Press.
Rizzolatti, G., Fadiga, L., Gallese, V., and Fogassi, L. (1996a). Premotor cortex and the recognition of motor action. Cognitive Brain Research, 3, 131–141.
Rizzolatti, G., Fadiga, L., Matelli, M., Bettinardi, V., Perani, D., and Fazio, F. (1996b). Localization of cortical areas responsive to the observation of hand grasping movements in humans: A PET study. Experimental Brain Research, 111, 246–252.
Stevens, J., Fonlupt, P., Shiffrar, M., and Decety, J. (2000). New aspects of motion perception: Selective neural encoding for apparent human movements. Neuroreport, 11, 109–115.
Ungerleider, L.G. and Mishkin, M. (1982). Two visual systems. In D.J. Ingle, M.A. Goodale, and R.J.W. Mansfield (Eds.), Analysis of visual behaviour, pp. 549–586. Cambridge, MA: MIT Press.
Woodward, A.L. (1998). Infants selectively encode the goal object of an actor's reach. Cognition, 69, 1–34.
Woodward, A.L. (1999). Infants' ability to distinguish between purposeful and nonpurposeful behaviours. Infant Behaviour and Development, 22(2), 145–165.
15 Action perception and imitation: a tutorial
Harold Bekkering and Andreas Wohlschläger
Abstract. Currently, imitation, or performing an act after perceiving it, is the focus of attention of researchers from many different disciplines. Although this tutorial attempts to provide some interdisciplinary background, it will concentrate on possible cognitive mechanisms that underlie imitation performance in human beings. First, the importance of imitation in the field of social and developmental psychology will be stressed. Then, some important notions are introduced about what imitation is and who can and cannot imitate. In the second part of this tutorial, some of the currently most widely cited theories of imitation will be described. The third section gives an overview of the major findings that have led to a new view of the mechanisms underlying imitation—the so-called goal-directed theory of imitation. The concluding remarks start with a discussion of the confusion surrounding the goal concept, and a taxonomical suggestion is made to capture the different meanings of the word 'goal'. Finally, a functional model is proposed that describes the sensorimotor processes that transform action goals into the execution of movements and, vice versa, the transformation from movement perception into the recognition of action goals.
15.1 Introduction
Imitation, or performing an act after perceiving it, is currently the focus of attention of researchers from many different disciplines. Regarding human beings, there is little doubt that we learn, to some extent, to interact with the surrounding world by means of imitation. An introduction to some influential theories on the importance of imitation in fields like social and developmental psychology will be offered in the section below. However, a more complex question is whether non-human primates or other species learn by imitation. Therefore, comparative research will be reviewed in the sections on what imitation is and who can imitate. In the second part of this tutorial, some of the currently most widely cited theories of imitation will be described. These ideas will be contrasted in the third section with our own recently developed view of the mechanisms underlying imitation, the so-called goal-directed theory of imitation. For a more detailed description of the functional mechanisms involved in action perception, we should like to direct the reader's attention to the chapters by Shiffrar and Pinto (Chapter 19) and by Castiello et al. (Chapter 16) in this volume. Neurophysiological support for the existence of separate mechanisms for action perception can be found in the chapter by Jellema and Perrett (Chapter 18). In addition, Gallese et al. (Chapter 17) describe possible neurophysiological structures involved in action perception and action execution in the monkey brain. A nice aside to this chapter regarding possible human neurophysiological mechanisms in imitation is provided by a recent fMRI study by Iacoboni et al. (1999). From the neuropsychological field, many recent insights into the functional control of imitative behavior derive from studies of patients with apraxia (see Leiguarda and Marsden 2000, for a recent review).
15.1.1 The influence of imitation research on social and developmental psychology
By watching a model's behavior, an observer can learn how to do something he or she did not know before. In fact, many cultural traditions, for instance the very complex Japanese tea ceremony, can only be explained by assuming imitative patterns as a way of inducting the child into adult ways. Providing models is not only a means of speeding up what might otherwise be a very tedious learning process. Rather, in some cases it is absolutely essential, for alternative learning procedures would entail too high a risk. Learning to drive a car by trial and error would cost too much in terms of upended pedestrians (Bandura and Walters 1963). Aristotle recognized the importance of imitation, or learning to perform an act from a modeled example. He referred to imitative capacities as an important manifestation of intelligence: 'Imitation is natural to man from childhood, one of his advantages over the lower animals being this, that he is the most imitative creature in the world, and learns at first by imitation' (Aristotle, in McKeon 1941, p. 448b). Ever since then, imitation has been a major topic of interest, particularly within psychology. Within developmental psychology, Piaget's ideas (e.g. 1975) are very influential to this day. In contrast to other views, he claimed that imitation cannot be seen as a mental activity of a lower order and has to be regarded as closely connected to the general cognitive development of children. He pointed out that the resulting growth in the capacity for imitation is a vital prerequisite for many further aspects of intellectual development, including the acquisition of language (see below). During the first few months infants are only capable of pseudo-imitation. If the father does something the baby did just a moment before (such as babbling), the baby is likely to resume this activity. This observation can be seen as an extension of the circular-reaction phenomenon, in which the sensory feedback from each behavior (for example, the child scratches an object and then gropes at it) primes its own immediate recurrence (the child scratches and gropes again). The difference is that in pseudo-imitation the cycle is reactivated from the outside. In other words, the infant treats the father's 'dada' as if it were a 'dada' of his or her own. In this view, imitation becomes more genuine with increasing age. From about four months on, infants can imitate actions they did not perform themselves just moments before, but they can only do so if their parent's action leads to sights or sounds similar to those the infants encounter when they perform that action themselves. Examples are hitting the table or squeezing a pillow. What the infant sees when she watches her parents' hands is similar to what she sees when her own hand goes through the same motions. Therefore, Piaget suggested that, at this age, imitation is restricted to movements that are visible to the imitator (however, see the Meltzoff studies described below for some striking counter-evidence). For movements like sticking out the tongue, the sensory–motor schemas need to be well developed to enable a correspondence of the organ end states of the infant's own body to those of the other (only possible from nine months on).
In general, Piaget advances the notion that imitation can only take place if the observer has a well-developed comprehension of what the model is doing, together with a schema that allows the translation of a desired perceptual outcome into the motor patterns that bring it about, a process that starts in early development but continues into adulthood. Imitation is only possible when the appropriate schemas are already formed. Another very influential line of thought on the importance of imitation derives from social psychology. First, social learning theorists pointed out that imitative learning is not a species of instrumental conditioning. In particular, examples of imitation where the observer does not copy the model's actions at the time he sees them (learning without performance) and where he imitates even though
he neither receives a reward himself nor sees the model receive one (learning without reinforcement; see also Gleitman 1981, p. 499) argue against typical instrumental-conditioning explanations. Second, a number of experiments have been conducted which show that the performance of an observed act depends in part upon the characteristics of the model. Not surprisingly, subjects are more likely to imitate people they like, respect, and regard as competent (e.g. Lefkowitz, Blake, and Mouton 1955). Finally, performance of an observed act will also depend upon the consequences that seem to befall the model. In an influential study, several groups of nursery-school children were shown a film that featured an adult and a large plastic 'Bobo-doll'. The adult walked over to the doll and ordered it to get out of the way. When the doll did not comply, the adult punched and kicked it around the room, punctuating her attacks with phrases such as 'Right on the nose, boom, boom'. One group of subjects saw the film up to this point but no further. Another group saw a final segment in which this behavior was shown to come to a bad end: a second adult arrived on the scene, called the aggressor a big, bad bully, spanked her, and threatened further spankings if she should ever do such bad things again. After seeing the films, all children were brought into a room that contained various toys, including a Bobo-doll. They were left alone, but filmed by a video camera. The children who never saw the bad ending of the film imitated the model's aggressive acts with the Bobo-doll. In contrast, the children who had seen the model's punishment behaved much more pacifically (Bandura and Mischel 1965).
15.1.2 What is imitation?
However, many questions about imitation have not yet been answered. For instance, how can imitation best be defined, or to put it simply: 'What is imitation?' Also, is imitation restricted to humans, to primates, to mammals, to birds—in other words: 'Who can imitate?' The notion of 'imitation' has probably been under debate for as long as the concept has existed. Thorndike's (1898) pragmatic definition of 'learning to do an act from seeing it done' focused on the key role observation plays in imitation, without specifying any details about which aspects of the model are imitated or how imitation is achieved by the imitator. Somewhat later, Spence (1937) pointed out that what people typically refer to as imitation might be stimulus enhancement. That is, seeing some act done in a particular place, or with some particular object, has the effect of increasing the observer's probability of going to that place or interacting with that object. In a more powerful formulation, this tendency would be specific to cases where the conspecific obtains valued rewards by its action. As a consequence, this narrowing of attention ensures that the individual's subsequent behavior becomes concentrated on the key variables of the action, and it is likely that many observations formerly considered to be imitation are thereby explained away. The view that stimuli can be seen as reinforcers for social learning has more recently also been called observational conditioning (Mineka, Cook, and Keir 1984). More recently, Tomasello (1990) introduced the concept of emulation. Whereas stimulus enhancement changes the salience of certain stimuli in the environment, emulation changes the salience of certain goals. In emulation (Köhler 1976; Tomasello 1990), the purpose of the goal towards which the demonstrator is striving is made overt as a result of his actions, and so becomes a goal for the observer, too. The observer attempts to 'reproduce the completed goal . . . by whatever means it may devise' (Tomasello 1990, p. 284). Lately, the definition of emulation has been broadened and refers to the possibility of learning about the physical situation as an indirect consequence of another's behavior. In other words, many things may be learned from what happens to objects in the
environment as a result of an individual's actions, quite apart from learning the actions themselves—for instance, the strength, brittleness, or weight of an object, what it is made of or contains, and so forth. In emulation, actions of equivalent ultimate effect on the environment are replicated without the particular nuances or techniques of the behavior being copied (see also Call and Carpenter, in press). Thus, in emulation the observer learns something about the environment, but nothing about the behavior of the model directly. Stimulus enhancement and emulation both refer to environmental learning processes, which might even take place without a model being present, as in the case of an apple falling from a tree. An experimental technique often used to discover whether the action performed depends on, or copies, the action observed is to compare the probability of the occurrence of an observed action with baseline. Imitation is operationally defined as a significant elevation in the frequency of an observed action over the normal probability of its occurrence (e.g. Byrne and Russon 1998). Important improvements to this basic technique have been developed. In animal research, for instance, two groups of animals are typically used, each seeing the same problem solved by a conspecific (the demonstrator) but in different ways. The groups can then be compared in the frequencies with which they perform each technique, and imitation is defined as a significant divergence between the groups in the frequencies of using the two observed actions (see also Whiten 1998; Whiten and Ham 1992); a minimal statistical sketch of this criterion is given at the end of this section. In developmental psychology, an adult typically performs one of a set of several different target gestures repeatedly in front of an infant. Imitation is then defined as a selective increase in the frequency of making that gesture, and not the other gestures, compared with baseline performance. There is now considerable agreement among developmental and comparative psychologists that, by this criterion, the ability to imitate has been demonstrated. For example, significantly more infant tongue protrusion has been reported after observing adult tongue protrusion than after observing adult mouth opening, and vice versa (Meltzoff and Moore 1977, 1983). Some researchers (e.g. Byrne and Russon 1998; Bekkering and Prinz, in press), however, have argued that it is hard to decide whether an increase in the performance of a specific action is a consequence of the stimulus events generated by a conspecific's action—and should therefore not be called imitation—or whether it is a consequence of observing a conspecific's action—and can be called imitation. An alternative explanation for the latter kind of imitation could be response facilitation. Therefore, Byrne and Russon (1998) bring into play the concept of novelty. However, defining novelty as a necessary condition for imitation would require full access to the behavioral history of the imitator. More importantly, the concept of novelty ignores the view of most (developmental) psychologists that new information is constantly being integrated into pre-existing programs to enable the animal to become adapted to new circumstances. The new information does not replace old information; rather, it is intimately woven into existing programs by a mutual process of assimilation and accommodation (e.g. Piaget 1975). Assimilation refers to the inwardly directed process of incorporating environmental data into an internal operative act.
Accommodation is the outwardly directed process of adapting an overall structure to a specific case. In this sense, it always includes an element of novelty, but it is an already present structure that becomes differentiated through observational learning. From this perspective, it is difficult to see how the definition of novelty could be reconciled with such a dynamic and mutual process of learning (see also Huber 1998). In summary, several conceptualizations of imitation have been formulated, each of them explaining how certain aspects of social learning can occur without necessarily involving imitation. Our conclusion about what imitation might be is deferred to the concluding section.
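As promised above, here is a minimal sketch in Python of the two-group operational criterion. The trial counts are invented for illustration, and the choice of a chi-square test is our assumption; any suitable test of frequency divergence between the groups would serve:

from scipy.stats import chi2_contingency

# Each observer group saw the same problem solved with a different
# technique. Rows: observer groups; columns: counts of trials on which
# each of the two demonstrated techniques was performed (invented data).
observed = [[34, 6],   # group A saw technique 1 demonstrated
            [9, 31]]   # group B saw technique 2 demonstrated

chi2, p, dof, expected = chi2_contingency(observed)

# A significant divergence between groups in technique frequencies
# satisfies the operational criterion for imitation described above.
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, criterion met: {p < 0.05}")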
15.1.3 Who can imitate?
In his famous monograph, Thorndike (1898) presented the results of experiments with cats, dogs, and chicks that needed to escape from a box to get food by manipulating some simple mechanism (e.g. by pulling down a loop of wire, depressing a lever, or turning a button). No evidence whatsoever was found that these animals could have learnt these acts from seeing conspecifics perform them. Thorndike's behaviorist conclusion therefore was that the responses are 'formed gradually by the stamping in of successes due to trial and error, and are not able to be formed by imitation or by one's being put through them' (1898, p. 553). This led Thorndike to state that associations in animals are not homologous with anything in the human association mechanisms. Comparative psychologists seem to differ fundamentally in their opinions on the question of whether non-human primates can imitate or not. In an influential paper, Byrne and Russon (1998) present a considerable amount of data to stress the point that gorillas, orangutans, and chimpanzees use a goal hierarchy when, for instance, eating different kinds of leaves. Is this imitation? Great apes such as gorillas certainly learn from their mothers, from the first days of life, which parts of a plant are edible. In addition, stimulus enhancement would tend to focus a young gorilla's attention on the growing plant as a potential object to investigate. The interesting question is how the young gorilla first acquires the elaborate sequence of coordinated actions that converts, say, nettle plants into edible mouthfuls. According to Byrne and Russon, the key concept is novelty, which lies in the arrangement of the acts. Thus the skill learnt by imitation is to arrange some basic repertoire of actions into novel and complex patterns, rather than to learn new basic actions. The evidence they present suggests that young gorillas have functional control over hierarchical structures of actions comparable to that of their adult models. In some case studies of orangutans (e.g. Russon 1996; Russon and Galdikas 1993), imitation of complex object–object relations was found. For instance, orangutans were observed stealing soap and laundry by canoe, weeding paths, or (unsuccessfully) trying to light a fire. However, Tomasello, Savage-Rumbaugh, and Kruger (1993b) observed that mother-reared chimpanzees are much poorer at imitatively learning novel actions on objects than enculturated chimpanzees. Thus, they conclude that the enculturation process is crucial for observing imitation instead of emulation (for more details, see the theoretical section below). In further work, Nagell, Olguin, and Tomasello (1993) reported two studies in which chimpanzees and young children observed a human demonstrator using a tool that resembled a rake in order to retrieve an out-of-reach reward (food or a toy for chimpanzees and children, respectively). The demonstrator began with the rake in one of two positions, with either the teeth or the crossbar down. When the rake began in a teeth-down position, the experimenter flipped the rake so that the crossbar was down, and then used the crossbar to drag the object within reach. When the rake began in a crossbar-down position, the experimenter simply dragged the object within reach, again using the crossbar of the rake. A similar rake was provided for human and chimpanzee observers, always resting in a teeth-down position.
The question of interest was not simply whether children and chimpanzees used the rake to obtain the reward, but how they used it, and whether the way they used it was influenced by the demonstrator's behavior. Children who observed the demonstrator flipping the rake to a crossbar-down position before beginning to pull were more likely to do the same and to use the crossbar-down rake to drag the object within reach, as compared with children who had not observed the flipping action. Children who observed the demonstrator pulling but not flipping used the rake in the teeth-down position and simply pulled. In contrast, chimpanzees flipped and pulled or pulled only with equal likelihood in both
observer conditions. In other words, while both children and chimpanzees learned by observation to use the tool to obtain the reward, the demonstrator's behavior influenced the behavioral strategy employed by the children but not that of the chimpanzees. Nagell et al. (1993) concluded that chimpanzees attended to the end result of the task (obtaining a reward) and to the functional relations involved in the task (obtaining the reward by using the rake) but failed to adopt the strategy used by the human model. Tomasello and colleagues named such behavior 'emulation learning' (e.g. Tomasello and Call 1997). Most scientists seem to agree that the behavior Tomasello and colleagues call emulation, such as the behavior of the chimpanzees above, does not count as imitation. The imitation capacity of songbirds, on the other hand, is well established. The most widely accepted hypothesis of vocal imitation in birds states that vocal learning involves two steps: (1) an auditory memory is laid down, and then (2) vocal output is modified until the auditory feedback it generates matches the model. It is also known that the pathways involved in song production respond to sound, an observation that blurs the demarcation between what counts as an auditory and what as a motor circuit (see, for a review, Nottebohm 1991). For instance, in a well-controlled study, it was found that male zebra finches (Taeniopygia guttata) master the imitation of a song model 80 to 90 days after hatching and retain it with little change for the rest of their lives (e.g. Lombardino and Nottebohm 2000). Interestingly, a juvenile male zebra finch kept singly with its father develops a fairly complete imitation of the father's song. The imitation is less complete when other male siblings are present, possibly because, as imitation commences, model abundance increases (Tchernichovski, Lints, Mitra, and Nottebohm 1999). Recently, Doupe and Kuhl (1999) have argued that there are numerous parallels between human speech and birdsong. Both humans and songbirds learn their complex vocalizations early in life, exhibiting a strong dependence on hearing the adults they will imitate, as well as themselves as they practice, and both show a waning of this dependence as they mature. Innate predispositions for perceiving and learning the correct sounds exist in both groups, although more evidence of innate descriptions of species-specific signals exists in songbirds, where numerous species of vocal learners have been compared. Humans also share with songbirds an early phase of learning that is primarily perceptual and then serves to guide later vocal production. Both humans and songbirds have evolved a complex hierarchy of specialized forebrain areas in which motor and auditory centers interact closely, and which control the lower vocal motor areas also found in non-learners. In both these vocal learners, however, the way auditory feedback is processed in these brain areas during vocalization is surprisingly unclear. Finally, humans and songbirds have similar critical periods for vocal learning, with a much greater ability to learn early in life. In both groups, the capacity for late vocal learning may be decreased by the act of learning itself as well as by biological factors, such as the hormones of puberty.
Although some features of birdsong and speech are clearly not analogous, such as the capacity of language for meaning, abstraction, and flexible associations, there are striking similarities in how sensory experience is internalized and used to shape vocal outputs, and in how learning is enhanced during a critical period of development. Empirical evidence for vocal imitation in infants derives from a study by Kuhl and Meltzoff (1996). They examined developmental changes in infants' vocalizations in response to adults' vowels at 12, 16, and 20 weeks of age. Vocal imitation was documented: infants listening to a particular vowel produced vocalizations resembling that vowel. Another piece of evidence for vocal imitation derives from a study by Poulson, Kymissis, Reeve, Andreatos, and Reeve (1991). Three infants, aged 9 to 13 months, and their parents participated in 2 to 4 experimental sessions per week for 2 to 4 months. During each 20-min session, the parent presented vocal models for the infant to imitate. During the model-alone condition, no social praise
was programmed for infant imitation. During the model-and-praise condition, the parents provided social praise for infant imitation on training trials, but not on probe trials. All three infants showed systematic increases in matching during training trials following the introduction of the model-and-praise condition. Although matching during probe trials was not directly reinforced, probe-trial responding increased systematically with training-trial responding. Furthermore, non-matching infant vocalizations did not increase systematically with the introduction of the model-and-praise procedure. Together, these findings provide a demonstration of generalized vocal imitation in infants. Nevertheless, a critical note on vocal imitation derives from the study of Siegel, Cooper, Morgan, and Brenneise-Sarshad (1990). Children between 9 and 12 months of age were studied to determine whether they would spontaneously imitate either the average fundamental frequency or the fundamental frequency contour of their speaking partners. In the first experiment, children were recorded at home as they interacted with their fathers and mothers. Acoustic analyses failed to reveal any tendency on the part of the infants to adjust vocal pitch, amplitude, or duration to those of their speaking partners. In a second experiment, children were recorded while interacting with their parents in a laboratory setting. Again, there were no indications that the children imitated the vocal patterns of their speaking parents. Can infants imitate beyond speech? The studies of Meltzoff and Moore (see the 1997 paper for a recent overview) suggest that neonates can imitate facial gestures, such as tongue protrusion, which they cannot see themselves perform. In the original study (Meltzoff and Moore 1977), mouth-opening and tongue-protrusion gestures were shown to 3-week-old infants while they were engaged in the competing motor activity of sucking on a pacifier. The adult terminated the gestural demonstration, assumed a neutral face, and only then removed the pacifier. Three-week-old infants differentially performed both gestures despite the fact that the adult was no longer showing them. If one takes seriously the hypothesis that imitation is not just a certain kind of stimulus enhancement or response facilitation, but rather reflects a behavioral acquisition through observation, a valid argument would be to show that deferred imitation can span a long delay of at least several hours or even days. In the 1994 study by Meltzoff and Moore, 6-week-old infants saw a person performing a specific gesture on Day 1, and then, after a 24-hour retention interval, they saw the same adult in a neutral pose. Different groups of infants saw different gestures on Day 1, and they all saw the same neutral pose on Day 2. Strikingly, the infants differentially performed the gestures they had seen the day before. In another study, Meltzoff (1988) investigated imitation performance after a one-week delay in 14-month-old children on six object-oriented actions. One of the six actions was a novel behavior that had a zero probability of occurrence in spontaneous play. In the imitation condition, infants observed the demonstration but were not allowed to touch the objects, to prevent them attempting any immediate imitation. The results showed that infants in the imitation conditions produced significantly more of the target actions than infants in control groups, who were not exposed to the modeling.
Interestingly, there was also strong evidence for imitation of the novel act. Together, these studies were path-breaking for the notion that young infants are already able to learn and memorize a behavior through observation. Can young infants imitate novel actions? In their 1994 paper, Meltzoff and Moore also presented the 6-week-old infants with novel gestures such as tongue-protrusion-to-the-side (however, for a critical note on imitation in newborns, see Anisfeld 1996). They observed that the infants initially performed a small tongue movement with no lateral component. However, after some trials, through a kind of correction process, the baby's tongue came to match the organ end state of the adult's tongue, thus resulting in a novel behavior that was not initially present.
15.2 Theories of imitation
In the following section, three influential theories of imitation will be addressed: (1) the active intermodal mapping (AIM) theory by Meltzoff and Moore; (2) the social enculturated theory of imitation by Tomasello; and (3) the program-level and action-level theory of imitation by Byrne and Russon.
15.2.1 The active intermodal mapping (AIM) theory of imitation
The key claim of the active intermodal mapping theory of Meltzoff and Moore (for a recent elaborated version, see Meltzoff and Moore 1997) is that imitation is a matching-to-target process. The model is based on a series of experiments in which newborns seeing particular facial gestures were able to produce matching motor output or, in other words, an intermodal mapping of visual input onto proprioceptive output. The active nature of the matching process is captured by a proprioceptive feedback loop. The loop allows the infant's motor performance to be evaluated against the seen target and serves as a basis for correction. The major components of the 1997 version of the model are (a) the perceptual system, whose functions provide the perception of the infant's own body and of the external world; (b) the supramodal representational system, which allows comparison between the organ relations of an external target and the current position of the infant's own body; and (c) the action system, which executes a goal-directed act as long as a mismatch exists between the organ relations of the perceived model and those of the infant's self. The organ relations provide the common framework in which the acts of the self and of the other are registered. 'Tongue-to-lips', which is an organ relation, would be a description that cuts across the modality of perception and could describe both the target and the self. A match indicates that the motor act seen and the motor act done are equivalent. This recognition of the equivalence of acts is speculated to be the starting ground for infants' apprehension that the other is, in some primitive sense, 'like me' (Meltzoff and Moore 1998). The goal-directed component was added to the model after the finding of the correction process in the tongue-to-the-side experiment described above.
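The matching-to-target loop can be made concrete with a minimal sketch in Python, under heavy simplifying assumptions: an organ relation is reduced to a single number (say, a tongue-to-lips distance), and correction is a crude proportional adjustment. Neither simplification is a claim about Meltzoff and Moore's model; only the compare-and-correct cycle follows their outline:

def aim_loop(target: float, start: float, gain: float = 0.5,
             tolerance: float = 0.01, max_steps: int = 100) -> float:
    """Act until the proprioceptively felt organ relation of the self
    matches the visually perceived organ relation of the model."""
    own = start
    for _ in range(max_steps):
        mismatch = target - own          # supramodal comparison
        if abs(mismatch) < tolerance:    # match: the acts are equivalent
            break
        own += gain * mismatch           # corrective goal-directed act
    return own

# Repeated correction converges on the target organ relation.
print(round(aim_loop(target=1.0, start=0.0), 3))  # -> 0.992, within tolerance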
15.2.2 The social enculturated theory of imitation
Tomasello, Kruger, and Ratner (1993a) identify three strict criteria for imitational learning: (a) the imitated behavior should be novel for the imitator, (b) it should reproduce the behavioral strategies of the model, and (c) it should share the model's goal. Behaviors not satisfying these criteria are not considered true imitation. Although quite some evidence of imitation, or at least action mimicking, has been reported in infants and, for instance, songbirds, Tomasello et al. (1993b) exclude the possibility that wild animals display true imitative behavior (see also Whiten 1998, for a critical note on non-human primate imitation). For instance, non-human primates who have received no special treatment from humans do not seem to imitate novel actions on objects (see Section 15.1.3). In contrast, enculturated chimpanzees do seem to have the ability to imitate. Tomasello et al. (1993b) propose that what develops in chimpanzees as a result of their enculturation is not imitative ability per se, but rather more fundamental cognitive skills. In their view, the most important of these concern social interactions in which there is a joint focus of attention on some third entity such as an object (Savage-Rumbaugh 1990). Human enculturators encourage and structure such interactions in a way that adult chimpanzees in their natural environment do not. This scaffolding and intentional
instruction serves to 'socialize the attention' (Tomasello et al. 1993b, p. 1702) of the chimpanzee, much in the same way that human children have their attention socialized by adults (Vygotsky 1978). They also argue that these broadly defined skills of social cognition might be a prerequisite for the acquisition of language (see also Tomasello 1992). In the typical environment of their species, young chimpanzees have very little chance to interact with others around objects, and when they do, the other chimpanzee does not attempt to direct their attention or to teach them a behavior. In human-like cultural environments, young chimpanzees must adapt to the complex triadic interactions around objects that constitute the majority of social interactions. As a result, specific social–cognitive abilities might emerge. Two alternative explanations need to be considered, though. First, the mother-reared chimpanzees probably had less experience with human artifacts than the enculturated chimpanzees. However, this explanation is rendered unlikely by the observation that without demonstrations, that is, in the free-play and teaching trials, the mother-reared chimpanzees performed the target actions as often as the enculturated chimpanzees. Second, the chimpanzees' understanding of what they were supposed to do in the experimental session needs to be considered. The typical 'Do what I do' format was adopted for this study (for a discussion, see Whiten and Ham 1992). However, it is possible that the mother-reared chimpanzees still did not understand what they were supposed to do in the same way as the other chimpanzees.
15.2.3 Program-level and action-level imitation
In an influential paper on imitation, Byrne and Russon (1998) advocate the view that voluntary behavior is organized hierarchically and that imitation can therefore occur at various levels. A clear distinction can be made between the 'action level', a rather detailed and linear specification of sequential acts, and the 'program level', a broader description of subroutine structure and the hierarchical layout of a behavioral 'program'. At the bottom level of the hierarchy, the precise details of the manual actions and individual uses are probably learned without imitation. That is, each animal was found to have a different preferred set of functionally equivalent variants (Byrne and Byrne 1993), presumably a sign of trial-and-error learning. At a slightly coarser level, for instance the pattern of hand preferences (which is very strong in every animal), no evidence for imitation has been found either. That is, the hand preference of an offspring correlates neither with that of the mother nor with that of the silverback male. Yet, when the overall form of the process was investigated, the order of action sequences was rather fixed. And although environmental constraints can partially explain this fixed pattern, learning by individual experience alone seems highly unlikely. Program-level imitation is defined as copying the structural organization of a complex process (including the sequence of stages, subroutine structure, and bimanual coordination) by observation of the behavior of another individual, while the implementation at the action level might arise from individual learning. Imitation at the program level, then, would consist of copying from a model's action a novel arrangement of elements that already exist in the repertoire; a minimal data-structure sketch of this distinction follows. Thus, to imitate, the individual must have a mental apparatus that allows it to assemble hierarchical frameworks, to organize the goal structure, and to maintain the goal structure while its detailed enactment is built. Byrne and Russon (1998) suggest that the everyday use of imitation is closer to the program level than to the action level. Interestingly, this notion is in sharp contrast to other notions which have stressed that 'true imitation' is only evident when a novel act is added as an unmodified whole to an individual's motor repertoire (Tomasello 1990; Whiten and Ham 1992).
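To fix the distinction, here is a minimal Python sketch in which a behavioral program is represented as a hierarchy of named stages with low-level leaf movements. The task, the stage names, and the representation itself are invented for illustration; only the idea that program-level imitation copies the hierarchical layout while leaving leaf actions to individual learning follows Byrne and Russon:

# A behavioral "program" as a hierarchy: (name, list of stages), where
# each stage pairs a subroutine name with its low-level leaf actions.
program = ("prepare nettle leaves", [
    ("strip leaves", ["grip stem", "slide hand upward"]),
    ("fold bundle", ["pick off debris", "fold leaves over thumb"]),
    ("ingest", ["place bundle in mouth"]),
])

def program_level_copy(prog):
    """Program-level imitation: reproduce the subroutine structure and
    stage order, leaving each leaf action to individual learning."""
    name, stages = prog
    return (name, [(stage, ["<acquired by trial and error>"])
                   for stage, _leaves in stages])

def action_level_copy(prog):
    """Action-level imitation: replicate the detailed linear string of
    movements, with no hierarchical organization retained."""
    _name, stages = prog
    return [leaf for _stage, leaves in stages for leaf in leaves]

print(program_level_copy(program))
print(action_level_copy(program))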
15.3 Goal-directed imitation
Recently, a new view of the representation that mediates perception and action in imitation has been proposed (Bekkering, Wohlschläger, and Gattis 2000; Gattis, Bekkering, and Wohlschläger, in press; Wohlschläger, Gattis, and Bekkering, submitted). This view postulates:
1. Behaviors are not simply replicated as unified, non-decomposed motor patterns. Rather, imitation involves first a cognitive decomposition of the motor patterns into their constituent components, and second a reconstruction of the action pattern from these components.
2. The decomposition–reconstruction process is guided by an interpretation of the motor pattern as goal-directed behavior. Thus, the constituent elements in the mediating representation involve goals rather than motor segments.
3. It is assumed that these goals are organized hierarchically, with some of the encoded goals being dominant over others. The hierarchy of goal aspects follows the functionality of actions. Ends (objects and treatments) are more important than means (effectors and movement paths).
4. The reconstruction of the motor pattern from the analyzed goals is subject to capacity limitations; only a few goal aspects are selected.
5. Finally, imitation follows the ideomotor principle. The selected goals elicit the motor program with which they are most strongly associated. This motor program does not necessarily lead to matching movements.
(These postulates are illustrated in a minimal sketch at the end of this passage.) Evidence in favor of the goal-directed theory of imitation was recently found in several of our studies (Bekkering et al. 2000; Gleißner, Meltzoff, and Bekkering 2000; Wohlschläger and Bekkering, submitted; for further discussion, see also Gattis et al., in press, and Bekkering and Prinz, in press). In an imitation setting, it was observed that young children always moved to the correct goal, such as an object or a particular ear to reach for, but largely ignored the agent (a particular hand to move with) or the movement path (ipsi- or contralateral to the object; see Fig. 15.1 for the gestures we used). This led us to assume that the action recognition process was strongly affected by the observed action effects. We proposed that imitation entails representing an observed behavior as a set of goals (possible action effects), which then automatically activates the motor program that is most strongly associated with these action effects. Goals may include objects (say, a particular ear), agents (a particular hand), a movement path (crossing the body or moving parallel to the body), or salient features (crossing the arms). We also proposed that these goals are represented hierarchically, with some goals dominating over others. When processing capacity is limited so that multiple goals compete, goals higher in the hierarchy are reproduced at the expense of goals lower in the hierarchy. Our results suggested that objects occupy the top of this hierarchy—children always grasped the correct ear but, in cases of error, used the wrong hand and the wrong movement path. The most common error was the so-called contra–ipsi error. In this case, although the adult model used the contralateral hand, children quite frequently touched the correct ear with the ipsilateral hand, the motor program most strongly associated with touching the ear. Because young children have difficulty in processing multiple elements and relations, failures to reproduce all the goal aspects of an action are more likely than in adults.
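As announced above, the following minimal Python sketch illustrates postulates 2 to 5. The goal-aspect names, the hierarchy order, the capacity parameter, and the association table are our illustrative assumptions; only the selection logic follows the theory's outline:

# Goal aspects ranked ends-before-means (postulate 3). Names invented.
HIERARCHY = ["object", "treatment", "effector", "path"]

# Ideomotor associations (postulate 5): each selected goal elicits the
# motor program most strongly associated with it. For the goal "left
# ear" that is an ipsilateral reach, whatever hand the model used.
STRONGEST_PROGRAM = {
    ("object", "left ear"): "ipsilateral reach to left ear",
    ("path", "contralateral"): "reach across the body midline",
}

def imitate(observed_goals, capacity):
    """Decompose an observed act into goal aspects (postulate 2), keep
    only the highest-ranked aspects under the capacity limit
    (postulate 4), and map them onto motor programs (postulate 5)."""
    ranked = sorted(observed_goals.items(),
                    key=lambda item: HIERARCHY.index(item[0]))
    selected = ranked[:capacity]
    return [STRONGEST_PROGRAM.get((aspect, value), f"match {value}")
            for aspect, value in selected]

# With capacity 1 (a young child), only the object goal survives and the
# ipsilateral program is elicited: the contra-ipsi error. With capacity 2
# (an adult), the path goal is preserved as well.
print(imitate({"object": "left ear", "path": "contralateral"}, capacity=1))
print(imitate({"object": "left ear", "path": "contralateral"}, capacity=2))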
This proposal predicts that children's imitation errors are malleable, depending on the number of goals identified in the task as a whole. We tested this prediction in several additional experiments. One experiment limited the
Fig. 15.1 The six hand gestures used in Bekkering et al. (2000).
movements to only one ear, thereby eliminating the necessity for children to specify the goal object. Nine children with a mean age of 4:4 years copied the movements of a model who always touched her right ear (randomly with either the left or the right hand), or who always touched her left ear (again randomly using the left or right hand). In this circumstance, children made virtually no errors, grasping the ear contralaterally whenever the model did so. Eliminating the necessity of specifying the goal object thus enabled children to reproduce other goals in the imitative act, such as using the correct hand and the correct movement path. A further experiment compared the presence and absence of objects in the action set while keeping the total number of modeled gestures constant. Children (32 children, mean age 4:4 years) sat at a desk across from a model who made four unimanual gestures similar to those described above, but performed on the desk rather than at the ears. Half of the children saw four dots on the table, two in front of the model and two in front of the child. The model covered her dots ipsi- and contralaterally, sometimes with her right hand and sometimes with her left hand. Children were encouraged to copy the model, and directed their own actions at the two corresponding dots. No dots were placed on the table for the other half of the children, and the model and child instead directed their actions at locations on the table. Children in the dot condition produced the same error pattern already observed in the hand-to-ear task, substituting ipsi- for contralateral gestures. In contrast, children in the no-dot condition, who saw identical movements directed at locations in space rather than at dots, produced significantly fewer contralateral errors. We concluded that manipulating the presence or absence of a physical object had effectively manipulated the necessity of specifying objects as goals. Despite the fact that the movements of both model and child were identical in both conditions, removing the dots from the table eliminated the object goal and allowed children to reproduce other goals in the imitative act, such as the agent and the movement. Further research has explored the question of how those goals are specified and organized. Using a paradigm similar to the one described above, Gleißner et al. (2000) manipulated whether the
gestures were directed at locations on the body or at locations near the body (see Fig. 15.2). A model performed ipsilateral and contralateral movements with her right or left hand or with both hands. She either touched a body part (an ear or a knee) or directed her movement at a location in space near the body part. Three-year-olds imitated less accurately when the model's actions were directed at locations on the body than when her actions were directed at locations near the body. These results confirmed the proposal of Bekkering et al. (2000) that objects (such as an ear or a knee) are high in the goal hierarchy and are able to displace other goals, such as agent and movement. Also, these results provide evidence for the view that perceiving unambiguous action effects automatically activates the motor program that is most strongly associated with that action effect, thereby largely overriding the motor output observed. Whether the body part was visible or not (knee versus ear) did not significantly influence imitative behavior, suggesting that visual feedback does not play an important role in specifying goals in gesture imitation. Wohlschläger et al. (submitted) compared another salient feature—an open versus closed hand—against the goals previously investigated. In an action similar to one of the conditions used by Gleißner et al., a model reached to the location in space near her left or right ear, using the left or right hand, so that gestures were sometimes ipsilateral and sometimes contralateral. Simultaneously, the model either made a fist or opened her hand with the palm facing the child. We reasoned that the open versus closed hand introduced a new gesture goal, and wished to explore whether that goal would now displace the object goal, just as the object goal had displaced other goals such as agents and movements. This was indeed the case. Children reproduced the open versus closed hand of the model every time, but now frequently performed the gesture on the wrong side of the head. In addition, since the configuration of the hand and its position relative to the head were completed at the same time, it is clear that the goal is not simply defined by the end state.
Fig. 15.2 Some of the gestures used in Gleißner et al. (2000): unimanual and bimanual, ipsilateral and contralateral gestures directed at locations on or near the body.
15.3.1 Further evidence for goal-directed imitation
15.3.1.1 The presence or absence of an end-state goal
There are several other studies providing additional support for the idea that goals or action effects are inferred from observing actions. For example, it was found that 16- to 24-month-old children imitated enabling sequences of events more accurately than arbitrary event sequences (Bauer and Mandler 1989; Bauer and Travis 1993). The novel–arbitrary sequences involved novel actions with simple objects, such as putting a sticker on a chalkboard, leaning the board against an easel, and drawing on the board with chalk. Novel–enabling sequences also involved novel actions with objects, with the difference that actions in the novel–enabling sequence enabled other actions in the sequence and ultimately led to a salient novel event, such as a toy frog 'jumping' into the air. The frog-jump sequence, for example, involved putting a wooden board on a wedge-shaped block to form a lever, placing a toy frog on one end of the board, and hitting the other end of the board, causing the toy frog to appear to jump into the air. Children of all ages performed the modeled actions in the modeled order more frequently for novel–enabling sequences than for novel–arbitrary sequences, clearly indicating that the presence of an unambiguous, observable goal leads to more accurate imitative behavior in young children. Further, Travis (1997) demonstrated that the presence of an end-state goal in a modeled action sequence led to more frequent imitation of those actions compared with action sequences omitting the goal action. Twenty-four-month-old children were shown interleaved pairs of three-step action sequences similar to the novel–enabling sequences described above. They saw either all six actions (three actions for each pair), or only five actions, with the goal action for one sequence omitted. Interestingly, when shown two-goal pairs, children imitated both action sequences equally. In contrast, when shown one-goal pairs, children imitated more actions from the goal-present sequence than from the goal-absent sequence. In addition, children in both conditions performed actions leading to a particular goal as a temporally contiguous sequence—despite the fact that goal-related actions were not temporally contiguous in the modeled sequence, since they were interleaved with actions from the other sequence. Thus, a variety of experimental data indicates that observers interpret the actions they observe above the level of elementary perceptual–motor maps. For instance, the presence of an end goal in a sequence of actions increases the likelihood that those actions will be imitated and, presumably, organizes subsequent behavior. It is worth noting that the end goals used by Bauer, Travis, and colleagues were physical acts involving movement, noise, or both. However, Travis points out that a goal, strictly defined, is 'a mental state representing a desired state of affairs in the world' (1997, p. 115), and can therefore only be observed in the outcome of intentional actions. Identifying the goals of an observable action requires an inference beyond any mapping or parsing method as described previously.
15.3.1.2 Inferences about action goals
Another line of evidence in favor of goal-directed imitation derives from a study by Want and Harris (1998). In their experiment, subjects were shown how to poke an object out of a horizontally mounted transparent tube in which there was a 'trap'.
Only if the poking was done from one end could the object be obtained. Half of the children saw the action performed perfectly, while the other half saw the model put the stick into the wrong end first, then remove it, and poke from the other end (the same successful performance as shown to the other group). Interestingly, the children
who saw the incorrect sequence did not copy it; however, they did learn significantly more quickly than those who saw only error-free demonstrations. Other examples stressing that goals play an important role in imitative behavior come from developmental psychologists interested in children's understanding of the intentions of others (Carpenter, Akhtar, and Tomasello 1998; Meltzoff 1995). These experiments demonstrate that even very young children are capable of inferring goals from observed actions, and that inferred goals influence imitative behavior. Meltzoff (1995) compared 18-month-old children's re-enactments of an attempted but failed action with those of an attempted and achieved action, using five unique test objects. For example, an adult experimenter moved a rectangular wooden stick toward a recessed rectangular button on a box, and either inserted the stick in the hole, activating a buzzer, or touched an adjacent area on the box, missing the hole and not activating the buzzer. When given the opportunity to manipulate the objects immediately after the adult's demonstration, children shown an attempted but failed act were just as likely to perform the target act (e.g. inserting the stick in the hole and activating the buzzer) as children shown an attempted and achieved act. This result is especially surprising because children who had seen a failed attempt never actually saw the target act performed. Children in both groups performed the target act approximately four times as often as did children in control conditions. The fact that 18-month-olds imitated intended acts just as often as achieved acts suggests that even very young children infer the goals of others' behaviors and imitate those inferred goals. In a similar paradigm, Carpenter, Akhtar, and Tomasello (1998) compared 14- to 18-month-old children's re-enactments of verbally marked intentional and non-intentional acts. An experimenter performed two unrelated actions on a unique test object, for instance lifting the top of a bird feeder and pulling a ring on a string attached to the feeder. These actions were accompanied by vocal exclamations marking each action as either an intended act ('There!') or an accidental act ('Whoops!'), with some children seeing first an intentional and then an accidental act, and others seeing them in the reverse order. After both actions had been performed, a salient event occurred (e.g. a party favor attached to the bird feeder moved and made a noise). Irrespective of the order of the modeled actions, children reproduced the intentional acts approximately twice as often as the non-intentional acts. Together these experiments suggest that imitation in children relies on the presence of unambiguous, observable goals and, importantly, on inferences about the actor's intentions behind the observed act as well. Furthermore, strong support was found for the notion that these goals or intentions, and inferences about them, influence subsequent imitative behavior.
15.3.1.3 Goal-directed interference effects
Traditionally, the importance of goals in action perception and/or imitation has been addressed in developmental psychology. However, as argued elsewhere (e.g. Wohlschläger et al., submitted), there is no reason to assume that children imitate in a fundamentally different way from adults. Rather, we have argued that children are ideal subjects for investigating the issue of goals because of their limited working-memory capacity. Thus, the fact that adults are able to imitate a contralateral hand movement to the left ear correctly does not mean that the goal of a left ear does not primarily activate the motor program belonging to an ipsilateral hand movement, as seen in children. To investigate this issue in more detail, we recently measured adults’ response latencies in the hand-to-ear task described above (Wohlschläger and Bekkering, submitted). The latency data in
adults showed the same pattern as the error data in children. That is, although adults made almost no errors, contralateral hand movements were clearly initiated later than ipsilateral ones. This result parallels the contra–ipsi errors in children. However, in order to make the point that the increased response latency was due to the presence of goals, we also replicated the dot experiment (Exp. 2 of Bekkering et al. 2000; see above) in adults. We asked adults to imitate ipsi- and contralateral finger movements presented on a screen. In one block of trials, the finger movements presented on the screen were directed towards two red dots. In the other block of trials, the same movements were shown, but without the dots. The data showed that contralateral finger movements were initiated more slowly than ipsilateral finger movements, but only if the dots were present. Our interpretation of this finding is that the presence of dots activates the more ideomotor-like ipsilateral finger movement first. The ideomotor theory of action (e.g. Greenwald 1970) states that the anticipatory representation of an action’s sensory feedback (a response image) is used for action control. In other words, the anticipation of the sensory consequences of an action (e.g. the tactile sensation of touching an ear or the visual sensation of covering a dot) drives the action selection process and favors the response most strongly associated with these sensory consequences. However, adults can inhibit these strong connections in order to imitate more precisely. This inhibition is time-consuming, as reflected by the increased response latencies (Wohlschläger and Bekkering, submitted). Stimuli depicting goals cause interference not only when subjects are asked to imitate but also when subjects are instructed to respond in a pre-specified way (Stürmer, Aschersleben, and Prinz 2000). In their set-up, participants had to either spread their fingers apart or make a grasping hand movement from a neutral middle starting position. The stimulus on the screen consisted of a sequence of pictures that showed a similar hand in the same neutral starting position. After a random time period, the hand either spread or closed to a fist, before returning to the neutral position. The instruction for response selection was provided by a cue, namely the color of the hand: subjects were instructed to make their response as soon as color was added to the pictures (e.g. ‘Make a grasping movement if the stimulus turns red’). The imperative cue (that is, the color) was presented at different times during the hand-movement sequence. The main finding was a typical Simon-like correspondence effect: subjects responded faster if the observed hand movement corresponded with the instruction given by the color than if it did not. The results of a second experiment were particularly interesting with respect to our goal-directed theory of imitation. In this experiment, instead of dynamic hand movements, static images of the end positions (fist vs. spread fingers) were presented to the subjects. Now the compatibility effects were even stronger than those observed in the first experiment. Thus, the findings of Stürmer et al.
(2000) suggest that the observation of a static hand posture interferes more with selection processes than seeing the whole hand movement, a finding that agrees nicely with the ideomotor theory mentioned above (responses are selected by their intended effects or goals). In another series of experiments (Brass, Bekkering, Wohlschläger, and Prinz 2000), we tested whether observed finger movements influence finger-movement execution more strongly than a symbolic or spatial cue. In the first experiment, we compared symbolic cues with observed finger movements using an interference paradigm. Observing finger movements strongly influenced movement execution, irrespective of whether the finger movement was the relevant or the irrelevant stimulus dimension. In the second experiment, the effects of observed finger movements and spatial finger cues were compared. The observed finger movement dominated the spatial finger cue. A reduction in the
similarity of observed and executed action in the third experiment decreased the influence of the observed finger movement, which demonstrates the crucial role of the imitative relation between observed and executed action for the described effects.
15.3.1.4 How do we perceive and arrange goals?
Having said all this about the relevance of goals in imitation, we would like to finish this section with some critical notes about the notion of goal-directed imitation and the concept of goals in general, and with some remarks on the future work needed to clarify essential open issues. A major criticism of the theory could be that whatever aspect of the model’s action the imitator reproduces will be declared, after the fact, to have been the goal of the action. The theory is therefore hard to falsify, since one cannot individuate the goal hierarchy independently of what the imitator does in an imitation situation. Although this is in fact the basis of our theory—you imitate what you perceive to be the goal of the model’s movement—we think we have found some ways to deal with this criticism. First, in the dot experiment with kindergarten children (Experiment 3 of Bekkering et al. 2000), we were able to predict a change in the children’s imitative behavior by adding or removing goals in the modeled act. That is, children did or did not copy the observed hand movements depending on the presence of higher goals such as objects, as proposed by the theory. Second, and perhaps even more convincingly, the response-time differences in the finger-imitation experiments of the Wohlschläger and Bekkering (submitted) study show that the observation of goals can influence motor programming even though the finger movements to be programmed are identical. That is, although the adult subjects were able to imitate the observed contralateral finger movements, it took them more time if the observed movements on the screen were directed at the dots. Here, a clean dissociation between perceived action goals (a movement to a left dot activates a left ipsilateral finger movement) and the executed imitative act (the right contralateral finger movement) was arranged, while the presence of a higher goal was still reflected in the response latencies. Third, the studies of Brass et al. (2000) and Stürmer et al. (2000) mentioned above have shown that the presence of action goals can interfere with action initiation even outside the scope of imitation. That is, the observation of finger movements or hand postures on a computer screen seems to influence the premotor system directly, even while it is engaged in another task. Neurophysiological support for a close coupling between the action observation system and the action execution system comes from a finding by Rizzolatti and colleagues. They observed single-cell activity in the rostral part of the inferior premotor cortex, area F5, of the monkey during goal-directed hand movements such as grasping, holding, and tearing, but also when the monkey only observed these actions performed by the experimenter (e.g. di Pellegrino, Fadiga, Fogassi, Gallese, and Rizzolatti 1992). In most of these so-called mirror neurons, a clear link between the effective observed movement and the movement executed by the monkey was needed in order to find discharges in the same neuron (e.g. Gallese, Fadiga, Fogassi, and Rizzolatti 1996), which led the authors to propose that these mirror neurons form a system for matching observation and execution of motor actions (for an overview, see Gallese, this volume). A recent study by Fadiga, Fogassi, Pavesi, and Rizzolatti (1995) took this notion one step further.
The results of their transcranial magnetic stimulation experiment showed that the excitability of the motor system increased when an observer watched grasping movements performed by a model. Furthermore, the pattern of muscle activation evoked by the transcranial magnetic stimulation during action observation was very similar to the pattern of muscle contraction present during the execution of a similar action.
Importantly, the mirror-neuron experiments showed that the activity of the F5 neurons is correlated with specific hand and mouth motor acts and not with the contractions of individual muscle groups. That is, the neurons are typically active only when a goal, an object, is present, and they stay silent when the same movement is performed without this goal. These neurons are also very sensitive to the treatment of the object by the effector: one neuron might fire when the object is picked up with a precision grip, while it stays silent during a full grip, and vice versa. Recently, neurophysiological evidence for the link between the ‘imitative’ effects observed by Brass et al. (2000) and the mirror-neuron system was provided by an fMRI study by Iacoboni et al. (1999). Using the stimuli of Brass et al., they found that two distinct areas were more highly activated in an imitative finger-movement task than in a symbolic finger-movement task: first, and most interestingly, Broca’s area 44, and also areas PE/PC in the parietal lobe. The authors proposed that Broca’s area 44, which has been suggested to be the human homologue of monkey area F5, might describe the observed action in terms of its motor goal (e.g. lift the finger) without defining the precise details of the movement. In contrast, the parietal lobe might code the precise kinesthetic aspects of the movement, as suggested by Lacquaniti, Guigon, Bianchi, Ferraina, and Caminiti (1995).
15.4 Concluding remarks
15.4.1 The concept of ‘goals’
As this chapter has shown, a widely used way of explaining imitative behavior is to say that imitation is ‘goal-directed’. Interestingly, the different theories that use the concept of goals refer to totally different mechanisms. We would therefore like to finish this tutorial with an overview of the goal concepts in the major theories, and with a taxonomical suggestion to capture the different meanings of the word ‘goal’ in the domain of imitation. By definition, the goal-directed theory of imitation centers on the concept of goals. Here, goals typically refer to physical things such as dots and ears. That is, in the experiments mentioned before, imitators always move to the correct goal, such as an object or a particular ear to reach for. However, the theory also uses the concept of goals at another, more functional level, as reflected in the ideomotor principle. The ideomotor principle states that the selected physical goals elicit the motor program with which they are most strongly associated. In other words, the physical goals are represented in certain neural codes, and these representations affect the selection and initiation processes of imitative actions. Thus, goals here refer to a functional mechanism necessary to initiate an imitative action. To make things even more complicated, and as mentioned before, Travis has already pointed out that a goal, strictly defined, is ‘a mental state representing a desired state of affairs in the world’ (1997, p. 115), and can therefore only be observed in the outcome of intentional actions. Identifying the goals of an observable action requires an inference beyond any mapping or parsing method as described previously. A good example of the role of mental goals can be found in this book in the chapter by Gallese and colleagues (Chapter 17). In agreement with earlier observations by Perrett’s group in higher visual areas (see Jellema and Perrett, this volume, Chapter 18), mirror neurons were also found to be active during the observation or execution of an action directed at an object that is hidden at the moment the action is performed. In this case, the physical goal of the action is only mentally present. The importance and automatic influence of such intentional strategies on action
performance has recently received a great deal of attention in the field of social cognition (see, for a review, Gollwitzer and Schaal 1998), but falls beyond the scope of this tutorial. The newer version of the active intermodal mapping theory (Meltzoff and Moore 1997) also speaks of goal-directed acts. Here, the infant’s goal is to match its own body to the model’s observed organ relations, which clearly refers to the functional, action-level sense of the goal concept. The emulation theory (Tomasello and colleagues) argues that one can only speak of imitation if model and observer share not only the same behavioral repertoire but also the same goals. In this view, goals typically refer to physical things. The program-level and action-level imitation theory of Byrne and Russon argues that, to imitate, the individual must have a mental apparatus that allows the assembly of hierarchical frameworks, to organize the goal structure and to maintain it while its detailed enactment is built. Here, again, the functional level of the goal concept is meant. To avoid misunderstandings about the different meanings of the word goal, we propose the following taxonomy for the domain of imitation:
• Physical goals refer to existing objects in the immediate surroundings;
• Action goals refer to the functional mechanisms/neural processes that are induced by the physical goals and that are necessary to initiate an imitative action;
• Mental goals refer to a desired state of affairs in the world.
15.4.2 A functional view on imitation
Instead of arguing about what imitation is and who can imitate, it might be more interesting to study the functional organization of action recognition and action execution processes per se. In our opinion, the key concept for understanding imitative behavior is that of action goals. As suggested in this tutorial, two sensorimotor mechanisms are involved in the execution and the perception of action goals in imitation, respectively. First, the ideomotor principle (selected physical goals elicit the motor program with which they are most strongly associated) elucidates how both the agent and the imitator translate action goals into the execution of movements. Second, the mirror-neuron system (matching the observation of movements with the individual’s own motor actions) details how seeing goal-directed movements can give rise to the recognition of action goals. Thus, the recognition of action goals from the observation of movements and the execution of movements from action goals can be conceptually clarified by proposing two inverted functional mechanisms: the mirror-neuron system transforms movements perceived in others into the observer’s own possible action goals, while the ideomotor principle transforms the intended action goals of an actor into movement execution. To conclude, the functional model of imitation, as described here, stresses the importance of the personal action repertoire in both the perception and the execution of goal-directed imitative actions.
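To make the proposed duality concrete, the following toy sketch (in Python; the names and the association table are our own illustrative inventions, not part of the theory's formal apparatus) caricatures the two mechanisms as a single movement–goal association table read in opposite directions:

```python
# Illustrative only: a shared movement<->goal association table, read in
# opposite directions by the two proposed mechanisms. All names invented.
ASSOCIATIONS = {
    "reach_left_hand_to_left_ear": "touch_left_ear",
    "precision_grip_small_sphere": "grasp_small_object",
}

def recognize_goal(observed_movement):
    """'Mirror-system' direction: map a perceived movement onto one of
    the observer's own action goals (here, a plain table lookup)."""
    return ASSOCIATIONS.get(observed_movement)

def select_movement(intended_goal):
    """'Ideomotor' direction: an intended goal elicits the movement most
    strongly associated with it (here, the inverse lookup)."""
    inverse = {goal: move for move, goal in ASSOCIATIONS.items()}
    return inverse.get(intended_goal)

# Imitation, on this caricature, is the two lookups run in sequence.
assert (select_movement(recognize_goal("precision_grip_small_sphere"))
        == "precision_grip_small_sphere")
```

On this caricature, imitation amounts to running the two lookups in sequence, which is one way of seeing why the personal action repertoire (the entries in the table) constrains both the perception and the execution of imitative acts.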
References
Anisfeld, M. (1996). Only tongue protrusion modeling is matched by neonates. Developmental Review, 16, 149–161.
Bandura, A. and Mischel, W. (1965). Modification of self-imposed delay of reward through exposure to live and symbolic models. Journal of Personality and Social Psychology, 2, 698–705.
Bandura, A. and Walters, R.H. (1963). Social learning and personality development. New York: Holt, Rinehart and Winston.
Bauer, P.J. and Mandler, J.M. (1989). One thing follows another: Effects of temporal structure on 1- to 2-year-olds’ recall of events. Developmental Psychology, 25, 197–206.
Bauer, P.J. and Travis, L.L. (1993). The fabric of an event: Different sources of temporal invariance differentially affect 24-month-olds’ recall. Cognitive Development, 8, 319–341.
Bekkering, H. and Prinz, W. (in press). Goal representations in imitative actions. In K. Dautenhahn and C.L. Nehaniv (Eds.), Imitation in animals and artifacts. Cambridge, MA: MIT Press.
Bekkering, H., Wohlschläger, A., and Gattis, M. (2000). Imitation of gestures in children is goal-directed. Quarterly Journal of Experimental Psychology, Section A: Human Experimental Psychology, 53A, 153–164.
Brass, M., Bekkering, H., Wohlschläger, A., and Prinz, W. (2000). Compatibility between observed and executed finger movements: Comparing symbolic, spatial and imitative cues. Brain and Cognition, 44, 124–143.
Byrne, R.W. (in press a). Imitation without intentionality: Using string parsing to copy the organization of behaviour. Animal Cognition.
Byrne, R.W. (in press b). Seeing actions as hierarchically organized structures: Great-ape manual skills. In A. Meltzoff and W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases. Cambridge, UK: Cambridge University Press.
Byrne, R.W. and Byrne, J.M.E. (1993). Complex leaf-gathering skills of mountain gorillas (Gorilla g. beringei): Variability and standardization. American Journal of Primatology, 31(4), 241–261.
Byrne, R.W. and Russon, A.E. (1998). Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences, 21, 667–684.
Call, J. and Carpenter, M. (in press). Three sources of information in social learning. In K. Dautenhahn and C.L. Nehaniv (Eds.), Imitation in animals and artifacts. Cambridge, MA: MIT Press.
Carpenter, M., Akhtar, N., and Tomasello, M. (1998). Fourteen- through 18-month-old infants differentially imitate intentional and accidental actions. Infant Behavior and Development, 21, 315–330.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., and Rizzolatti, G. (1992). Understanding motor events: A neurophysiological study. Experimental Brain Research, 91, 176–180.
Doupe, A.J. and Kuhl, P.K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567–631.
Fadiga, L., Fogassi, L., Pavesi, G., and Rizzolatti, G. (1995). Motor facilitation during action observation: A magnetic stimulation study. Journal of Neurophysiology, 73, 2608–2611.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593–609.
Gattis, M., Bekkering, H., and Wohlschläger, A. (in press). Goal-directed imitation. In A. Meltzoff and W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases. Cambridge, UK: Cambridge University Press.
Gleißner, B., Meltzoff, A.N., and Bekkering, H. (2000). Children’s coding of human action: Cognitive factors influencing imitation in 3-year-olds. Developmental Science, 3, 405–414.
Gleitman, H. (1981). Psychology. New York: W.W. Norton.
Gollwitzer, P.M. and Schaal, B. (1998). Metacognition in action: The importance of implementation intentions.
Personality and Social Psychology Review, 2, 124–136.
Greenwald, A.G. (1970). Sensory feedback mechanisms in performance control: With special reference to the ideomotor mechanism. Psychological Review, 77, 73–99.
Huber, L. (1998). Movement imitation as faithful copying in the absence of insight. Behavioral and Brain Sciences, 21, 694.
Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J.C., and Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286(5449), 2526–2528.
Köhler, W. (1976). The mentality of apes (transl. by E. Winter). New York: Liveright.
Kuhl, P.K. and Meltzoff, A.N. (1996). Infant vocalizations in response to speech: Vocal imitation and developmental change. Journal of the Acoustical Society of America, 100, 2425–2438.
Lacquaniti, F., Guigon, E., Bianchi, L., Ferraina, S., and Caminiti, R. (1995). Representing spatial information for limb movement: Role of area 5 in the monkey. Cerebral Cortex, 5, 391–409.
Lefkowitz, M.M., Blake, R.R., and Mouton, J.S. (1955). Status factors in pedestrian violation of traffic signals. Journal of Abnormal and Social Psychology, 51, 704–706.
Leiguarda, R.C. and Marsden, C.D. (2000). Limb apraxias: Higher-order disorders of sensorimotor integration. Brain, 123, 860–879.
Lombardino, A.J. and Nottebohm, F. (2000). Age at deafening affects the stability of learned song in adult male zebra finches. Journal of Neuroscience, 20, 5054–5064.
McKeon, R. (1941). The basic works of Aristotle. New York: Random House.
Meltzoff, A.N. (1988). Infant imitation after a 1-week delay: Long-term memory for novel acts and multiple stimuli. Developmental Psychology, 24, 470–476.
Meltzoff, A.N. (1995). Understanding the intentions of others: Re-enactment of intended acts by 18-month-old children. Developmental Psychology, 31, 838–850.
Meltzoff, A.N. and Moore, M.K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198, 75–78.
Meltzoff, A.N. and Moore, M.K. (1983). Newborn infants imitate adult facial gestures. Child Development, 54, 702–709.
Meltzoff, A.N. and Moore, M.K. (1994). Imitation, memory, and the representation of persons. Infant Behavior and Development, 17, 83–99.
Meltzoff, A.N. and Moore, M.K. (1997). Explaining facial imitation: A theoretical model. Early Development and Parenting, 6, 179–192.
Meltzoff, A.N. and Moore, M.K. (1998). Infant intersubjectivity: Broadening the dialogue to include imitation, identity and intention. In S. Braten et al. (Eds.), Intersubjective communication and emotion in early ontogeny. Studies in emotion and social interaction (2nd series), pp. 47–62. New York: Cambridge University Press.
Mineka, S., Davidson, M., Cook, M., and Keir, R. (1984). Observational conditioning of snake fear in rhesus monkeys. Journal of Abnormal Psychology, 93, 355–372.
Nagell, K., Olguin, R.S., and Tomasello, M. (1993). Processes of social learning in the tool use of chimpanzees (Pan troglodytes) and human children (Homo sapiens). Journal of Comparative Psychology, 107, 174–186.
Nottebohm, F. (1991). Reassessing the mechanisms and origins of vocal learning in birds. Trends in Neurosciences, 14, 206–211.
Piaget, J. (1975). Nachahmung, Spiel und Traum: Die Entwicklung der Symbolfunktion beim Kinde. Stuttgart: Ernst Klett.
Poulson, C.L., Kymissis, E., Reeve, K.F., Andreatos, M., and Reeve, L. (1991). Generalized vocal imitation in infants. Journal of Experimental Child Psychology, 51, 267–279.
Rizzolatti, G., Fadiga, L., Gallese, V., and Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131–141.
Russon, A.E. (1996). Imitation in everyday use: Matching and rehearsal in the spontaneous imitation of rehabilitant orangutans (Pongo pygmaeus). In A.E. Russon, K.A. Bard, et al. (Eds.), Reaching into thought: The minds of the great apes, pp. 152–176. Cambridge, UK: Cambridge University Press.
Russon, A.E. and Galdikas, B.M. (1993). Imitation in free-ranging rehabilitant orangutans (Pongo pygmaeus). Journal of Comparative Psychology, 107(2), 147–161.
Savage-Rumbaugh, E.S. (1990). Language acquisition in a nonhuman species: Implications for the innateness debate. Developmental Psychobiology, 23(7), 599–620.
Siegel, G.M., Cooper, M., Morgan, J.L., and Brenneise-Sarshad, R. (1990). Imitation of intonation by infants. Journal of Speech and Hearing Research, 33(1), 9–15.
Spence, K.W. (1937). Experimental studies of learning and higher mental processes in infrahuman primates. Psychological Bulletin, 34, 806–850.
Stürmer, B., Aschersleben, G., and Prinz, W. (2000). Correspondence effects with manual gestures and postures: A study of imitation.
Journal of Experimental Psychology: Human Perception and Performance, 26(6), 1746–1759.
Tchernichovski, O., Lints, T., Mitra, P.P., and Nottebohm, F. (1999). Vocal imitation in zebra finches is inversely related to model abundance. Proceedings of the National Academy of Sciences of the United States of America, 96, 12901–12904.
Thorndike, E.L. (1898). Animal intelligence: An experimental study of the associative process in animals. Psychological Review Monograph, 2(8), 551–553.
Tomasello, M. (1990). Cultural transmission in the tool use and communicatory signaling of chimpanzees? In S. Parker and K. Gibson (Eds.), Language and intelligence in monkeys and apes: Comparative developmental perspectives, pp. 274–311. Cambridge, UK: Cambridge University Press.
Tomasello, M. (1992). The social bases of language acquisition. Social Development, 1, 67–87.
Tomasello, M. (1996). Do apes ape? In C. Heyes and B.G. Galef (Eds.), Social learning in animals: The roots of culture, pp. 319–345. New York: Academic Press.
Tomasello, M. and Call, J. (1997). Primate cognition. Oxford: Oxford University Press.
Tomasello, M., Kruger, A.C., and Ratner, H.H. (1993a). Cultural learning. Behavioral and Brain Sciences, 16, 495–552.
Tomasello, M., Savage-Rumbaugh, E.S., and Kruger, A.C. (1993b). Imitative learning of actions on objects by children, chimpanzees, and enculturated chimpanzees. Child Development, 64, 1688–1705.
Travis, L.L. (1997). Goal-based organization of event memory in toddlers. In P.W. van den Broek, P.J. Bauer, and T. Bourg (Eds.), Developmental spans in event comprehension and representation: Bridging fictional and actual events, pp. 111–138. Mahwah, NJ: Erlbaum.
Vygotsky, L.S. (1978). Prehistory of written speech. Social Science Information, 17, 1–17.
Want, S.C. and Harris, P.L. (1998). Indices of program-level comprehension. Behavioral and Brain Sciences, 21, 706.
Whiten, A. (1998). Imitation of the sequential structure of actions by chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 112, 270–281.
Whiten, A. and Ham, R. (1992). On the nature and evolution of imitation in the animal kingdom: Reappraisal of a century of research. In P.J.B. Slater, J.S. Rosenblatt, C. Beer, and M. Milinski (Eds.), Advances in the study of behavior, pp. 239–283. San Diego, CA: Academic Press.
Wohlschläger, A. and Bekkering, H. (submitted). Is human imitation based on a mirror-neuron system? Some behavioral evidence.
Wohlschläger, A., Gattis, M., and Bekkering, H. (submitted). Mapping means or mapping ends? Towards a goal-directed theory of imitation. Manuscript submitted for publication.
16 Observing a human or a robotic hand grasping an object: differential motor priming effects
Umberto Castiello, Dean Lusher, Morena Mari, Martin Edwards, and Glyn W. Humphreys
Abstract. The present paper investigates how functional connections between action and perception may contribute to our imitation of the motor acts of other beings. Four experiments examined motor priming effects on imitation from biological and nonbiological effector systems. In Experiment 1 we asked subjects first to observe grasping movements performed by a human actor and by a robotic hand, and subsequently to perform the same movement. In 80% of the cases the movement performed by the primer and that performed by the subjects were directed to the same object (valid trials). In the remaining 20% of cases the subjects were required to perform an action towards an object that differed in size from the object grasped by the primer (invalid trials). Experiment 2 was similar to Experiment 1 except that valid and invalid trials were equally frequent (50% each). We found priming effects confined to trials on which a human actor served as the primer. In Experiment 3 we showed that the selective effects found for the human primer in Experiment 1 were unrelated to the fact that, in the robot condition, only an arm/hand system was visible, whereas for the human actor both the face and the upper body were visible. Experiment 4 demonstrated some differences between the robot and a human primer even when the kinematics of the human primer did not change as a function of object size. The results demonstrate that motor priming can occur from the observation of biological action. The implications for understanding imitative behaviour are discussed in terms of differential levels of priming: some degree of unspecific priming (first level) seems to occur whenever the observer is exposed to a human rather than a robot arm. There appears to be a conspecific advantage that is entirely unrelated to factors such as object size, trial type, or kinematics. More specific forms of priming (levels 2 and 3) appear to be fully dependent on model kinematics: priming is seen only for the human hand when it operates naturally. It is not seen for the robot, and it is not seen for the human hand when its kinematics do not differentiate between conditions.
16.1 Introduction
The ability to imitate movements is of fundamental survival value for intelligent organisms, providing an important means of learning as well as a mechanism by which an individual may be accommodated within a group (for a review, see Bekkering and Wohlschläger, this volume, Chapter 15). The role of imitation in the development of humans and other animals has long been documented, Darwin (Romanes and Darwin 1884), Thorndike (1898), and Piaget (1951) being three notable contributors to the literature. In recent years interest in the topic has been rekindled by new evidence on the behavioural (for a review, see Prinz, in press) and physiological underpinnings of imitative behaviour (di Pellegrino, Fadiga, Fogassi, Gallese, and Rizzolatti 1992; Gallese, Fadiga, Fogassi, and Rizzolatti 1996; Rizzolatti and Arbib 1998).
In the remainder of this introduction we briefly review the experimental paradigms, applied to humans and primates to investigate the various aspects of imitative behaviour, that have led to the present series of experiments.
16.1.1 Behavioural experimental studies
A number of paradigms have been implemented, and a number of studies performed, to investigate the mechanisms underlying imitation task performance (for reviews, see Prinz, in press; Vogt, in press). For example, Kerzel, Bekkering, Wohlschläger, and Prinz (2000) developed the ‘movement reproduction’ paradigm to investigate the perception and reproduction of intentional actions. In particular, they asked participants to observe two consecutive object movements and then reproduce the first of them as precisely as possible while ignoring the second. The basic paradigm consisted of two disks moving on a display. Participants were required to observe them and reproduce the velocity of the first disk with a stylus movement on a graphic tablet. The results indicated that participants were able to reproduce the velocity of the first disk but, interestingly, the velocity of the second disk also influenced the reproduction pattern: velocity reproduction for the first disk tended to be faster if the velocity of the second disk was higher. While these results suggest a sharing of representational resources between movement perception and movement production, it is with the ‘movement selection’ paradigm that questions about imitation proper may be better investigated. Stürmer, Aschersleben, and Prinz (2000) developed a paradigm based on gesture selection. This paradigm considered two gestures: hand spreading and hand grasping. In the first case the fingers were extended, and in the second case the fingers were flexed. Participants were required to perform one of these two hand movements as performed by an actor. An important feature of this task was that the identity of the stimulus gesture was irrelevant for the selection of the response gesture (to be performed by the subject). Instead, the relevant cue for the type of movement to be performed was the color of the stimulus hand (i.e. red signified finger extension; blue signified finger flexion). Similar to evidence on Stroop- and Simon-type compatibility effects, the authors found that responding was faster when the stimulus and response gestures corresponded than when they did not. From this the authors argued that similar representational structures are involved in the perception and execution of actions (Prinz 1990, in press). Along the same lines, Brass, Bekkering, Wohlschläger, and Prinz (2000) developed the ‘effector selection’ paradigm, in which the gesture to be performed was fixed (lifting of a finger) but there was a choice between two effectors to perform it (index or middle finger). Two kinds of instructions were used: an imitative instruction, where participants were required to lift the same finger as that lifted by a hand on a display, and a symbolic instruction, where participants were required to lift the finger indicated by a cross on the display. Response times were shorter when the movement was cued imitatively than when it was cued symbolically. The aforementioned studies suggest that there is a supramodal representational system, which matches the perceptual information of a seen act with proprioceptive information concerning an executable act. Bekkering, Gattis, and Wohlschläger (2000), however, have recently challenged this idea.
In their experiments, preschool children were asked to imitate a set of hand gestures made by an actor positioned in front of them. The gestures varied in complexity. For example, a model touched the left and/or right ear(s) with one or both of the ipsilateral and/or contralateral hand(s). There were three ipsilateral hand movements (left hand to left ear, right hand to right ear, both hands
to ipsilateral ears) and three contralateral hand movements (left hand to right ear, right hand to left ear, and both hands to contralateral ears). The results suggested that the children preferred to use the ipsilateral hand. However, when hand movements were made to only one ear this ipsilateral preference was not observed. Similarly, the ipsilateral preference was not evident when movements were directed at empty space rather than a physical object. These results supported a goal-directed imitation hypothesis, according to which what is extracted from a model’s movement is the desired goal of the imitative act (Bekkering et al. 2000), not a specific priming of the effector corresponding to that used by the observed actor.
16.1.2 Neurophysiological studies
A number of physiological studies have also supported the notion that motor structures are involved in action perception as well as production—particularly those concerned with the so-called ‘mirror neurons’ (e.g. Rizzolatti and Arbib 1998) in area F5 of the premotor cortex of the macaque monkey (see Gallese et al., this volume, Chapter 17). These neurons are active not only when a monkey grasps and manipulates objects, but also when a monkey observes an experimenter performing a similar gesture. Moreover, the cells do not discharge merely in response to object presentation; rather, they require a specific observed action in order to be triggered. The tuning of the neurons can also be quite specific, coding not only the action but also how it is executed. For example, a neuron might fire during observation of grasp movements but only when the grasping action is performed with the index finger and thumb. Also, if the same grasp is performed using a tool, the neuron may no longer fire. Neurons showing quite similar properties to those in area F5 have also been reported within the superior temporal sulcus (STS) by Perrett and colleagues (Oram and Perrett 1996; Perrett, Rolls, and Caan 1982; Perrett, Harris, Bevan, and Thomas 1989). For instance, in the lower bank of the STS, cells sensitive to actions of the hand were found. One apparent difference between the neurons in F5 and the STS is that neurons in the STS do not respond to executed motor acts but only to perceived ones. Evidence that a similar mirror system exists in humans comes from studies using transcranial magnetic stimulation (TMS) and functional brain imaging. Using TMS, Fadiga, Fogassi, Pavesi, and Rizzolatti (1995) demonstrated a selective increase in motor-evoked potentials when subjects observed various actions. This increase occurred in the muscles that the subjects would usually use for producing the actions they observed. In addition, two PET studies (Grafton, Arbib, Fadiga, and Rizzolatti 1996; Rizzolatti et al. 1996) have shown selective activation in the superior temporal sulcus, the inferior parietal lobule, and the inferior frontal gyrus of the left hemisphere when subjects observe a grasping action performed by an experimenter. Along these lines, a recent study using functional magnetic resonance imaging (fMRI; Iacoboni et al. 1999) confirmed the activation of frontal and parietal areas during an imitation task. These results suggest that the brain may employ specialized circuitry to code stimuli for imitative actions. Furthermore, this circuitry seems responsive to immediate observation of actions that are biologically appropriate for the organism (e.g. a grasping movement performed by a member of the same species (a conspec)), but not to similar actions that are not biologically appropriate (e.g. grasping by a tool; Gallese et al. 1996). Whether the same mechanisms are used for longer-term recall and reproduction of action outside of the immediately observed context is less clear, though there is some suggestive evidence from functional brain imaging that similar brain areas
may be activated in action imitation after immediate observation and in longer-term recall (Decety and Grèzes 1999).
16.1.3 The present study
The present study provides a novel contribution to the existing body of evidence regarding imitation, tackling issues that have not been considered in the previous literature. For example: does the observed action have to be part of the already existing behavioural repertoire of the observer in order to trigger mechanisms for imitation? We know from the neurophysiology of imitation (Gallese et al. 1996) and from developmental studies (Woodward 1998, 1999) that monkeys and children code and attend to grasping actions performed by a person but not necessarily to those performed by a mechanical device. In the first case, a monkey’s mirror neurons are silent when the object is grasped with forceps or pliers (Gallese et al. 1996). In the second case, findings indicate that six-month-old children selectively attend to an object when a person, but not a mechanical claw, grasps it (Woodward 1998, 1999). On the basis of these studies, we may expect that observation of an action performed by a conspec should have consequences for the subsequent motor behaviour of a human subject. For example, detailed information about the kinematics of the observed action may be used to prime an action made at a later time by the observer, so that the action is parameterized on the basis of the previously observed action. Moreover, such priming should occur over and above effects due to recall of the behaviour as previously performed by the observer (at least to the extent that recall might only partially engage the same specialized imitative circuitry). The detailed behavioural consequences of action observation on the kinematics of subsequent actions have not been examined hitherto. This was the aim of the present study. We examined whether observing grasping by a human actor produces priming effects on the execution of a similar action by an observer. If so, the kinematics of actions to a target should vary according to whether the human model had grasped an object of the same or a different size. In addition, we contrasted the observation of an action performed by a conspec (another human) with the observation of an action performed by a non-conspec, a robot arm. Observation of a robot arm performing the reaching action allows us to investigate three relevant issues: (1) it provides a baseline for assessing whether either the mere sight of a prime object of a particular size, or the observation of a nonbiological grasping action, is sufficient to generate action priming; (2) the use of a robotic arm also allows a comparison between imitation of an action within the supposed behavioural repertoire of a normal person (performed by a conspec) and of an action outside that repertoire (performed by the robot); (3) the study of priming from the robotic arm should match and extend the work with monkeys on neuronal activation associated with actions performed with hands and tools, and also the human developmental work on the same topic.
16.2 The experiments
16.2.1 Experiment 1: visuomotor priming: robotic versus human primer
In this initial experiment we used an experimental paradigm based on visuomotor priming (Craighero, Fadiga, Rizzolatti, and Umiltà 1998). Subjects were asked to observe grasping movements performed by a human model or by a robotic hand to a target of one of two sizes. Immediately after this, the
subjects had to grasp a target object that could be the same size as the prime object, or a different (unprimed) size. In Experiment 1, the prime and target were likely to be the same size (on 80% of the trials), so that both the size of the prime and the movement made by the human model were informative of the probable action to the subsequent target. Kinematics of the reach-to-grasp action were recorded. The question addressed was whether effects of the movement performed by the human model or the robotic hand could be observed on the actions performed by the participants. Further, we asked whether such effects, if any, occurred over and above effects due to recall and/or prediction of the action from either the size of the primed object or the type of primer (human or robot).
16.2.1.1 Methods
Participants. Eight subjects (4 women and 4 men, aged 20–25 years) volunteered to participate. All were right-handed, all reported normal or corrected-to-normal vision, and all were naive as to the purpose of the experiment. They attended two experimental sessions of 4 hours total duration.
Robot. The arm looked like an average human forearm with a gloved hand. It was mounted on a metal frame with a single motor that moved the arm from a vertical to a horizontal position. The four fingers and the thumb moved together so as to mimic the closing of a human hand. The construction was electro-mechanical and was controlled by an 87C751 micro-controller. The hand was constructed from nylon cords for tendons, silicon rubber for joints, and wooden dowels for bones. Movement was provided by a DC electric motor that tensed the tendons to close the hand. Springs were used to store energy and thus reduce the required power and size of the DC motors. Limit sensors on the arm and hand were used by the micro-controller to control movement. The arm length was approximately 0.5 m, and the maximum pickup weight approximately 0.1 kg. A feature of the robot is that it does not differentiate its kinematics between large and small objects as humans do. This issue is taken up in Experiment 4, where it is demonstrated that this difference is not relevant for the interpretation of the data from the first three experiments.
Type of stimuli. The stimuli consisted of two spherical white foam objects (diameter: ~8 cm and ~2 cm) positioned at a distance of 30 cm along the mid-sagittal plane.
Type of trials. There were two trial types. (1) A ‘valid’ trial, where the robotic arm or a human experimenter performed a reach-to-grasp action towards either the small or the large object and the subject subsequently grasped the same object. This occurred on 80% of the trials. (2) An ‘invalid’ trial, where the robotic arm or the human experimenter performed an action towards the small object and the subject grasped the large object, or vice versa. This occurred on 20% of the trials.
Apparatus. Reflective passive markers (0.25 cm diameter) were attached to (a) the wrist, (b) the index finger, and (c) the thumb. Movements were recorded with the ELITE motion-analysis system. This consisted of two infrared cameras (sampling rate 100 Hz) inclined at an angle of 30° to the vertical, placed 2 m to the side of the table and 2 m apart. The spatial error measured from stationary and moving stimuli was 0.4 mm. Coordinates of the markers were reconstructed with an accuracy of 1/3000 of the field of view and sent to a host computer. Visual availability of the stimuli was controlled with Plato spectacles (Plato Technologies Inc.). These were lightweight and fitted with liquid-crystal lenses. The robotic arm or the experimenter was positioned at 90°, in front of the subject. The starting position of the robotic arm and the starting position of the experimenter’s arm were the same (see Fig. 16.1). The distance from the hand of the subject, the hand of the robot, and
Fig. 16.1 In panel (a) and panel (b) the position of the subject (wearing LCD glasses) and the position of the experimenter or the robot are represented. Panel (c) shows the position of the markers. Panel (d) represents the position of the infrared cameras.
the hand of the experimenter to the target was kept constant (~30 cm). The type of movement performed by the robotic arm and by the experimenter differed in that, for the robot, the programmed accelerative and decelerative phases of the movement were similar for small and large objects alike, whereas for humans these phases differ (as demonstrated by our baseline data, in which the kinematics of the human prime were measured). Both the opacity of the lenses and the initiation of movement by the robotic arm were controlled by the computer. At the beginning of each trial involving the robotic arm, the experimenter at the computer console pressed the data-acquisition button, the spectacles cleared, and the robotic arm started to move at a delay of 0.5 s from the opening of the glasses. When the experimenter performed the reach-to-grasp action, he/she started the movement as soon as he/she detected the clearing of the spectacles. Subjects wore earplugs so that they could not hear the noise produced by the experimenter while re-positioning the object after each trial; such noise could otherwise provide information regarding the type of trial to be performed (valid or invalid).
Procedure. Subjects were asked to perform the task as indicated by a tape-recorded set of instructions. The sequence of events was as follows: after the first opening of the spectacles, the subject observed the robotic arm or the human grasping the small or the large object. The spectacles were then shut, and when they re-opened the subject had to perform the grasping action towards the same object on ‘valid’ trials or towards the other object on ‘invalid’ trials. The target remained visible throughout the trial. The participants performed 400 randomized trials (100 per block), over which all possible combinations of trial type, target size, and human versus robot primer occurred. Twenty trials for each combination were subsequently analysed.
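As a concrete illustration of this design, the following sketch (a hypothetical reconstruction in Python; the function and field names are ours, and the grouping into four 100-trial blocks is omitted for brevity) builds a 400-trial randomized list with the stated 80/20 validity ratio in each primer-by-prime-size cell:

```python
import random

def make_trials(seed=0):
    """Hypothetical reconstruction of the Expt. 1 trial list: for each
    primer (human/robot) and prime size (small/large), 80 valid trials
    (target = prime size) and 20 invalid trials (the other size)."""
    other = {"small": "large", "large": "small"}
    trials = []
    for primer in ("human", "robot"):
        for prime_size in ("small", "large"):
            for i in range(100):
                valid = i < 80  # 80% valid, 20% invalid per cell
                trials.append({
                    "primer": primer,
                    "prime_size": prime_size,
                    "target_size": prime_size if valid else other[prime_size],
                    "valid": valid,
                })
    random.Random(seed).shuffle(trials)
    return trials

trials = make_trials()
assert len(trials) == 400 and sum(t["valid"] for t in trials) == 320
```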
Data processing. The ELIGRASP (B|T|S|, 1997) software package was used to analyse the data. This provided a three-dimensional reconstruction of the marker positions. The data were then filtered using a finite impulse response (FIR) linear filter (transition band: 1 Hz; sharpening variable: 2; cut-off frequency: 10 Hz). The reaching component was assessed by analysing the acceleration and velocity profiles of the wrist marker. The grasp component was assessed by analysing the distance between the two hand markers. Movement duration was taken as the time between movement onset and the end of the action (when the target was touched). The period following this, in which the target was lifted, was not assessed. Analysis of the reaching component was based on the times of peak acceleration, peak velocity, and peak deceleration, and on the time from peak velocity to the end of the movement (the deceleration time). For the grasp component, the time to maximum grip aperture, the amplitude of the maximum grip aperture, the peak velocity of the opening phase of the finger movements, and the time from maximum grip aperture to the end of the movement (the closing time) were analysed. Measurements were also taken for the opening phase of the hand movement in relation to the maximum velocity of the movement and the time at which each occurred. The measurement of the maximum grip aperture was based on the greatest distance reached between the thumb and the index finger, and the time of its occurrence. A prolonged deceleration time and a lower amplitude of peak velocity for the reaching component of a grasp action directed at smaller relative to larger stimuli are consistently reported in the reach-to-grasp literature (Castiello 1996; Gentilucci et al. 1991; Jakobson and Goodale 1992; Marteniuk, Leavitt, MacKenzie, and Athenes 1990). Differences in deceleration time and peak velocity should therefore be expected here as a function of the size of the target objects, and these differences are a necessary precondition for tests of priming. For the grasp component, we expect a reduced maximum grip aperture for the smaller of the two stimuli, and the maximum grip aperture to be reached earlier in time (Castiello 1996; Gentilucci et al. 1991; Jakobson and Goodale 1992; Marteniuk et al. 1990). In addition we analysed the peak velocity of the fingers as they opened for the grip and the time taken to close the grip on the object, because (a) previous results have demonstrated differences in the rate of finger opening as a function of target size (Bonfiglioli and Castiello 1998), and (b) closing time provides an index that is sensitive to reach-to-grasp strategies (Hoff and Arbib 1993). For each dependent variable an analysis of variance (ANOVA) was performed with type of primer (human, robot), type of trial (valid, invalid), and object size (small, large) as within-subjects factors. Post-hoc comparisons were conducted on the means of interest using the Newman–Keuls procedure (alpha level: .05).
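The landmark extraction itself is straightforward to express in code. The sketch below (Python with NumPy/SciPy; it is our own illustration, not the actual API of the ELIGRASP package, and the simple FIR design stands in for the unspecified original filter) low-pass filters the three marker trajectories and derives the main reach and grasp parameters described above:

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS = 100.0  # ELITE sampling rate (Hz)

def lowpass(marker, cutoff=10.0, ntaps=31):
    """Zero-phase FIR low-pass filter applied to an (n, 3) marker path.

    Stands in for the 10-Hz FIR filtering described in the text; the
    exact design of the original filter is not specified there.
    """
    taps = firwin(ntaps, cutoff, fs=FS)
    return filtfilt(taps, [1.0], marker, axis=0)

def kinematic_landmarks(wrist, thumb, index):
    """Derive the main Expt. 1 parameters from (n, 3) marker paths that
    span exactly movement onset to movement end (target touched)."""
    wrist, thumb, index = (lowpass(m) for m in (wrist, thumb, index))
    t = np.arange(len(wrist)) / FS

    # Reaching component: tangential wrist velocity and its peak.
    vel = np.linalg.norm(np.gradient(wrist, 1.0 / FS, axis=0), axis=1)
    i_pv = int(np.argmax(vel))

    # Grasp component: grip aperture = thumb-index distance.
    aperture = np.linalg.norm(thumb - index, axis=1)
    i_mga = int(np.argmax(aperture))

    return {
        "peak_velocity": float(vel[i_pv]),
        "time_to_peak_velocity": float(t[i_pv]),
        "deceleration_time": float(t[-1] - t[i_pv]),  # peak velocity -> end
        "max_grip_aperture": float(aperture[i_mga]),
        "time_to_max_grip": float(t[i_mga]),
        "closing_time": float(t[-1] - t[i_mga]),      # max aperture -> end
    }
```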
16.2.1.2 Results
In this and the following experiments, priming can be observed at three levels of specificity:
• First level: by effector. In the Results sections this refers to the main effect of robotic versus human primer.
• Second level: by effector and object size (small vs. large object). In the Results sections this refers to the two-way interaction between target size and type of primer (robotic vs. human primer).
• Third level: by effector and trial type. This refers to the two-way interaction between type of primer and type of trial, and to the three-way interaction between type of primer, type of trial, and size. (An analysis sketch follows this list.)
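For readers who want to reproduce this style of analysis, the following sketch shows how the three levels map onto a 2 × 2 × 2 repeated-measures ANOVA, here using statsmodels’ AnovaRM on a hypothetical long-format table of per-subject cell means (the file and column names are our own inventions):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table: one row per subject x primer x trial
# type x size cell, holding that cell's mean deceleration time (ms).
# Columns: subject, primer, trial_type, size, decel_ms.
df = pd.read_csv("decel_time_cell_means.csv")  # illustrative file name

res = AnovaRM(
    df, depvar="decel_ms", subject="subject",
    within=["primer", "trial_type", "size"],
).fit()
print(res)

# Mapping onto the three levels of priming:
#   first level  -> main effect of 'primer'
#   second level -> 'primer' x 'size' interaction
#   third level  -> 'primer' x 'trial_type' (and the three-way term)
```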
16.2.1.3 Effects of size
Consistent with previous results in the reach-to-grasp literature, we found a longer movement duration [873 vs. 845 ms; F(1, 7) = 14.67, p = 0.001], a prolonged deceleration time [457 vs. 433 ms; F(1, 7) = 4.94, p = 0.05], and a lower peak velocity amplitude [F(1, 7) = 8.10, p = 0.01] for smaller than for larger stimuli (Castiello 1996; Gentilucci et al. 1991; Jakobson and Goodale 1992; Marteniuk et al. 1990). For the grasp component, the maximum grip aperture occurred earlier in time [525 vs. 548 ms; F(1, 7) = 9.36, p = 0.01] and was reduced in size for smaller relative to larger stimuli [70 vs. 84 mm; F(1, 7) = 17.32, p = 0.001] (Castiello 1996; Gentilucci et al. 1991; Jakobson and Goodale 1992; Marteniuk et al. 1990).
16.2.1.4 First-level priming effects
This section refers to the main effect of the type of primer (robot vs. human). Several parameters of the movements differed when the primer was the robot arm rather than the human model. In essence this was because subjects tended to adopt the parameters for responding to the large object when the primer was the robot, irrespective of the actual size of the prime. Thus, for the reaching component, the times to peak acceleration [236 vs. 257 ms; F(1, 7) = 28.16, p = 0.001], peak velocity [403 vs. 430 ms; F(1, 7) = 11.11, p = 0.01], and peak deceleration [570 vs. 587 ms; F(1, 7) = 8.44, p = 0.01] were shorter on trials where the robot arm, rather than the human, was the primer. For the grasp component there were differences in the accelerative phase as the fingers moved to their maximum aperture: the maximum acceleration of the fingers occurred earlier [83 vs. 97 ms; F(1, 7) = 9.76, p = 0.01] and was greater [2159 vs. 2063 mm/s²; F(1, 7) = 17.21, p = 0.001] following robotic rather than human primers.
16.2.1.5 Second-level priming effects
This section considers the interaction between target size and type of primer. We found differences in the movement parameters between human and robot primers as a function of the size of the target object. In each case, the parameters were close to those found for the large target object on trials where the primer was a robot (for an example, see Fig. 16.2(a)). In contrast, on human-primer trials there was an effect of the size of the target object: for small relative to large stimuli there was a longer movement duration [F(1, 7) = 20.13, p = 0.001], a prolonged deceleration time [F(1, 7) = 27.02, p = 0.0001], a shorter time to maximum grip aperture [F(1, 7) = 7.47, p = 0.05], and a smaller maximum grip aperture [F(1, 7) = 11.06, p = 0.001]. These results are rather important because they demonstrate that primed movement kinematics can influence the execution of grasping movements. This is clearly shown by the contrasting data for the robot condition, where no effects of size were found (though recall that the movement of the robot was similar for different target sizes).
16.2.1.6 Third-level priming effects
Priming effects were apparent in two-way interactions involving type of primer (robot, human) and type of trial (valid, invalid), and in one three-way interaction, for deceleration time, involving type of primer (robot, human), type of trial (valid, invalid), and target size (small, large). Let us consider the two-way interactions first.
Reaching component.
There was a (type of primer) × (type of trial) interaction for: the time to peak acceleration [F(1, 7) = 7.34, p = 0.01], the time to peak velocity [F(1, 7) = 8.53, p < 0.01], the time to peak deceleration [F(1, 7) = 9.12, p = 0.01], and the deceleration time
aapc16.fm Page 323 Wednesday, December 5, 2001 10:04 AM
Observing a human or a robotic hand grasping an object: differential motor priming effects
Fig. 16.2 The two-way interaction (Type of Primer by Size) obtained for Expts. 1, 2, and 4 for the measure amplitude of maximum grip aperture.
itself [F(1, 7) = 6.53, p = 0.01]. For each of these parameters, the values for valid and invalid did not differ if the robot appeared on the priming trials. However, when the primer was human, then in all cases differences emerged between valid trials on the one hand, and invalid trials on the other (all p < 0.05, Newman–Keuls tests). Figure 16.3(a) represents an example of this patterning for the parameter deceleration time. With the human primer, deceleration time was reduced for valid but not for invalid trials. Grasp component. For the grasp component, the interaction between the type of primer and type of trial was signi1cant for the following parameters: time to maximum grip aperture [F(1, 7) = 10.21, p = 0.001], maximum grip aperture [F(1, 7) = 20.11, p = 0.001], time to peak acceleration of 1nger opening for grip [F(1, 7) = 8.82, p = 0.01], and closing time [F(1, 7) = 10.44, p = 0.001]. As for reaching, there were no differences between valid and invalid trials when the primer was the robot arm. With the human primer, the time to peak acceleration of the opening grip and the time to obtain
maximum grip aperture were longer on valid trials; in addition, the maximum grip aperture was smaller and the closing time shorter for valid trials. The three-way interaction between type of primer, type of trial, and object size was significant for deceleration time [F(1, 7) = 13.12, p = 0.001]. As before, deceleration times did not vary as a function of trial type following robot primers. However, following human primers, deceleration times were slower for invalid relative to valid trials, but this only occurred when the target was large. The three-way interaction is illustrated in Fig. 16.4.

Fig. 16.3 The two-way interaction (Type of Primer by Type of Trial) obtained for deceleration time for the four experiments. (Each panel plots deceleration time in ms for valid and invalid trials, separately for robot and human primers: (a) Experiment 1, (b) Experiment 2, (c) Experiment 3, (d) Experiment 4.)

Fig. 16.4 The three-way interaction (Type of Primer by Type of Trial by Size) obtained for Expts. 1 and 3 for deceleration time.
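The kinematic landmarks analysed throughout this chapter (times to peak acceleration, velocity, and deceleration; deceleration time; maximum grip aperture) can be recovered programmatically from sampled movement traces. The sketch below is illustrative only, not the analysis software used in these experiments; it assumes a wrist-speed trace and a grip-aperture trace sampled at a constant rate, and takes deceleration time to be the interval from peak velocity to movement end, one common convention in the prehension literature.

```python
import numpy as np

def kinematic_landmarks(wrist_speed, grip_aperture, dt):
    """Recover the landmarks reported in the text from sampled traces.

    wrist_speed   : 1-D array, tangential wrist speed (mm/s)
    grip_aperture : 1-D array, thumb-index finger distance (mm)
    dt            : sampling interval (s)
    """
    accel = np.gradient(wrist_speed, dt)        # signed acceleration (mm/s^2)

    t_peak_vel = int(np.argmax(wrist_speed)) * dt
    t_peak_acc = int(np.argmax(accel)) * dt     # accelerative phase
    t_peak_dec = int(np.argmin(accel)) * dt     # deepest deceleration

    movement_duration = (len(wrist_speed) - 1) * dt
    deceleration_time = movement_duration - t_peak_vel   # peak velocity -> end

    return {
        "movement duration (ms)":             1e3 * movement_duration,
        "time to peak acceleration (ms)":     1e3 * t_peak_acc,
        "time to peak velocity (ms)":         1e3 * t_peak_vel,
        "time to peak deceleration (ms)":     1e3 * t_peak_dec,
        "deceleration time (ms)":             1e3 * deceleration_time,
        "amplitude of peak velocity (mm/s)":  float(np.max(wrist_speed)),
        "time to maximum grip aperture (ms)": 1e3 * int(np.argmax(grip_aperture)) * dt,
        "maximum grip aperture (mm)":         float(np.max(grip_aperture)),
    }
```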
16.2.1.7 Discussion
These results demonstrate that third-level priming effects occurred on trials with human primers but not on trials with robot primers. When the primer was a robot, we failed to find any changes in movement kinematics as a function of whether trials were valid or invalid. There were also few differences between the kinematics of the responses to large and small targets, with the movement kinematics being set towards large targets (see Fig. 16.2(a)). These responses following robot primers can be understood if subjects adopted a relatively conservative response strategy on these trials, initiating their movement with parameters set for large targets.
The data from trials with human primers indicate that, on invalid trials, it was easier to adapt an action parameterized for a large object (from a large prime to a small target) than an action parameterized for a small object (from a small prime to a large target). Thus, following human primers there were few costs on invalid trials for small targets, whilst there were reliable costs for large targets. When subjects make a grasp response to a target, any adjustment from a large to a smaller grasp response will match the natural pattern of action, in which the finger and thumb reach a maximum grasp aperture that is wider than the target to be grasped and then close around the target under guidance from visual feedback. In contrast, adaptation of a grasp from small to large will operate against the usual patterns of adjustment during reaching to grasp, generating asymmetric costs on performance. With the robot primer, subjects seem to adopt a strategy of minimal adjustment, and so move in all cases (irrespective of the size of the prime object) from an initial parameterization favouring a large grasp.
As we have noted, quite different results occurred on trials following observation of the robot and of a human movement. This is interesting because the effects of the prime were mostly to disrupt action. For example, on invalid trials deceleration times were slowed when the prime was small and the target large (Fig. 16.4(a)). It would appear that subjects adapted their behaviour to match the observed primer action even though this was not necessarily beneficial to their performance. The degree to which this imitation effect is under strategic control was tested in Experiment 2, in which we reduced the informativeness of the priming event by making prime and target actions valid on only half the trials. Strategic use of the priming event should be lessened under these conditions.
The fact that we found third-level priming effects only with human and not robot primers indicates that priming was not due to subjects' preprogramming actions based on (a) the size of the priming object (note that this was predictive of the size of the target), and (b) a memory of the action parameters used for the predicted target. If priming were due to either of these factors, then differences between valid and invalid trials should have occurred for robot primers as well as for human primers.

16.2.2 Experiment 2: testing the automaticity of the priming effects
Consider a trial where the subject observes a primer action made to the small object. This event may lead subjects to set parameters for small grasp actions, even though a large target may subsequently
be presented. As a consequence, subjects show a cost when reaching to the subsequent target. This kind of adjustment could itself operate in one of two ways. One could be strategic, with subjects tracking recent changes in validity rather than the overall information carried by prime events. The other could be more automatic, based on some form of reinforcement learning operating on a trial-by-trial basis. Whichever is the case, the important point to stress is that similar effects were not found unless subjects observed another human performing the priming action, even though they could have used the size of the prime object in a similar way. This raises the question of how strategic such preprogramming is. To investigate this issue we ran a further study with 50–50 valid-to-invalid contingencies. Under these conditions it should not be strategically beneficial for subjects to preprogram the movement.
16.2.2.1 Methods
Participants. Eight subjects (4 women and 4 men, aged 20–25 years) with the same characteristics as those in the previous experiment volunteered to participate. None of them had participated in the previous experiment. They attended one experimental session of 1 hour duration in total.
Apparatus, materials, procedure, and data processing. These were the same as for Experiment 1, except that valid and invalid trials now occurred equally often (50% each), and the overall number of trials was reduced from 400 to 160.
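The change from Experiment 1 is purely one of trial contingencies. As a concrete illustration of the 50–50 design, a trial list might be generated as follows; the factor names and the counterbalancing scheme are our own assumptions for the sketch, not a description of the authors' software.

```python
import itertools
import random

# Factors of the design described above.
PRIMERS = ("human", "robot")
PRIME_SIZES = ("small", "large")
VALIDITY = ("valid", "invalid")   # 50% valid in Experiment 2

def make_trials(n_trials=160, seed=0):
    cells = list(itertools.product(PRIMERS, PRIME_SIZES, VALIDITY))
    reps = n_trials // len(cells)          # equal repetitions per design cell
    trials = []
    for primer, prime_size, validity in cells * reps:
        # On valid trials the target matches the primed object;
        # on invalid trials it is the other size.
        target = prime_size if validity == "valid" else \
                 ("large" if prime_size == "small" else "small")
        trials.append({"primer": primer, "prime_size": prime_size,
                       "validity": validity, "target_size": target})
    random.Random(seed).shuffle(trials)
    return trials

trials = make_trials()
assert sum(t["validity"] == "valid" for t in trials) == 80  # 50% of 160
```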
16.2.2.2 Results
Data were analysed as for Experiment 1.
Effects of size. In contrast to the findings for Experiment 1, we found that deceleration time and the amplitude of peak velocity were similar for smaller and larger stimuli. For the grasp component we found no differences between smaller and larger stimuli in the time of maximum grip aperture or the amplitude of grip aperture and, for both stimuli, movement parameters seemed to be set for the larger object (cf. Table 16.1 and Fig. 16.2(b)). This suggests that a conservative response strategy was adopted when primes were not valid on a majority of trials.
Table 16.1 Movement duration and kinematic parameters of the subjects' movements for the Small and Large conditions in Exp. 2 (SEM in parentheses)

                                       Small      Large      F      Sig.
Reaching component
  Deceleration time (ms)               339 (16)   353 (13)   2.15   ns
  Amplitude of peak velocity (mm/s)    696 (31)   707 (36)   2.24   ns
Grasp component
  Time to maximum grip aperture (ms)   495 (32)   515 (29)   3.53   ns
  Maximum grip aperture (mm)           91 (3)     88 (3)     3.12   ns
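Each F value in Table 16.1 tests a single two-level within-subject factor (object size) across the eight subjects, so it is equivalent to the square of a paired t statistic. A minimal sketch of such a test, using fabricated per-subject values purely for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject mean deceleration times (ms), 8 subjects.
small = np.array([330, 352, 341, 325, 360, 338, 329, 337])
large = np.array([349, 361, 350, 340, 371, 352, 344, 357])

t, p = stats.ttest_rel(small, large)
F = t ** 2                      # for a two-level within factor, F(1, n-1) = t^2
print(f"F(1, {len(small) - 1}) = {F:.2f}, p = {p:.3f}")
```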
First-level priming effects. As for Experiment 1, several parameters of the movements were accelerated after seeing the robotic rather than the human arm as the primer.
Reaching component. For the reaching component, anticipation was evident for the time to peak acceleration [247 vs. 263 ms; F(1, 7) = 28.14, p = 0.0001], the time to peak velocity [398 vs. 416 ms; F(1, 7) = 20.16, p = 0.001], and the time to peak deceleration [536 vs. 559 ms; F(1, 7) = 10.08, p = 0.02].
Grasp component. For the grasp component it was the accelerative opening phase of the fingers to their maximum that differed between the robotic and the human conditions. The time to peak grip-opening acceleration was earlier [139 vs. 150 ms; F(1, 7) = 6.16, p = 0.05] and the peak acceleration was greater [2576 vs. 2342 mm/s²; F(1, 7) = 11.65, p = 0.001] for the robotic than for the human condition.
Second-level priming effects. The interaction between type of primer (human, robot) and target size was not significant for any of the dependent measures analysed. For example, as represented in Fig. 16.2(b), no differences between the small and the large objects were found in the amplitude of maximum grip aperture for either the human or the robot primers. For both human and robot primers, subjects adopted movement parameters for a large object at the start of each trial, leading to a reduction in the overall effects of object size on reaching and grasping.
Third-level priming effects. Despite the overall effects of object size being the same on target actions following both human and robot primers, there remained a differential priming effect. The two-way interaction between type of primer and type of trial was significant for measures of both the reaching and grasping components. An example of this effect for deceleration time is shown in Fig. 16.3(b).
Reaching component. For the reaching component, peak velocity was attained at the same time on valid and invalid trials when the robot was the primer. However, with a human primer, peak velocity was attained earlier on invalid relative to valid trials [F(1, 7) = 7.05, p < 0.01]. The same held for the time to peak deceleration [F(1, 7) = 6.20, p = 0.05].
Grasp component. For the grasp component, the interaction between the type of primer and type of trial was significant for the following parameters: time to maximum grip aperture [F(1, 7) = 10.48, p = 0.01] and amplitude of maximum grip aperture [F(1, 7) = 6.66, p = 0.01]. When the robot was the primer, the time to maximum grip aperture, and the maximum aperture obtained, did not differ for valid and invalid trials. However, with a human primer the maximum grip aperture took place earlier in time on invalid trials than on valid trials. Post-hoc comparisons revealed that the differences between valid and invalid trials were significant (ps < 0.05).
16.2.2.3 Discussion
When primes were not generally valid, subjects tended to adopt a conservative response strategy with human and robot primers alike, with responses tending to be set for the larger of the two target objects. Nevertheless, some third-level priming effects were apparent when the primer was human, whilst there was little differential effect of the robot primer. In particular, maximum grip aperture tended to be reduced, the time taken to reach this point was delayed, and the time to attain peak velocity for the reach component of the movement was more prolonged on valid relative to invalid trials. These results mimic the data from Experiment 1 and suggest some adjustment of the parameters of the movement on valid trials, particularly when the prime was small (with the conservative, 'wide grip' parameters being set less often).
16.2.3 Experiment 3: testing differences between robot and human primes: is how much of a body you can see important?
The results from the previous two experiments suggest that a robotic and a human hand produce different levels of movement priming. In the third experiment we investigated whether the priming effects found with the human primer in Experiments 1 and 2 were due to the fact that, in the robot condition, only a forearm/hand was visible while, for the human actor, many other cues were available (e.g. the face and the upper body). The motivation for this experiment comes from possible functional differences between, on the one hand, neurons within STS showing selective neuronal responses to the sight of actions of the hand (Perrett et al. 1989) and, on the other hand, neurons found in area F5 (Rizzolatti and Arbib 1998). The difference is that neurons in STS do not respond (as neurons in area F5 do) to executed motor acts, but only to perceived ones. Also, in the studies of STS and mirror neurons by Perrett and colleagues and Rizzolatti and colleagues, as in Experiments 1 and 2 of the present study, the entire body and face of the experimenter performing the action was visible. Thus it is possible that, in Experiments 1 and 2, our forearm/hand robot did not activate the neural system concerned with movement execution, and so failed to generate motor priming effects. To clarify this question we performed an experiment in which only the forearm and the hand of the human actor were visible.
16.2.3.1 Methods
Participants. Eight subjects (4 women, 4 men, aged 20–25 years), with the same characteristics as those who took part in the previous experiments, volunteered to participate. None of them had participated in the previous experiments. They attended two experimental sessions of 4 hours duration in total.
Apparatus, materials, procedure, and data processing. These were the same as for Experiment 1, except that the experimenter was hidden behind a thick black net curtain and only the arm was visible to the subjects. Prime actions were valid on 80% of the trials.
16.2.3.2 Results
Data were analysed by comparing Experiment 1 and Experiment 3. Experiment (1 vs. 3) was the between-subjects factor. Type of primer (human, robot), type of trial (valid, invalid), and object size (small, large) were the within-subjects factors. The four-way interaction between experiment, type of primer, type of trial, and object size was significant for deceleration time [F(1, 7) = 8.54, p = 0.01] and the amplitude of peak velocity [F(1, 7) = 11.23, p = 0.001]. Deceleration time and the amplitude of peak velocity did not vary as a function of trial type and size following robot primes. However, following human primes, deceleration time was shorter and the amplitude of peak velocity was lower for invalid trials relative to valid trials. Further, deceleration time was shorter and the amplitude of peak velocity was lower for invalid trials with small targets (see Fig. 16.4).

16.2.3.3 Discussion
The present experiment was performed mainly because the STS region is activated by movements of various body parts (Perrett et al. 1989). In the earlier experiments here, more body information was visible for the human primer than for the robot primer, and this may have influenced the subjects' performance. However, the data from Experiment 3 confirm the reliability of the present
effects, and confirm that the effects do not depend on subjects seeing more than the arm of the human primer. Priming effects at all levels were obtained from the sight of a human arm reaching for and grasping an object. With a robot primer, subjects again tended to adopt a conservative response strategy suitable for the larger object, irrespective of whether the target was large or small, and irrespective of the size of the prime object. Sight of the human model's body is not necessary for priming to occur.
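The between-experiment analysis in 16.2.3.2 is a mixed design: experiment is a between-subjects factor, while primer, trial type, and size are within-subjects factors. As a hedged sketch of how the simplest slice of such an analysis (one within and one between factor) could be run today with the pingouin package, using fabricated data; the chapter's full four-way model would require a more general mixed-model tool.

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
rows = []
for exp in ("exp1", "exp3"):
    for s in range(8):                      # 8 subjects per experiment
        for validity in ("valid", "invalid"):
            rows.append({"subject": f"{exp}_s{s}",
                         "experiment": exp,
                         "validity": validity,
                         "decel_time": rng.normal(420, 30)})  # fabricated ms
df = pd.DataFrame(rows)

# One within factor (validity) crossed with one between factor (experiment);
# pingouin's mixed_anova handles exactly this one-within/one-between case.
aov = pg.mixed_anova(data=df, dv="decel_time", within="validity",
                     subject="subject", between="experiment")
print(aov[["Source", "F", "p-unc"]])
```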
16.2.4 Experiment 4: testing the differences in kinematics between robot and human primers
The contrasting results from the human and robot primers in the earlier experiments could be due to the differences between a conspecific and a robot as a primer, or to the fact that the kinematics of the robot did not differentiate between large and small objects. In Experiment 4 we clarified this point by normalizing the kinematics of a human primer with respect to object size. We asked a naïve human primer to perform the movement blindfolded within an allotted time. These constraints were felt sufficient to normalize the movement of the human primer with respect to object size. We then evaluated whether this normalized movement, made by a human primer, was equivalent to the robot primer.
16.2.4.1 Methods
Participants. Six subjects (4 women, 2 men, aged 25–30 years) with the same characteristics as those who took part in the previous experiments volunteered to participate. None of them had participated in the previous experiments. They attended one experimental session of 2 hours duration in total.
Apparatus, materials, procedure, and data processing. These were the same as for Experiment 1, except that for this experiment a naïve subject was asked to act as the human primer. The human primer was trained to reach for the small and the large object within an interval corresponding to the time employed by the robotic hand to complete the movement (800 ms ± 25 ms). The interval was defined by two sounds (200 ms sound duration; 880 Hz) and was the same for the small and the large object. Further, the human primer was blindfolded so that he could not see his arm while reaching for the object; in other words, he did not know the size of the object he was grasping. This led the primer to use a movement patterning that was very similar to that of the robot: the hand opened widely and closed on the object only after having touched it with the palm. During the experimental session the subject wore earphones so as not to hear the two sounds while the 'instructed' primer was demonstrating the movement. Prime actions were valid on 80% of the trials.
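Two details of this procedure lend themselves to a compact illustration: the 800 ± 25 ms acceptance window used to train the primer, and the expression of temporal landmarks as a percentage of movement duration (used in Table 16.2 below to compare movements of different overall speeds). A minimal sketch with illustrative names and fabricated values:

```python
def within_window(movement_duration_ms, target_ms=800.0, tol_ms=25.0):
    """True if a primer movement falls inside the allotted time window."""
    return abs(movement_duration_ms - target_ms) <= tol_ms

def as_percent_of_duration(landmark_ms, movement_duration_ms):
    """Express a temporal landmark as a percentage of movement duration,
    as in Table 16.2 (e.g. time to maximum grip aperture)."""
    return 100.0 * landmark_ms / movement_duration_ms

# Example: a 790 ms movement whose grip aperture peaks at 380 ms.
assert within_window(790)
print(f"{as_percent_of_duration(380, 790):.0f}% of movement duration")  # ~48%
```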
16.2.4.2 Results
Of relevance for the present study is that the normalization procedure was successful. Kinematic analyses showed that there were no differences between movements directed to the small and the large object for the 'instructed' primer (see Table 16.2, where the temporal values are presented as a percentage of movement duration). Thus, in this study, the human primer acted similarly to the robot primer.

Table 16.2 Kinematic parameters of the subjects' and the human primer's movements for the Small and Large conditions in Exp. 4. Temporal measures are expressed as a percentage of movement duration (SEM in parentheses)

                                      Subjects             Human primer         Statistics
                                      Small     Large      Small     Large      F      Sig.
Reaching component
  Deceleration time (%)               57 (7)    56 (7)     56 (6)    56 (5)     0.21   ns
  Amplitude of peak velocity (mm/s)   549 (6)   544 (9)    552 (7)   550 (6)    1.04   ns
Grasp component
  Time to maximum grip aperture (%)   48 (3)    48 (3)     48 (4)    49 (5)     2.12   ns
  Maximum grip aperture (mm)          96 (3)    97 (4)     97 (4)    97 (4)     1.43   ns

As found for Experiment 2, there was a tendency for the subjects to program the movement in terms of the larger object (see Fig. 16.2(c)). Data were analysed as for Experiment 1 and are summarized as follows. Several parameters of the movements differed if the primer was the robot arm relative to the human model (first-level priming effects). Thus, for the reaching component, the times to reach peak acceleration [242 vs. 265 ms;
F(1, 5) = 17.16, p = 0.001] and peak velocity [421 vs. 446 ms; F(1, 5) = 5.01, p = 0.05] were decreased for trials where the robot arm was the primer rather than the human. For the grasp component there were differences in the accelerative phase as the fingers moved to their maximum aperture. The time to maximum acceleration of the fingers occurred earlier [100 vs. 115 ms; F(1, 5) = 4.86, p = 0.05] following robotic rather than human primers.
In contrast to Experiment 1, second-level priming effects were not found: the interaction between type of primer and size was not significant for any of the dependent measures. These results further suggest that the actions of the 'normalized' human primer were interpreted independently of the size of the object. No third-level priming effects were found either: the interaction of type of primer × type of trial was not significant for any of the dependent measures (e.g. Fig. 16.3(d)), and the values for valid and invalid trials did not differ for either the robot or the human primes.
The data from Experiment 4 confirm the reliability of the differences between the robot and the human primer for the reach component of performance. We also failed to find any effect of the size of the prime object on movements to the target. This is not surprising, however, given that the kinematics of the human primer did not differ for large and small objects. Despite this, kinematic landmarks occurred later following the human primer than following the robotic primer, as we found in the earlier studies.
16.3 General discussion
We have reported four experiments showing that priming effects at several levels differ depending on whether the observer is exposed to a human or a robot arm. First, there appears to be some conspecific advantage, which is completely unrelated to object size, trial type, or kinematics. Second, the more specific forms of priming (levels 2 and 3) appear to be fully dependent on model kinematics. In Experiments 1 and 4, specific priming is only seen for the human hand when it operates naturally.
It is not seen for the robot, and it is not seen for the human hand when its kinematics do not differentiate between the two object sizes.
At first glance it may be argued that the different results obtained for the human model and the robot hand arise because the robot's hand kinematics do not differentiate between large and small objects. In other words, the whole pattern of results could be interpreted as the kinematics of the movement priming the action irrespective of whether the kinematics are shown by a human or a robot. However, we clarified this issue in Experiment 4, where a human model was constrained to perform the same kinematics for both the large and the small objects. The results of this experiment confirm those of the other experiments, suggesting that a robot arm is perceived in a way that is different from a human arm (since the reach components of action remained selectively accelerated for robot primers). Consistent with this, functional imaging studies in humans have found no evidence of either premotor or frontal activation when movements of a hand were observed in a virtual reality system (Decety et al. 1994). The robot hand here, and the virtual hand in Decety et al. (1994), seem not to engage the cells that mediate immediate visually guided action (see also Gallese et al., this volume, Chapter 17).
This lack of engagement with a robot arm is particularly evident when looking at the relationship between the type of primer and the type of trial. In the human condition, subjects appeared to preprogram a response based on the prime, and then use this to guide action. As a consequence, on invalid trials they had to amend their motor program to respond to the properties of the stimulus. This leads to acceleration of the action; that is, the accelerative part of both the reaching and the grasp components is anticipated. Deceleration time following a human primer was longer for invalid than for valid trials when a large stimulus was presented (third-level priming effects). In contrast, on robot trials subjects appeared not to preprogram the movement for the valid condition, so that no invalid-trial cost is observed. Again, a possible explanation is that the robot's kinematics are similar for both the small and the large objects; thus the priming effect is not evident because subjects are coding for similar types of actions rather than for different types of objects.
Another possible explanation for the present results is to consider the robot as a control condition with kinematics held constant. Viewed in this way, the results from Experiment 1 could be taken to show that kinematics matter. This point is also confirmed by the results of Experiment 4, where the kinematics of the human primer did not differ for large and small objects. The conclusion that kinematics are relevant (at least as regards level-2 and level-3 priming) is supported not only by comparing the natural human arm (Experiment 1) with the robot arm (Experiments 1 and 4), but also by comparing it with a non-naturally moving human arm (Experiment 4). This indicates, in line with the results obtained by Kerzel et al. (2000) and Stürmer et al.
(2000), not only that participants were able to reproduce the actor's pattern, showing correspondence between the stimulus (the actor's movement) and the response (the observer's movement), but also that the common representational system matches the perceptual information of a seen act with proprioceptive information concerning an executable act and, moreover, takes movement kinematics into account.
Recently, Bekkering and colleagues (2000) proposed a new view of the representations that mediate perception and action in imitation. They suggest a motor-pattern process that is guided by an interpretation of the motor pattern as goal-directed behaviour. The present results indicate that the desired goal of the action (for instance, grasping an object) can be preprogrammed from observation of a prior action. Further, they show that this computation is chiefly driven by primed movement kinematics rather than by object size. We believe that this is one of the most interesting
of the present results. It shows that not only target size, as has been demonstrated many times in the literature, but also primed movement kinematics can influence the execution of grasping movements. Also, since neither the robot nor the constrained human primer elicited priming effects, it appears that the source of priming was not the perceived size of the prime (from which the predicted size, and the associated movement patterns for the target, could be generated).
In conclusion, we have demonstrated motor priming effects when human subjects see an action made to an object by a human primer. There are also general differences in reach kinematics after observing a human relative to a robotic primer, even when the specific grasp components are not primed. We speculate that the neural basis for these priming effects may reside in the specialized circuitry revealed by physiological and functional imaging studies of the superior temporal, inferior parietal, and inferior frontal lobes.
Acknowledgments
This work was supported by an NHMRC grant and by a Wellcome Trust grant to UC. Morena Mari was supported by an NHMRC grant to UC. Professor Glyn Humphreys was supported by a visiting scholar grant awarded to UC by the University of Melbourne, and by grants from the MRC and the Wellcome Trust. Bruce Ferabend is thanked for assembling the robot used in the present study. We would like to thank Gisa Aschersleben, Harold Bekkering, and Wolfgang Prinz for their comments on previous versions of this manuscript.
References
Bekkering, H., Gattis, M., and Wohlschläger, A. (2000). Imitation of gestures in children is goal-directed. Quarterly Journal of Experimental Psychology, 53A, 153–164.
Bonfiglioli, C. and Castiello, U. (1998). Dissociation of covert and overt spatial attention during prehension movements: Selective interference effects. Perception and Psychophysics, 60, 1426–1440.
Brass, M., Bekkering, H., Wohlschläger, A., and Prinz, W. (2000). Compatibility between observed and executed finger movements: Comparing symbolic, spatial, and imitative cues. Brain and Cognition, 44, 124–143.
Castiello, U. (1996). Grasping a fruit: Selection for action. Journal of Experimental Psychology: Human Perception and Performance, 22(3), 582–603.
Craighero, L., Fadiga, L., Rizzolatti, G., and Umiltà, C. (1998). Visuomotor priming. Visual Cognition, 5, 109–125.
Decety, J. and Grèzes, J. (1999). Neural mechanisms subserving the perception of human actions. Trends in Cognitive Sciences, 3, 172–178.
Decety, J., Perani, D., Jeannerod, M., Bettinardi, V., Tadary, B., Woods, R., Mazziotta, J.C., and Fazio, F. (1994). Mapping motor representations with positron emission tomography. Nature, 371, 600–602.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., and Rizzolatti, G. (1992). Understanding motor events: A neurophysiological study. Experimental Brain Research, 91, 176–180.
Fadiga, L., Fogassi, L., Pavesi, G., and Rizzolatti, G. (1995). Motor facilitation during action observation: A magnetic stimulation study. Journal of Neurophysiology, 73, 2608–2611.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593–609.
Gentilucci, M., Castiello, U., Corradini, M.L., Scarpa, M., Umiltà, C., and Rizzolatti, G. (1991). Influence of different types of grasping on the transport component of prehension movements. Neuropsychologia, 29, 361–378.
Grafton, S.T., Arbib, M.A., Fadiga, L., and Rizzolatti, G. (1996). Localisation of grasp representations in humans by PET: 2. Observation versus imagination. Experimental Brain Research, 111, 103–111.
Hoff, B. and Arbib, M.A. (1993). Models of trajectory formation and temporal interaction of reach to grasp. Journal of Motor Behavior, 25, 175–192.
Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J., and Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286, 2526–2528.
Jakobson, L.S. and Goodale, M.A. (1992). Factors affecting higher-order movement planning: A kinematic analysis of human prehension. Experimental Brain Research, 86, 199–208.
Kerzel, D., Bekkering, H., Wohlschläger, A., and Prinz, W. (2000). Launching the effect: Representations of causal movements are influenced by what they lead to. Quarterly Journal of Experimental Psychology, 53A, 1163–1185.
Marteniuk, R.G., Leavitt, J.L., MacKenzie, C.L., and Athenes, S. (1990). Functional relationships between the grasp and transport components in a prehension task. Human Movement Science, 9, 149–176.
Oram, M.W. and Perrett, D.I. (1996). Integration of form and motion in the anterior superior temporal polysensory area (STPa) of the macaque monkey. Journal of Neurophysiology, 76, 109–129.
Perrett, D.I., Rolls, E.T., and Caan, W. (1982). Visual neurones responsive to faces in the monkey temporal cortex. Experimental Brain Research, 47, 329–342.
Perrett, D.I., Harris, M.H., Bevan, R., and Thomas, S. (1989). Framework of analysis for the neural representation of animate objects and actions. Journal of Experimental Biology, 146, 87–113.
Piaget, J. (1951). Play, dreams, and imitation in childhood. New York: W.W. Norton.
Prinz, W. (1990). A common-coding approach to perception and action. In O. Neumann and W. Prinz (Eds.), Relationships between perception and action: Current approaches, pp. 167–201. Berlin, New York: Springer-Verlag.
Prinz, W. (in press). Experimental approaches to imitation. In A. Meltzoff and W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases. Cambridge: Cambridge University Press.
Rizzolatti, G. and Arbib, M. (1998). Language within our grasp. Trends in Neurosciences, 21, 188–194.
Rizzolatti, G., Fadiga, L., Matelli, M., Bettinardi, V., Perani, D., and Fazio, F. (1996). Localization of cortical areas responsive to the observation of hand grasping movements in humans: A PET study. Experimental Brain Research, 111, 246–256.
Romanes, G.J. and Darwin, C. (1884). Mental evolution in animals. Appleton and Co.
Stürmer, B., Aschersleben, G., and Prinz, W. (2000). Correspondence effects with manual gestures and postures: A study of imitation. Journal of Experimental Psychology: Human Perception and Performance, 26, 1746–1759.
Thorndike, E.L. (1898). Animal intelligence: An experimental study of the associative process in animals. Psychological Review Monograph, 2, 551–553.
Vogt, S. (in press). Visuomotor couplings in object-oriented and imitative actions. In A. Meltzoff and W. Prinz (Eds.), The imitative mind: Development, evolution and brain bases. Cambridge: Cambridge University Press.
Woodward, A.L. (1998). Infants selectively encode the goal object of an actor's reach. Cognition, 69, 1–34.
Woodward, A.L. (1999). Infants' ability to distinguish between purposeful and nonpurposeful behaviors. Infant Behavior and Development, 22(2), 145–160.
17 Action representation and the inferior parietal lobule
Vittorio Gallese, Luciano Fadiga, Leonardo Fogassi, and Giacomo Rizzolatti
Abstract. From birth onwards, primates' cognitive development depends heavily on being able to observe and interact with other individuals. How can individuals assign a meaning to the actions performed by other conspecifics? A possible neural correlate of the mechanism allowing action understanding could be represented by a class of neurons (mirror neurons) that we have discovered in area F5 of the ventral premotor cortex of the macaque monkey. We proposed that mirror neurons could be part of a cortical system that, by matching action observation with action execution, enables individuals to 'understand' the behavior of others. The present study is aimed at better clarification of the nature and the properties of such a cortical matching system. Neurons responding to the observation of complex actions have been described by Perrett and co-workers in the cortex buried within the superior temporal sulcus (STS). These neurons could be a particularly well-suited source of visual input for F5 mirror neurons. However, area F5 does not receive direct projections from the STS region. One of its major inputs comes from the inferior parietal lobule and in particular from area PF (7b). The inferior parietal lobule, in turn, is reciprocally connected with the STS region. We therefore decided to study the functional properties of area PF by means of single neuron recording experiments. About 20% of the recorded neurons responded both during action execution and action observation, and therefore, in analogy with the neurons described in area F5, we defined them as 'PF mirror neurons'. Furthermore, a subset of PF mirror neurons matched hand action observation to mouth action execution. A possible hypothesis is that this latter class of PF neurons may represent a 'primitive' matching system. Taken together, these data indicate that an action observation/execution matching system does also exist in the parietal cortex, possibly constituting a building block of a cortical network for action understanding.
17.1 Introduction
Primates are social animals. Their societies are characterized by complex and sophisticated rules governing the various types of interaction among the individuals within a group. This requires the capacity to recognize individuals within a social group, to assign a social rank to oneself as well as to others, and to comply with the rules that such a complex hierarchical social environment entails. From birth onwards, primates' cognitive development depends heavily on being able to observe and interact with other individuals. Action observation therefore appears to be very important for building a meaningful account of conspecifics' behavior.
How can individuals assign a meaning to the actions performed by other conspecifics? A possible neural correlate of the mechanism allowing action understanding could be represented by a class of neurons (mirror neurons) that we have discovered in area F5 of the ventral premotor cortex of the macaque monkey (di Pellegrino, Fadiga, Fogassi, Gallese, and Rizzolatti 1992; Gallese, Fadiga, Fogassi, and Rizzolatti 1996; Rizzolatti, Fadiga, Gallese, and Fogassi 1996a). Mirror neurons are activated during the execution of purposeful, goal-related hand movements, such as grasping, holding, or manipulating objects, and they also discharge when the monkey observes similar
hand actions performed by another individual. To be activated by visual stimuli, mirror neurons require an interaction between the agent of the action (a human being or a monkey) and the object. Control experiments showed that neither the sight of the agent alone nor that of the object alone was effective in evoking the neuron's response. Similarly, mimicking the action without a target object, or performing the action using tools, was poorly effective (Gallese et al. 1996). Frequently, a strict congruence was observed between the observed action effective in triggering a mirror neuron and the effective executed action. In one-third of the recorded neurons the effective observed and executed actions corresponded in terms of both the general action (e.g. grasping) and the way in which that action was executed (e.g. precision grip). In the other two-thirds only a general congruence was found (e.g. any kind of observed and executed grasping elicited the neuron's response).
We proposed that mirror neurons could be part of a cortical system that, by matching action observation with action execution, enables individuals to 'understand' the behavior of others (Gallese et al. 1996; Rizzolatti et al. 1996a). It must be stressed that several studies using different methodologies have demonstrated the existence of a similar matching system in humans (see Cochin, Barthelemy, Lejeune, Roux, and Martineau 1998; Fadiga, Fogassi, Pavesi, and Rizzolatti 1995; Grafton, Arbib, Fadiga, and Rizzolatti 1996; Grèzes, Costes, and Decety 1998; Hari et al. 1999; Iacoboni et al. 1999; Rizzolatti et al. 1996b). All these studies suggest that humans have a 'mirror matching system' similar to that originally discovered in monkeys. Whenever we look at someone performing an action, besides the activation of various visual areas there is a concurrent activation of the motor circuits that are recruited when we ourselves perform that action. Although we do not overtly reproduce the observed action, our motor system nevertheless becomes active as if we were executing the very same action that we are observing.
The present study was aimed at better clarifying the nature and the properties of such a cortical matching system in the monkey brain. Neurons responding to the observation of complex actions, such as grasping or manipulating objects, have been described by Perrett and co-workers in the cortex buried within the superior temporal sulcus (STSa; see also Jellema and Perrett, this volume). These neurons, whose visual properties are in many respects similar to those of mirror neurons, could constitute the mirror neurons' source of visual information. The STS region, however, has no direct connection with area F5, but has links with the anterior part of the inferior parietal lobule (area PF or 7b), which in turn is reciprocally connected with area F5 (Matelli, Camarda, Glickstein, and Rizzolatti 1986; see also Rizzolatti, Luppino, and Matelli 1998).
Area PF, or 7b, is located on the convexity of the inferior parietal lobule. It receives inputs from primary sensory areas (mostly area 2) and from the second somatosensory area (SII). It projects caudally to the adjacent areas located on the convexity of the inferior parietal lobule (PFG, PG) and on the lateral bank of the intraparietal sulcus (Pandya and Seltzer 1982). Outside the parietal cortex its main projections are to the ventral premotor cortex (Matelli et al. 1986; see also Rizzolatti et al. 1998), to SII, and to the prefrontal cortex (area 46).
Single neuron studies showed that the majority of PF neurons respond to passive somatosensory stimuli (touch, joint displacement, muscle palpation) (Fogassi, Gallese, Fadiga, and Rizzolatti 1998; Graziano and Gross 1995; Hyvärinen 1981; Leinonen and Nyman 1979; Leinonen et al. 1979). The tactile receptive fields are large, frequently covering an entire arm, leg, or the chest. A considerable number of neurons can be activated by visual stimuli. About half of them are bimodal neurons responding to both visual and somatosensory stimuli (Graziano and Gross 1995; Leinonen et al. 1979). About one-third of PF neurons fire during the animal's active movements (Leinonen et al.
1979). Reaching with the arm, hand manipulation, and reaching with the mouth are the most frequently represented movements.
Area PF, through its connection with STSa, on one hand, and F5, on the other, could play the role of an 'intermediate step' within a putative cortical network for action understanding, by feeding to the ventral premotor cortex visual information about actions as received from STSa. We decided therefore to study the functional properties of area PF by means of single neuron recording experiments. Neuron properties were examined during active movements of the monkey and in response to somatosensory and visual stimuli. Visual stimuli also included goal-related hand movements. About 20% of the recorded neurons responded both during action execution and action observation. These data indicate that an action observation/execution matching system does also exist in the parietal cortex, possibly constituting a building block of a cortical network for action understanding. The results of the present study will be discussed within a theoretical framework stressing the role played by the motor system in the representation of intentional actions. A preliminary report of these data appeared elsewhere (Fogassi et al. 1998).
17.2 Methods
Electrical activity from single neurons was recorded from the rostral part of the inferior parietal lobule (area PF) in one monkey (Macaca nemestrina). All experimental protocols were approved by the Veterinarian Animal Care and Use Committee of the University of Parma and complied with the European law on the humane care and use of laboratory animals.
17.2.1 Neuron testing and behavioral paradigm
During the recording session, the monkey was awake and seated on a primate chair, with the head fixed. Once a neuron was isolated, its sensory and motor properties were first tested (for a full description, see Rizzolatti et al. 1988). The somatosensory properties of the recorded neurons were tested using touch of the skin, hair bending, light pressure on the tissue, and slow and fast rotation of the joints. All testing was done with the monkey's eyes open and closed. Visual properties were studied by presenting 3D objects by hand at different locations in space and at different distances from the monkey. After presentation, the objects were also moved, starting from different angles, toward and away from the monkey, or along a tangential plane at different distances from the monkey. The borders of the visually responsive region (the 3D visual RF) were taken to be the external limits of that part of space whose crossing gave constant responses.
In addition, all recorded neurons were studied by examining their response to the observation of actions performed by the experimenter in front of the monkey (for a full description, see Gallese et al. 1996). In brief, these actions involved grasping, manipulating, holding, and placing objects. All these actions were performed at different distances from the monkey with the right, the left, or both hands of the experimenter. Furthermore, gestures with or without emotional content, such as threatening gestures, lifting the arms, or waving the hand, were executed in front of the monkey. To verify whether the recorded neurons were specifically activated by the observation of hand–object interactions, the following actions were also performed: prehension movements on objects performed with tools (e.g. pliers), and object-related hand actions mimicked in the absence of the target objects.
Motor properties of recorded neurons were studied in both light and dark conditions. Objects of various sizes and shapes were presented in the different quadrants of the monkey's visual space, and the monkey reached for and grasped them. By examining the large variety of proximal–distal movement combinations it was usually possible to assess which proximal or distal movement was effective in triggering a given neuron (for details, see Rizzolatti et al. 1988).
17.2.2 Physiological procedures and data recording
The surgical procedures for the construction of the head implant were the same as described in previous studies (for details, see Gentilucci et al. 1988). Single neurons were recorded using tungsten microelectrodes (impedance 0.5–1.0 MΩ, measured at 1 kHz) inserted through the dura. Neuronal activity was amplified and monitored with an oscilloscope. Individual action potentials were isolated with a time–amplitude voltage discriminator (Bak Electronics, Germantown, MD). The output signal from the voltage discriminator was monitored and fed to a PC for analysis.
By using a contact-detecting circuit, a signal was sent to the PC whenever the monkey or the experimenter touched a metal surface with their hand or mouth. This signal allowed the alignment of the histograms with the moment at which the motor action, performed either by the monkey or by the experimenter, was completed. The same contact-detecting circuit was also used to record visual and somatosensory responses: a signal was sent to the PC when the stimulus was introduced into the monkey's visual field or touched the monkey's skin, respectively. Response histograms were constructed by summing ten individual trials.
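The alignment procedure just described amounts to building a peri-event histogram: spike times on each trial are re-referenced to the contact-detection signal and the binned counts are summed over trials. A minimal sketch, with bin width and window chosen arbitrarily for illustration (the chapter does not report these values):

```python
import numpy as np

def psth(spike_times_per_trial, contact_times, window=(-1.0, 1.0), bin_s=0.02):
    """Peri-event histogram aligned on the contact-detection signal.

    spike_times_per_trial : list of 1-D arrays of spike times (s), one per trial
    contact_times         : sequence of alignment (contact) times (s), one per trial
    Returns bin centres (s, relative to contact) and counts summed over trials.
    """
    edges = np.arange(window[0], window[1] + bin_s, bin_s)
    counts = np.zeros(len(edges) - 1)
    for spikes, t0 in zip(spike_times_per_trial, contact_times):
        aligned = np.asarray(spikes) - t0        # re-reference to the contact event
        counts += np.histogram(aligned, bins=edges)[0]
    centres = edges[:-1] + bin_s / 2.0
    return centres, counts
```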
17.2.3 Identification of the recorded region
The monkey from which all the neurons presented in this study were recorded is still alive. Therefore, the identification of the recorded region has been achieved on the basis of its functional properties and those of the neighboring regions. We first recorded from the hand and face region of SI and then moved backwards at regular steps.
17.3 Results
We recorded from 236 PF neurons in one hemisphere of one monkey. Figure 17.1 shows the region of area PF that was studied. As previously shown (Hyvärinen 1981; Leinonen and Nyman 1979), the vast majority (n = 220, 93%) of PF neurons responded to sensory stimuli. The responsive neurons were subdivided into three categories: 'somatosensory' neurons, 'visual' neurons, and bimodal 'somatosensory and visual' neurons (see Table 17.1).

Table 17.1 Passive properties of the recorded neurons

Property                     No. of neurons   % of neurons
Somatosensory                72               33
Visual                       24               11
Somatosensory and visual     124              56
Total                        220              100

Fig. 17.1 Lateral view of the left hemisphere of a standard macaque monkey brain. The posterior parietal cortex is parcellated according to Pandya and Seltzer (1982). The agranular frontal cortex is parcellated according to Matelli et al. (1985). The shaded area indicates the part of area PF explored in this study. The two asterisks indicate the presumed location of two penetrations carried out in SI, where the face and the fingers were represented, respectively. (cs = central sulcus; ias = inferior arcuate sulcus; ls = lateral sulcus; sas = superior arcuate sulcus; sts = superior temporal sulcus.)
17.3.1 Somatosensory response properties
Of the 220 PF neurons responding to passive stimulation, 196 (89%) were activated by somatosensory stimuli. The somatosensory properties of 'somatosensory' and bimodal PF neurons were similar and will therefore be described together.
Out of the 196 neurons that responded to somatosensory stimulation, 138 (70.5%) were activated by light touch, 52 (26.5%) by pressure applied to the skin or passive movement of the joints, and 6
(3%) by both touch and joint rotation or deep pressure. The tactile receptive fields (RFs) of PF neurons were typically large. They were located most frequently on the face, or on the neck, chest, and arm. Table 17.2 summarizes the number and percentage of the different body-part locations of the RFs of all somatosensory neurons. Of the 126 neurons whose RFs were located on the face, 109 responded to stimulation of the lower face, 14 to stimulation of the upper face, and 3 to stimulation of both parts of the face. Most RFs were contralateral to the recorded side (73%), some extended bilaterally (22%), and a few were strictly ipsilateral (5%).
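The laterality percentages just quoted follow directly from the counts in Table 17.2 below; a short check (counts copied from the table):

```python
# Laterality counts for the 196 somatosensory-responsive neurons (Table 17.2).
contralateral, ipsilateral, bilateral = 143, 9, 44
total = contralateral + ipsilateral + bilateral            # = 196

for label, n in [("contralateral", contralateral),
                 ("bilateral", bilateral),
                 ("ipsilateral", ipsilateral)]:
    print(f"{label}: {100 * n / total:.0f}%")              # 73%, 22%, 5%
```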
Table 17.2 Subdivision of neurons with somatosensory responses according to their RF locations

RF location       No. of neurons   % of neurons   Contralateral   Ipsilateral   Bilateral
Face              126              64             82              8             36
Hand              35               18             35              0             0
Arm               10               5              10              0             0
Neck and trunk    4                2              0               0             4
Face and hand     8                4              5               1             2
Large             13               7              11              0             2
Total             196              100            143             9             44
Table 17.3 Subdivision of bimodal (somatosensory and visual) neurons according to their RF locations

RF location       No. of neurons   % of neurons   Contralateral   Ipsilateral   Bilateral
Face              68               73             46              1             21
Hand              7                8              7               0             0
Arm               6                6              6               0             0
Neck and trunk    1                1              0               0             1
Face and hand     2                2              2               0             0
Large             9                10             7               0             2
Total             93               100            68              1             24
Table 17.3 summarizes the body part location of RFs of bimodal ‘somatosensory and visual’ neurons. If compared with ‘somatosensory’ neurons, the RFs of bimodal neurons tended to be located even more predominantly on the face. Of 68 neurons whose RFs were on the face, 59 had RFs on the lower face, 8 on the upper face, and 1 on both parts of the face. Considering that all ‘large’ RFs included the face, 85% of bimodal neurons had RFs on the monkey’s face. In contrast with tactile responses, proprioceptive and deep pressure responses were mostly evoked by arm and hand stimulation. Proprioceptive and deep pressure responses were most frequently (n = 49, 84%) evoked by stimuli applied contralateral to the recorded hemisphere, while only 9 (16%) of the neurons were activated by bilateral stimuli, and none by ipsilateral ones.
17.3.2 Visual properties
Out of the 236 recorded neurons, 148 (63%) responded to visual stimuli. According to the type of stimulation effective in activating them, visually responsive neurons were subdivided into three main classes, shown in Table 17.4.

Table 17.4 Subdivision of neurons with visual responses according to their preferred visual stimuli

Preferred stimuli     No. of neurons   % of neurons
Peripersonal          71               48
Far                   16               11
Biological actions    61               41
Total                 148              100

The first and most represented class was formed by neurons with RFs located in the space around the monkey (peripersonal space) and responding best to rotational stimuli or to stimuli moved along a horizontal, vertical, or sagittal direction. Typically, the RFs were large. All neurons of the first class were bimodal 'somatosensory and visual' neurons, with their visual RFs located around the tactile
ones. Most RFs were located around the face. Of the 71 neurons of the first class, 58 (82%) had RFs contralateral to the recorded hemisphere, only one neuron had an ipsilateral RF, and 12 neurons (17%) had fields extending bilaterally. Visual responses were elicited any time a three-dimensional object was moved inside the RF. The quality of the object was generally irrelevant to the activation of the neurons, although occasionally neurons were recorded that seemed to respond best when the stimulus was the experimenter's hand.
The second class was formed by neurons responding to stimuli presented outside the peripersonal space. These neurons typically responded any time stimuli were presented or moved in the monkey's visual field at a distance greater than 40 cm from the monkey's body. The stimuli could be pieces of food or various objects at hand in the lab.
The third class was composed of neurons responding to the observation of actions. Some of these neurons, in addition, had tactile RFs on the face and visual RFs around the tactile ones. In the next sections we describe the neurons of this third class in more detail.
17.3.3 Neurons responding to the observation of actions
Sixty-one neurons responded to the observation of actions executed by the experimenter in front of the monkey. Out of these 61 neurons, 43 had motor properties, and therefore, in analogy with the neurons described in area F5 (see Gallese et al. 1996; Rizzolatti et al. 1996a), will be referred to as 'PF mirror' neurons. Eighteen neurons, which were devoid of motor properties, will be described as 'PF action-observation' neurons.
17.3.4 Visual properties of PF mirror neurons
Virtually all neurons of this class (n = 39, 91%) responded to the observation of actions in which the experimenter's hand(s) interacted with objects. Of the four remaining neurons, three responded to the observation of the experimenter's arm reaching for an object, and one to the observation of the experimenter's elbow flexion. The responses triggered by these stimuli were consistent and did not habituate. The visual presentation of objects, such as food items or objects at hand in the lab, did not evoke any response. Similarly ineffective, or only weakly effective, in driving the neurons' responses were actions that, although achieving the same goal and looking similar to those performed by the experimenter's hand, were made with tools such as pliers or pincers. Actions having emotional content, such as threatening gestures, were also ineffective. The distance and the location in space
with respect to the monkey at which the experimenter's actions were performed did not appear to modulate the intensity of the response.
Out of the 43 PF mirror neurons, 25 were driven by the observation of a single action, and 18 were activated by the observation of two actions. The properties of neurons responding to two actions were the same as those responding to one action, apart from their lower specificity. Table 17.5 shows the observed actions effective in activating the neurons and the number of mirror neurons activated by each of them. Only the actions listed in Table 17.5 were effective, among the many tested, in driving the neurons.

Table 17.5 Mirror neurons subdivided according to the actions effective in activating them

Observed actions                       No. of neurons
Bimanual interaction                   8
Grasping                               4
Manipulating                           3
Holding                                3
Releasing                              2
Placing                                1
Reaching                               3
Grasping and placing                   5
Grasping and holding                   4
Grasping and releasing                 2
Grasping and bringing to the mouth     1
Placing and holding                    3
Placing and manipulating               1
Elbow flexion                          1
Total                                  43

An example of a PF mirror neuron responding to the observation of a single action is shown in Fig. 17.2. This neuron (a 'bimanual interaction' neuron) responded to the observation of both hands of the experimenter interacting with an object. Figure 17.3 shows another example of a PF mirror neuron responding to the observation of a single action. This neuron responded to the observation of grasping. The discharge started immediately before contact with the object and diminished immediately after the experimenter's hand took possession of it. The neuron also responded during the monkey's active execution of grasping with the mouth. This neuron, in addition to responses during grasping observation, also showed bimodal properties. The association between mirror and bimodal properties was found in about one-quarter of PF mirror neurons; this interesting association will be dealt with below. About 40% of mirror neurons responded to the observation of two actions. An example of a PF mirror neuron discharging during the observation of two actions (grasping and releasing) is shown in Fig. 17.4.
Fig. 17.2 Visual responses of a 'bimanual interaction' mirror neuron. Each panel shows rasters and histograms recorded during 10 consecutive trials of the corresponding behavioral condition. This neuron discharged when both hands of the experimenter held an object. The discharge was tonically present during the whole holding period. If holding was performed by a single hand the discharge was either much weaker (left hand) or completely absent (right hand). Mimicking a bimanual holding action without the object did not evoke any response. The neuron also discharged when the monkey grasped an object with its hand (not shown in the figure). Rasters and histograms are aligned (small gray bars and black vertical bar, respectively) with the moment in which the experimenter's hand started moving toward his other hand holding the object (first panel from top), toward his other hand without the object (last panel from top), or to show the object to the monkey (second and third panels from top). Abscissae: time; ordinates: spikes per second.
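Throughout Figs. 17.2-17.6, the rasters and histograms are built by realigning each trial's spike train on a behavioural event and averaging across trials. Purely as an illustration of this convention (not the authors' analysis code), the following Python sketch computes an event-aligned raster and peristimulus time histogram; the function name, alignment window, and bin width are hypothetical choices.

    import numpy as np

    def aligned_psth(spike_times, event_times, window=(-1.0, 1.0), bin_width=0.02):
        # spike_times: spike times (s) for one neuron; event_times: one
        # alignment event per trial (e.g. the moment the hand touches the object).
        spike_times = np.asarray(spike_times)
        edges = np.arange(window[0], window[1] + bin_width, bin_width)
        counts = np.zeros(len(edges) - 1)
        raster = []
        for t0 in event_times:
            rel = spike_times[(spike_times >= t0 + window[0]) &
                              (spike_times < t0 + window[1])] - t0
            raster.append(rel)                   # one raster row per trial
            counts += np.histogram(rel, bins=edges)[0]
        rate = counts / (len(event_times) * bin_width)   # spikes per second
        return edges[:-1], rate, raster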
Fig. 17.3 Visual and motor responses of a 'grasping' mirror neuron. Rasters and histograms are aligned with the moment in which the experimenter's hand touched the object (first panel from top), started moving to present the object to the monkey (second panel from top), or started moving the object with a top-down trajectory (third panel from top). Rasters and histograms of the last panel from top are aligned with the moment in which the monkey's mouth touched the object. Other conventions as in Fig. 17.2.
Fig. 17.4 Visual and motor responses of a 'grasping and releasing' mirror neuron. This neuron started firing about 300 ms before the experimenter's hand touched the object. The discharge continued until the experimenter's hand took possession of the object, ceased during the holding phase, and started again during the releasing action. This neuron displayed a specificity for the observed grip: the observation of grasping achieved by opposing the index finger to the thumb (precision grip, PG) was much more effective than the observation of grasping achieved by flexing all fingers around the object (whole hand prehension, WH). This selectivity was matched by the neuron's motor selectivity: the neuron's discharge was higher when the monkey grasped the object using a precision grip than when using whole hand prehension. Rasters and histograms of all panels, except for the last from top in the left column, are aligned with the moment in which either the experimenter's or the monkey's hand touched the object. Rasters and histograms of the last panel from top in the left column are aligned with the moment in which the experimenter's hand started moving to present the object to the monkey. Other conventions as in Fig. 17.2.
This neuron displayed another property observed in several (see below) PF mirror neurons: the hand performing the observed action markedly influenced the discharge intensity. The discharge was higher when the observed actions were performed by the experimenter's left hand rather than the right hand. Out of 43 mirror neurons, 15 (35%) responded best to the observation of actions performed by one hand: 10 preferred the left hand and five the right hand. It is interesting to note that, in a face-to-face stance, the experimenter's left hand corresponds to the observer's right hand, that is, as in the present experiment, to the hand contralateral to the recorded hemisphere.
Fig. 17.5 Tactile and visual properties of a 'grasping' mirror neuron. The mirror properties of this neuron are shown in Fig. 17.3. Rasters and histograms are aligned with the moment in which the tactile stimulus, a three-dimensional object, touched the monkey's skin (first and third panels from top), or with the moment in which the visual stimulus, a three-dimensional object, entered the monkey's visual field (second and last panels from top). In A, tactile and visual stimuli were moved along a top-down trajectory. In B, tactile and visual stimuli were moved along a bottom-up trajectory. The two drawings illustrate the location of the tactile RF (shaded area), which encompassed the entire hemiface contralateral to the recorded hemisphere, and of the peripersonal visual RF (solid), which extended approximately 15 cm from the monkey's skin. The two arrows indicate the direction along which the tactile and visual stimuli were moved. Other conventions as in Fig. 17.3.
A set of PF mirror neurons (n = 10, 23%), in addition to responding to action observation, responded to tactile stimuli applied to the monkey's face and to visual stimuli (three-dimensional objects) moved in the peripersonal space around the tactile RF. Unlike the other mirror neurons, all of these 'bimodal mirror' neurons were excited during the monkey's active movements of the mouth. Figure 17.5 shows an example of a bimodal mirror neuron; its mirror properties were illustrated in Fig. 17.3. The tactile RF was located on the hemiface contralateral to the recorded hemisphere. The visual RF was located around the tactile one, extending in depth for about 15 cm. Both tactile and visual RFs were directionally selective: stimuli moved top-down were far more effective than stimuli moved bottom-up in evoking the neuron's discharge. It is interesting to note that this directional selectivity matched the selectivity for the direction along which the observed hand approached and grasped the object, as shown in Fig. 17.3: the neuron discharged only when the experimenter's hand approached the object from above, with a top-down trajectory. However, moving the object held by the experimenter's hand along the same trajectory was much less effective in driving the neuron's response. Object presentation was also ineffective.
17.3.5 Visual properties of PF action-observation neurons

Among the 61 neurons responding to the observation of actions, 18 were devoid of motor properties ('action-observation' neurons). Of these 18 neurons, eight were driven by the observation of a single action and ten were activated by the observation of two or three actions. Table 17.6 shows the observed actions effective in activating the neurons and the number of action-observation neurons activated by each of them. Among the many actions tested, only those listed in Table 17.6 were effective in driving the neurons. Figure 17.6 shows an example of an action-observation neuron. This neuron discharged when the monkey observed the experimenter's hand grasping and holding the object. The discharge onset preceded the moment in which the experimenter's hand touched the object by about 300 ms, and then continued during the whole holding period.
Table 17.6 Action-observation neurons subdivided according to the actions effective in activating them

Observed actions                                 No. of neurons
Bimanual interaction                              3
Grasping                                          2
Holding                                           2
Reaching                                          1
Grasping and holding                              5
Bimanual interaction and holding                  2
Placing and holding                               1
Grasping and releasing                            1
Grasping, placing, and bimanual interaction       1
Total                                            18
Fig. 17.6 Visual properties of a 'grasping and holding' action-observation neuron. Rasters and histograms are aligned with the moment in which the experimenter's hand touched the object (first and second panels from top), started moving to present the object to the monkey with a stick (third panel from top), or started holding the object. Other conventions as in Fig. 17.2.
The hand used by the experimenter markedly influenced the discharge intensity of this neuron. Hand preference was present in three out of 18 action-observation neurons: two preferred the left hand and one, shown in Fig. 17.6, the right one. When the experimenter performed the observed action with his left hand, the discharge was almost absent. Object presentation, and the observation of a holding action not preceded by a grasping action, were similarly ineffective, even when the action was performed by the preferred hand.
The vast majority of action-observation neurons (n = 12, 67%) also had bimodal properties. These neurons responded to tactile stimuli applied to the monkey's face and to visual stimuli (three-dimensional objects) moved in the peripersonal space around the tactile RF.
17.3.6 Motor properties

Out of 236 recorded neurons, 130 (55%) discharged in association with active movements of the monkey. Of those, 16 (12%) were purely motor, 38 (30%) also responded to somatosensory stimuli, 17 (13%) to visual stimuli, and 59 (45%) to both somatosensory and visual stimuli. Table 17.7 shows the effectors whose movement determined the responses of PF neurons. Almost all neurons (90%) were activated by movements of the mouth, of the hand, or of both. As previously described (Hyvärinen 1981; Leinonen et al. 1979), PF neurons were typically selectively activated during specific actions. These were: grasping with the hand (n = 27), manipulation (n = 15), grasping and manipulation (n = 6), grasping and holding (n = 2), grasping with the mouth (n = 40), grasping with the hand and the mouth (n = 24), arm reaching (n = 3), bringing to the mouth (n = 8), arm reaching and bringing to the mouth (n = 1), and associated actions of the hand, arm, and mouth (n = 3). Only one neuron responded during a movement of the monkey (elbow flexion). The motor properties of PF mirror neurons were indistinguishable from those of the other PF neurons. Seventeen of them (40%) discharged during hand actions: 13 were activated by hand grasping, and four by manipulation. Eleven neurons (25.5%) discharged during mouth grasping, and another 11 neurons (25.5%) discharged during hand and mouth grasping. Three neurons (7%) responded during arm and forearm movements: two of them responded to reaching, and one to elbow flexion. Finally, one more neuron (2%) discharged during mouth grasping and arm reaching.
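As a quick arithmetic check, the percentages reported here and in Table 17.7 (below) follow directly from the raw neuron counts. A minimal Python sketch, using the counts from the table:

    # Counts from Table 17.7; recomputes the reported percentages (rounded).
    effector_counts = {
        'mouth': 40,
        'hand': 50,
        'arm and forearm': 13,
        'mouth and hand': 24,
        'mouth, arm, and hand': 3,
    }
    total = sum(effector_counts.values())            # 130 motor neurons
    for effector, n in effector_counts.items():
        print(f'{effector:<22} {n:>3}  {100 * n / total:5.1f}%')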
Table 17.7 Properties of motor neurons

Effector                  No. of neurons    % of neurons
Mouth                      40                31
Hand                       50                39
Arm and forearm            13                10
Mouth and hand             24                18
Mouth, arm, and hand        3                 2
Total                     130               100

17.3.7 Relationship between visual and motor properties of mirror neurons

In most mirror neurons there was a clear relationship between the observed action they responded to and the executed action that drove their discharge. Using as a classification criterion the relationship between the effectors whose action observation or execution triggered the neurons' discharge, we distinguished three broad classes. The first class (n = 23, 53%) comprised neurons that responded during observation and execution of hand actions. In eight of these neurons the effective observed and executed actions corresponded both in terms of the action goal (e.g. grasping) and in terms of the way in which the goal was achieved (e.g. precision grip). In six neurons the effective observed and executed actions were similar but not identical: in some of them the motor response was more specific than the visual one; in others the opposite was true. In six neurons the visual response could be interpreted as logically related to the motor response; for example, the effective observed action could be placing a piece of food on a tray, while the effective executed action could be grasping the piece of food. Finally, in three neurons there was no clear-cut relationship between the effective observed and executed actions. The second class (n = 12, 28%) comprised neurons that responded during observation of hand actions and during execution of mouth actions. All bimodal mirror neurons fell into this class. Finally, the third class comprised neurons (n = 8, 19%) that responded during observation of hand actions and during execution of hand and mouth actions. Considering only the hand actions, in two neurons the effective observed and executed actions corresponded in terms of both the action goal and the way in which the goal was achieved, while in six neurons the effective observed and executed actions were similar but not identical.
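Expressed as a rule over per-neuron records, this classification amounts to comparing the effector whose observed action drives the visual response with the effector(s) whose execution drives the motor response. The sketch below is a hypothetical illustration of that rule, not the authors' procedure; the record format and names are invented.

    from collections import Counter

    # Hypothetical per-neuron records: the effector whose observed action
    # drives the neuron, and the effector(s) whose execution drives it.
    neurons = [
        {'observed': 'hand', 'executed': frozenset({'hand'})},
        {'observed': 'hand', 'executed': frozenset({'mouth'})},
        {'observed': 'hand', 'executed': frozenset({'hand', 'mouth'})},
        # ... one record per mirror neuron tested
    ]

    def congruence_class(neuron):
        # The three broad classes used above; the observed actions are
        # always hand actions in this data set.
        if neuron['executed'] == {'hand'}:
            return 'hand observation / hand execution'
        if neuron['executed'] == {'mouth'}:
            return 'hand observation / mouth execution'
        return 'hand observation / hand-and-mouth execution'

    print(Counter(congruence_class(n) for n in neurons))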
17.4 Discussion

17.4.1 General properties of area PF

In agreement with previous reports (Graziano and Gross 1995; Hyvärinen 1981; Leinonen and Nyman 1979; Leinonen et al. 1979), the present study showed that the majority of PF neurons respond to passive somatosensory stimuli, to visual stimuli, or to both. A considerable percentage of neurons in area PF were also endowed with motor properties. All these motor neurons (except one) discharged during actions such as grasping, manipulating, and reaching for objects. This finding stresses the role of the posterior parietal cortex as a region of the brain where different sensory modalities are not only integrated but also used to guide motor actions.

Neurons responding to visual stimuli were subdivided into three classes. Neurons of the first class had RFs located in the space around the monkey (peripersonal space). Neurons of the second class responded to objects presented outside the peripersonal space. Finally, neurons of the third class responded to the observation of actions. In the remaining part of the Discussion we will focus on neurons of the first and third classes.

Virtually all neurons of the first class were also driven by cutaneous stimulation. These bimodal neurons responded independently to visual and tactile stimulation. Two-thirds of bimodal somatosensory and visual neurons had visual peripersonal RFs around the face, most frequently around the mouth. Tactile RFs were also located predominantly on the lower part of the face. Tactile and visual RFs were therefore usually 'in register'. Directional selectivity, when present, as in the neuron shown in Fig. 17.5, was the same in both modalities. It is noteworthy that most bimodal neurons with motor properties discharged during mouth grasping movements.

The association of motor properties related to a given effector (e.g. mouth, head, or arm) with bimodal receptive fields anchored to the same effector is displayed also by neurons of the premotor area F4 (Fogassi et al. 1992, 1996; Gentilucci, Scandolara, Pigarev, and Rizzolatti 1983; Graziano, Yap, and Gross 1994; Graziano, Hu, and Gross 1997). The 'visual' responses of F4 neurons, rather than being truly visual, are probably potential actions. In other words, they consist of the automatic activation of the motor programs required by the effector whose visual receptive field is crossed by a stimulus to interact with that same stimulus (see Fogassi et al. 1996; Rizzolatti, Fadiga, Fogassi,
and Gallese 1997a). Although one cannot exclude that the responses observed in the parietal cortex are true visual responses, coding the position of stimuli in the peripersonal space, it is more likely that in area PF too the peripersonal space located around the mouth is a 'grasping space'. This space would enable a fast grasping action of the mouth whenever an appropriate stimulus approaches its surrounding space (see also below).

Previous experiments in which area PF was ablated produced deficits that are consistent with the neurons' sensory and motor properties just described (Faugier-Grimaud, Frenois, and Stein 1978; Matelli, Gallese, and Rizzolatti 1984; Rizzolatti, Gentilucci, and Matelli 1985). Following damage to PF, monkeys tend to use the hand ipsilateral to the lesion. Distal movements of the contralateral hand become clumsy. Tactile stimuli applied to the face contralateral to the lesion are frequently neglected. The coordinated head-mouth movements necessary to reach food when the lips are touched with it are slower on the affected side. Visual deficits are present in the peribuccal space, but not in the extrapersonal space. When the monkey fixates a central stimulus, the movement of a piece of food near the mouth on the side of the lesion produces an immediate mouth-grasping response, whereas the same stimulus shown contralaterally is ignored. With two stimuli moved simultaneously, the one ipsilateral to the lesion is always preferred.
17.4.2 Neurons responding to the observation of actions

The most important finding of this study was the discovery that a considerable percentage of PF neurons (mirror and action-observation neurons) responded to the observation of actions performed by other individuals, such as grasping, placing, holding, reaching, and the like. The majority of these neurons also had motor properties matching the visual ones. As in the premotor area F5 (di Pellegrino et al. 1992; Gallese et al. 1996; Rizzolatti et al. 1996a), the PF mirror and action-observation neurons required an interaction between the agent of the action and the object target of the action in order to be visually driven. The sight of the agent alone or of the object alone was not effective. Similarly, the same actions mimicked without the target object were much less effective. Hand actions were the most effective stimuli. Actions that imitated the effective observed actions but were performed with tools usually did not significantly activate the neurons. Another similarity between PF and F5 mirror and action-observation neurons was the broad degree of generalization across different instances of the observed actions evoking the neuron's discharge: in many neurons the distance and location in space of the observed actions with respect to the monkey were not crucial.

About two-thirds of PF neurons responding to action observation have properties almost indistinguishable from those characterizing F5 mirror and action-observation neurons. This result indicates that the 'mirror' system, matching action observation to action execution, is not a prerogative of the premotor cortex, but extends to the posterior parietal lobe as well. How can these PF neurons be related to the properties of F5 mirror neurons? A problematic issue since the discovery of F5 mirror neurons has been the source of their visual input. Neurons activated by complex, biologically meaningful visual stimuli had been previously described in the macaque brain. Early studies showed that in the inferotemporal lobe there are neurons that discharge selectively to the presentation of a face or a hand (Gross et al. 1972; see also Perrett, Rolls, and Caan 1982). More recently, Perrett and co-workers demonstrated that in a region of the upper bank of the superior temporal sulcus (STSa) there are neurons, apparently devoid of motor properties (it must be noted, however, that such properties were never tested), selectively activated by the sight of hand actions (Perrett et al. 1989; Perrett, Mistlin, Harries, and Chitty 1990; see
also Jellema and Perrett, this volume). The results of the present study suggest that area PF could represent an intermediate step leading from a 'visual' description of actions, carried out in STSa, to a motor representation of the same actions coded in the premotor cortex. Area PF, together with the STS region and the premotor area F5, could compose a cortical network supporting action recognition.

Evidence from brain-imaging studies in humans suggests that this is more than a speculative hypothesis. Several PET and fMRI studies (Grafton et al. 1996; Grèzes et al. 1998; Iacoboni et al. 1999; Rizzolatti et al. 1996b; for a review, see Allison, Puce, and McCarthy 2000; Decety and Grèzes 1999) have shown that whenever subjects observed meaningful goal-related hand actions, three cortical regions were consistently activated: the STS region, the anterior part of the inferior parietal lobule (BA 40), and a sector of the premotor cortex (BA 44-45). It is noteworthy that a recent fMRI study (Buccino et al. 2001) has shown that a mirror-matching system is present for mouth and foot actions as well, suggesting that this system is not confined to hand actions but is likely to underpin the understanding of a huge variety of actions.
17.4.3 Matching hand actions on mouth actions

A considerable number of PF mirror neurons matched observed hand actions onto mouth actions. This apparent discrepancy between the effectors whose observation and whose active movement drive these neurons needs to be addressed. A possible hypothesis is that these PF neurons represent a 'primitive' matching system based on mouth movements. Ontogenetically speaking, the mouth is the effector by means of which all primates, humans included, not only start to feed themselves, but also start to explore the surrounding world. Through the medium of mouth actions the world can readily be classified into categories (good/bad, edible/non-edible) that are very likely to form the building blocks of a future, more comprehensive account of the environment. Consistent with this ontogenetic hypothesis is the high degree of mouth-hand synergy observed in infants. After extensive practice, paralleled by the development of corticospinal projections, the two effectors can easily be used independently and successfully. Nevertheless, when the infant primate acts with its mouth, it can most frequently see its hand. An association between mouth and hand actions can therefore be established. According to this hypothesis, this association may give rise to mirror neurons that have mouth-related responses on the output side and hand-related responses on the input side. This relation, initially established between the infant's mouth and its own grasping hand, could later be generalized to the hands of other individuals. One could object that these neurons were recorded in an adult monkey. Such a 'primitive' matching system, however, could persist in adulthood, even once a more sophisticated hand-hand matching system has developed, in order to provide an 'abstract' categorization of the observed actions: what is recognized is a particular action goal, regardless of the effector enabling its achievement.

Some additional words should be spent on the properties of those PF mirror neurons that showed bimodal properties. The visual RFs of these mirror neurons were located around the face, mostly around the mouth, and their tactile RFs were located on the lips and the peribuccal region. All these neurons discharged during mouth grasping actions and all responded to the observation of hand actions. What can the function of these RFs be, when combined with mirror properties? As stated above, a visual peripersonal RF located around a mouth tactile RF can be interpreted as a motor space, by means of which the visual stimuli that cross it are 'translated' into suitable motor
plans (e.g. a mouth grasping action), enabling the organism endowed with such RFs to interact successfully with the same stimuli (Fogassi et al. 1996; Rizzolatti, Fadiga, Fogassi, and Gallese 1997a). The visual stimulus that most frequently crosses the peripersonal visual RFs of these PF mirror neurons is likely to be the monkey's own hand bringing food to the mouth. A hand approaching the mouth can therefore pre-set the motor programs controlling grasping with the mouth. Through a process of generalization between the monkey's own moving hand, interpreted as a signal to grasp with the mouth, and the object-directed moving hands of others, any time the monkey observes another individual's hand interacting with food, the same mouth action representation will be evoked. According to this ontogenetic hypothesis, the peripersonal visual RF around the mouth would enable a primitive matching between the vision of a hand and the motor program controlling the mouth. Once this equivalence is in place, a mirror system matching the observation of hand actions to the execution of mouth actions can be established.
17.4.4 Action, perception, and the parietal cortex

What is the link between acting and observing someone else acting? This question raises the broader issue of the relationship between action and perception. Since the early eighties, the dominant view on the cortical processing of visual information has been the 'what' and 'where' theory proposed by Ungerleider and Mishkin (1982). According to these authors, the ventral stream has its main role in object recognition, while the dorsal stream analyzes an object's spatial location. This point of view was in accordance with the classical notion of the parietal cortex as the site of unitary space perception. Since the early nineties, Milner and Goodale (Goodale and Milner 1992; Milner and Goodale 1995) have argued against this theory, emphasizing the role of the dorsal stream in the 'on-line' control of action. This point of view, primarily triggered by clinical data, has subsequently been substantiated by neurophysiological evidence. The posterior parietal cortex is now thought to consist of a mosaic of areas, each receiving specific sensory information (Colby and Duhamel 1996; Rizzolatti, Fogassi, and Gallese 1997b; Rizzolatti et al. 1998). Within the dorsal stream there are parallel cortico-cortical circuits, each of which elaborates a specific type of visual information in order to guide a different type of action. The peculiarity of these circuits is that different effectors are provided with the type of visual information most suitable to their motor repertoire. This firm connection between vision and action seems to be the organizing principle of the circuitry connecting the parietal cortex with the agranular frontal cortex of the monkey (see also Gallese, Craighero, Fadiga, and Fogassi 1999).

The present data point to an important involvement of the posterior parietal lobe, and in particular of areas such as area PF that are strictly linked with the premotor cortex, in mediating processes traditionally considered to be 'high level' or cognitive, such as action recognition. Mirror neurons such as those presented in this study represent a perfect instantiation of this view. This matching mechanism can be framed within theories postulating a shared representational domain for action and perception (Gallese 2000a,b; Jeannerod 1994, 1997; Prinz 1997; Rizzolatti, Fogassi, and Gallese 2000). Such a mechanism offers the great advantage of using a repertoire of coded actions in two ways at the same time: at the output side to act, and at the input side to analyze the visual percept. The link is constituted by the presence, in both instances, of a goal. Our proposal is that the goal of the observed action is recognized and 'understood' by the observer by mapping it onto a shared motor representation.
Acknowledgements

This work was supported by MURST and by HFSP.
References

Allison, T., Puce, A., and McCarthy, G. (2000). Social perception from visual cues: Role of the STS region. Trends in Cognitive Sciences, 4, 267-278.
Andersen, R.A., Asanuma, C., Essick, G., and Siegel, R.M. (1990). Corticocortical connections of anatomically and physiologically defined subdivisions within the inferior parietal lobule. Journal of Comparative Neurology, 296, 65-113.
Buccino, G., Binkofski, F., Fink, G.R., Fadiga, L., Fogassi, L., Gallese, V., Seitz, R.J., Zilles, K., Rizzolatti, G., and Freund, H.-J. (2001). Action observation activates premotor and parietal areas in a somatotopic manner: An fMRI study. European Journal of Neuroscience, 13, 400-404.
Cavada, C. and Goldman-Rakic, P.S. (1989a). Posterior parietal cortex in rhesus monkey: II. Evidence for segregated corticocortical networks linking sensory and limbic areas with the frontal lobe. Journal of Comparative Neurology, 287, 422-445.
Cavada, C. and Goldman-Rakic, P.S. (1989b). Posterior parietal cortex in rhesus monkey: I. Parcellation of areas based on distinctive limbic and corticocortical connections. Journal of Comparative Neurology, 287, 393-421.
Cochin, S., Barthelemy, C., Lejeune, B., Roux, S., and Martineau, J. (1998). Perception of motion and qEEG activity in human adults. Electroencephalography and Clinical Neurophysiology, 107, 287-295.
Colby, C.L. and Duhamel, J.-R. (1996). Spatial representations for action in parietal cortex. Cognitive Brain Research, 5, 105-115.
Decety, J. and Grèzes, J. (1999). Neural mechanisms subserving the perception of human actions. Trends in Cognitive Sciences, 3, 172-178.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., and Rizzolatti, G. (1992). Understanding motor events: A neurophysiological study. Experimental Brain Research, 91, 176-180.
Fadiga, L., Fogassi, L., Pavesi, G., and Rizzolatti, G. (1995). Motor facilitation during action observation: A magnetic study. Journal of Neurophysiology, 73, 2608-2611.
Faugier-Grimaud, S., Frenois, C., and Stein, D.G. (1978). Effects of posterior parietal lesions on visually guided behavior in monkeys. Neuropsychologia, 16, 151-168.
Fogassi, L., Gallese, V., di Pellegrino, G., Fadiga, L., Gentilucci, M., Luppino, G., Matelli, M., Pedotti, A., and Rizzolatti, G. (1992). Space coding by premotor cortex. Experimental Brain Research, 89, 686-690.
Fogassi, L., Gallese, V., Fadiga, L., Luppino, G., Matelli, M., and Rizzolatti, G. (1996). Coding of peripersonal space in inferior premotor cortex (area F4). Journal of Neurophysiology, 76, 141-157.
Fogassi, L., Gallese, V., Fadiga, L., and Rizzolatti, G. (1998). Neurons responding to the sight of goal-directed hand/arm actions in the parietal area PF (7b) of the macaque monkey. Society for Neuroscience Abstracts, 24, 257.5.
Gallese, V. (2000a). The acting subject: Towards the neural basis of social cognition. In T. Metzinger (Ed.), Neural correlates of consciousness, pp. 325-333. Cambridge, MA: MIT Press.
Gallese, V. (2000b). The inner sense of action: Agency and motor representations. Journal of Consciousness Studies, 7, 23-40.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593-609.
Gallese, V., Craighero, L., Fadiga, L., and Fogassi, L. (1999). Perception through action. Psyche: http://psyche.cs.monash.edu.au/v5/psyche-5-21-gallese.html.
Gentilucci, M., Scandolara, C., Pigarev, I.N., and Rizzolatti, G. (1983). Visual responses in the postarcuate cortex (area 6) of the monkey that are independent of eye position. Experimental Brain Research, 50, 464-468.
Gentilucci, M., Fogassi, L., Luppino, G., Matelli, M., Camarda, R., and Rizzolatti, G. (1988). Functional organization of inferior area 6 in the macaque monkey: I. Somatotopy and the control of proximal movements. Experimental Brain Research, 71, 475-490.
Goodale, M.A. and Milner, A.D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15, 20-25.
Grafton, S.T., Arbib, M.A., Fadiga, L., and Rizzolatti, G. (1996). Localization of grasp representations in humans by PET: 2. Observation compared with imagination. Experimental Brain Research, 112, 103-111.
Graziano, M.S.A. and Gross, C.G. (1995). The representation of extrapersonal space: A possible role for bimodal visual-tactile neurons. In M.S. Gazzaniga (Ed.), The cognitive neurosciences, pp. 1021-1034. Cambridge, MA: MIT Press.
Graziano, M.S.A., Yap, G.S., and Gross, C.G. (1994). Coding of visual space by premotor neurons. Science, 266, 1054-1057.
Graziano, M.S.A., Hu, X., and Gross, C.G. (1997). Visuo-spatial properties of ventral premotor cortex. Journal of Neurophysiology, 77, 2268-2292.
Grèzes, J., Costes, N., and Decety, J. (1998). Top-down effect of strategy on the perception of human biological motion: A PET investigation. Cognitive Neuropsychology, 15, 553-582.
Gross, C.G., Rocha-Miranda, C.E., and Bender, D.B. (1972). Visual properties of neurons in inferotemporal cortex of the monkey. Journal of Neurophysiology, 35, 96-111.
Hari, R., Forss, N., Avikainen, S., Kirveskari, S., Salenius, S., and Rizzolatti, G. (1999). Activation of human primary motor cortex during action observation: A neuromagnetic study. Proceedings of the National Academy of Sciences, 95, 15061-15065.
Hyvärinen, J. (1981). Regional distribution of functions in parietal association area 7 of the monkey. Brain Research, 206, 287-303.
Iacoboni, M., Woods, R., Brass, M., Bekkering, H., Mazziotta, J.C., and Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286, 2526-2528.
Jeannerod, M. (1994). The representing brain: Neural correlates of motor intention and imagery. Behavioral and Brain Sciences, 17, 187-245.
Jeannerod, M. (1997). The cognitive neuroscience of action. Oxford: Blackwell.
Leinonen, L. and Nyman, G. (1979). II. Functional properties of cells in anterolateral part of area 7 associative face area of awake monkeys. Experimental Brain Research, 34, 321-333.
Leinonen, L., Hyvärinen, J., Nyman, G., and Linnankoski, I. (1979). I. Functional properties of neurons in lateral part of associative area 7 in awake monkeys. Experimental Brain Research, 34, 299-320.
Matelli, M., Gallese, V., and Rizzolatti, G. (1984). Deficit neurologici conseguenti a lesione dell'area parietale 7b nella scimmia. Bollettino della Società Italiana di Biologia Sperimentale, 60, 839-844.
Matelli, M., Luppino, G., and Rizzolatti, G. (1985). Patterns of cytochrome oxidase activity in the frontal agranular cortex of the macaque monkey. Behavioural Brain Research, 18, 125-137.
Matelli, M., Camarda, R., Glickstein, M., and Rizzolatti, G. (1986). Afferent and efferent projections of the inferior area 6 in the macaque monkey. Journal of Comparative Neurology, 251, 281-298.
Milner, A.D. and Goodale, M.A. (1995). The visual brain in action. Oxford: Oxford University Press.
Pandya, D.N. and Seltzer, B. (1982). Intrinsic connections and architectonics of posterior parietal cortex in the rhesus monkey. Journal of Comparative Neurology, 204, 196-210.
Perrett, D.I., Rolls, E.T., and Caan, W. (1982). Visual neurons responsive to faces in the monkey temporal cortex. Experimental Brain Research, 47, 329-342.
Perrett, D.I., Harries, M.H., Bevan, R., Thomas, S., Benson, P.J., Mistlin, A.J., Chitty, A.K., Hietanen, J.K., and Ortega, J.E. (1989). Frameworks of analysis for the neural representation of animate objects and actions. Journal of Experimental Biology, 146, 87-113.
Perrett, D.I., Mistlin, A.J., Harries, M.H., and Chitty, A.K. (1990). Understanding the visual appearance and consequence of hand actions. In M.A. Goodale (Ed.), Vision and action: The control of grasping, pp. 163-180. Norwood, NJ: Ablex.
Petrides, M. and Pandya, D.N. (1997). Projections to the frontal cortex from the posterior parietal region in the rhesus monkey. Journal of Comparative Neurology, 228, 105-116.
Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9, 129-154.
Rizzolatti, G., Gentilucci, M., and Matelli, M. (1985). Selective spatial attention: One center, one circuit or many circuits? In M.I. Posner and O. Marin (Eds.), Attention and Performance XI: Conscious and nonconscious information processing, pp. 251-265. Hillsdale, NJ: Erlbaum.
Rizzolatti, G., Camarda, R., Fogassi, L., Gentilucci, M., Luppino, G., and Matelli, M. (1988). Functional organization of inferior area 6 in the macaque monkey: II. Area F5 and the control of distal movements. Experimental Brain Research, 71, 491-507.
Rizzolatti, G., Fadiga, L., Gallese, V., and Fogassi, L. (1996a). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141.
Rizzolatti, G., Fadiga, L., Matelli, M., Bettinardi, V., Paulesu, E., Perani, D., and Fazio, F. (1996b). Localization of grasp representations in humans by PET: 1. Observation versus execution. Experimental Brain Research, 111, 246-252.
Rizzolatti, G., Fadiga, L., Fogassi, L., and Gallese, V. (1997a). The space around us. Science, 277, 190-191.
Rizzolatti, G., Fogassi, L., and Gallese, V. (1997b). Parietal cortex: From sight to action. Current Opinion in Neurobiology, 7, 562-567.
Rizzolatti, G., Luppino, G., and Matelli, M. (1998). The organization of the cortical motor system: New concepts. Electroencephalography and Clinical Neurophysiology, 106, 283-296.
Rizzolatti, G., Fogassi, L., and Gallese, V. (2000). Cortical mechanisms subserving object grasping and action recognition: A new view on the cortical motor functions. In M.S. Gazzaniga (Ed.), The cognitive neurosciences (2nd edn), pp. 539-552. Cambridge, MA: MIT Press.
Seltzer, B. and Pandya, D.N. (1984). Further observations on parietotemporal connections in the rhesus monkey. Experimental Brain Research, 55, 301-312.
Ungerleider, L.G. and Mishkin, M. (1982). Two visual systems. In D.J. Ingle, M.A. Goodale, and R.J. Mansfield (Eds.), Analysis of visual behavior, pp. 549-586. Cambridge, MA: MIT Press.
18 Coding of visible and hidden actions

Tjeerd Jellema and David I. Perrett
Abstract. We review the properties of cells in the temporal cortex of the macaque monkey that are sensitive to visual cues arising from the face and body and their movements. We speculate that the responses of populations of cells in the cortex of the anterior superior temporal sulcus (STSa) support an understanding of the behaviour of others. Actions of an agent, including whole body movements (e.g. walking) and articulations (of the limbs and torso) made during the redirecting of attention and reaching, are coded by STSa cells in a way which: (1) allows generalization over different views and orientations of the agent with respect to the observer; (2) utilizes information about the agent's current position and (3) about the agent's imagined position while occluded from sight; and (4) is sensitive to sequences of the agent's movements. The selectivity of cells is described from the perspective of hierarchical processing, which presumes that early processing establishes sensitivity to simple body cues and later coding combines these cues to specify progressively more subtle and abstract aspects of behaviour. The action coding of STSa cells is discussed in terms of dorsal and ventral cortical systems, the binding problem, and the functional architecture that allows hierarchical information processing.
18.1 Introduction

18.1.1 Starting simple: perspectives on hierarchical processing

One of the most fundamental tasks for scientists attempting to understand vision is to realize its purpose. This is necessary even before defining a computational theory of how that purpose can be achieved (Marr 1982). Of course, there need not be one single purpose, a 'holy grail', though authors have championed various singular causes; for example, Marr (1982) suggested that the purpose of vision is to build representations and, more recently, Goodale and Milner (1992) have stressed the function of vision in guiding actions. This paper assumes that one goal of vision is to enable the viewer to understand the behaviour of others (which may in turn afford social interactions, avoid predation, etc.). Given this goal, vision needs to achieve the ability to detect and discriminate meaningful actions. These are abstract, complex visual events, which are obviously not defined by the presence of a single edge or particular colour in the image. Early vision, which provides the basis for discriminating colour or orientations, is not sufficient for understanding actions. An act such as 'knocking over an object' involves analysis of a huge number of elementary visual features and their movements. Detecting this act may even necessitate realization of the motives of another (i.e. that the knocking over was intentional rather than accidental). The thesis followed in our work is that understanding of such complex acts is achieved by the initial detection of simple events and the subsequent detection of combinations of these simple events. This reiterative process supports a hierarchy of complexity of the visual configurations
detected and allows progressively more subtle meanings to be realized.

There is no reason that readers should be alarmed by the properties of any one stage of this hierarchy. The detection of light, of edges, shapes, gaze, and attention are all manifestations of the same wonderful biological processes, whereby cells take inputs representing sensory data, perform a statistical assessment of these (Riesenhuber and Poggio 2000), and provide output to other cells. No stage of processing needs to be seen as exceptional, since the machinery and operations at each level are equivalent. The problem with our notion of hierarchies is our anthropomorphism. We assume that being higher in the hierarchy is somehow more important, perhaps because people higher in a chain of command are seen as more responsible and thus deserving of higher salaries. Cells, on the other hand, are paid equally and all do a similar job of detecting patterns of input and providing output action potentials, wherever they are in the brain. Anthropomorphic thinking about hierarchies in the nervous system leads to a hunt for the ghost in the machine. Once processing a few synapses away from the sensory receptors is understood, it is allocated the lowly status of sensory processing; awareness and consciousness, even of those simple qualities made apparent by the sensory processing, are relocated to some higher level. An alternative, more egalitarian view is that neural processing at each stage can contribute to awareness. Under this view, cells in the primary visual cortex can contribute to awareness of orientation and spatial frequency, but are not able to contribute directly to awareness of facial patterns. Cells sensitive to facial patterns, on the other hand, may contribute to the awareness of faces but not to the orientation of edges.

The purpose of this paper is to provide an overview of how the brain builds progressively more abstract descriptions of the actions of others. Investigations into visual processing can be made by recording the activity of individual brain cells at different stages in the visual system of experimental animals. Since the work of Gross and colleagues (Gross et al. 1972), it has become clear that some cells in the temporal cortex of the macaque monkey respond selectively to the sight of biologically important objects (such as hands and faces). The properties of such cells therefore offer a unique opportunity to study directly the brain mechanisms involved in processing complex visual patterns and their meaning. In this paper, we focus on the visual cues used by cells to specify the posture and actions of others. We review previously published findings, but extend these accounts with new examples and observations on the sensitivity of single cells within the temporal cortex that exhibit tuning for specific body postures, movements, and components of behaviour. The paper traces the historical progression from cell properties that have been described for over two decades (i.e. selectivity to view of the face) to more recently discovered and unusual properties. Hopefully, it will be apparent that even the most complex neural descriptions of behaviour can in principle be derived from cellular sensitivity to relatively standard visual attributes (such as the form of individual body components, view, direction of motion, and position).
18.2 Integration of form

18.2.1 Getting attention

The responses of cells to faces, particularly the cells within the superior temporal sulcus (STSa), are consistent with these cells playing a role in the perception of social signals (Emery and Perrett 1994). One type of social signal that appears to be analysed extensively within the STSa is where another animal is directing its attention (Perrett et al. 1990c, 1991). We refer to these signals as social attention (see Langton, Watt, and Bruce 2000).
Fig. 18.1 Discrimination of view for head and body. Upper: Schematic illustration of the 3-D test stimuli. Lower: The mean and standard error (SE) of the response of one cell to different views of the head and body. With the body occluded from sight, the cell gave an excitatory response to the left profile view of the head but gave zero response to the right profile view. With the head occluded from sight, the cell responses showed a preference for the left profile view of the body (torso and limbs). Statistical analysis of responses supported these observations. A 2-way ANOVA (with view and body components as main factors) indicated a significant effect of view on responses [F(1,14) = 7.7, p < 0.02], no effect of body part [F(1,14) = 0.4, p = 0.52], and no interaction between factors [F(1,14) = 0.30, p = 0.93]. Thus, the cell responses signal the sight of the head or the body facing in the same direction.

A role in the visual analysis of the direction of another's attention may account for the selectivity of different cells in STSa not only to the face but also to many other views of the head (Hasselmo, Rolls, Baylis, and Nalwa 1989; Perrett et al. 1991). Different cells in the STSa are selective for different views of the head; some respond only to the front or face view, others respond selectively to the left profile view of the head (e.g. Fig. 18.1) or to the right profile, while yet others respond to the back view of the head. Further cells respond to the head raised or the head lowered (e.g. Fig. 18.2). We speculate that a cell maximally responsive to the face seen in left profile may signal that the attention of another individual is directed to the observer's left. Likewise, a cell responsive to the head lowered might signal attention directed down.
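The two-way ANOVA reported in the legend of Fig. 18.1 (view and body component as factors, per-trial firing rates as the dependent variable) has the following general form. The sketch below uses synthetic data with an invented view effect, so the trial counts and degrees of freedom do not match the original design; it only illustrates the shape of the test.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(0)
    rows = []
    for view in ('left profile', 'right profile'):
        for part in ('head', 'body'):
            for _ in range(4):                          # trials per condition
                base = 20.0 if view == 'left profile' else 2.0  # toy view effect
                rows.append({'rate': base + rng.normal(0.0, 3.0),
                             'view': view, 'part': part})
    df = pd.DataFrame(rows)

    model = ols('rate ~ C(view) * C(part)', data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))   # F tests: view, part, interaction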
18.2.2 What you looking at?

The hypothesis that STSa cells responsive to faces may signal attention direction suggests the existence of a variety of response properties, if the cells are to be useful in social situations. In many
cases the direction in which another individual's head is pointing is not a reliable index of where that individual's attention lies. Gaze direction can be a better guide to the focus of another individual's attention and should therefore affect STSa cell responses. These predictions for a role in signalling attention direction are borne out, since most cells selective for a particular head view are also sensitive to gaze direction (Perrett et al. 1985a, 1990c, 1991, 1992). Moreover, the direction of gaze and the head view to which cells are maximally sensitive tend to be aligned. For example, cells responsive to the head directed towards the observer (full face) are more responsive to eye contact than to laterally averted gaze. By contrast, cells responding to the head turned away from the observer towards profile also respond more to laterally averted gaze. Similarly, in the vertical plane, many of the cells that are sensitive to the head pointing downwards are also sensitive to gaze directed down (Fig. 18.2), and many of those selective for the face directed upwards are also selective for the eyes directed upwards (Perrett et al. 1990c). This is illustrated for one cell in Fig. 18.2, which was responsive to the sight of the head oriented towards the ground, whether the head was seen from the front or in profile (not shown). For the front view of the head facing the ground, the eyes are not visible, so the responses must be based on visual information derived from other parts of the face. The cell is unresponsive to the face directed towards the camera/observer, but does respond to the face when the eyes are directed downward. An ineffective view of the head becomes effective provided that the eyes point in the correct direction: downwards, not straight ahead or upwards.

There are circumstances in which the direction of gaze of another individual is not clear, for example when the eyes lie in shadow. In these cases, the direction of attention can still be analysed from the direction in which the head is pointing. Head angle thus provides a parallel cue to attention direction. Cells showing combined sensitivity to head view and gaze direction (e.g. Fig. 18.2) are therefore capable of signalling the direction of another individual's attention under a variety of viewing conditions. Since gaze direction can be a more accurate cue to direction of attention than head view, one can predict that if head and gaze cues were put in conflict, cell sensitivity to gaze direction should override sensitivity to head view. Experimental results support this prediction: changing the gaze direction can decrease the cell response to an effective head view or elevate the response to an ineffective head view (Perrett et al. 1985a, 1990c, 1992). Langton et al. (2000) note that, for humans, head and gaze cues may contribute independently (rather than hierarchically) to the analysis of attention direction. Indeed, at the cellular level the interaction of head and gaze cues can be additive rather than prioritized.
18.2.3 Bodies count too!

It is also possible to derive visual indications of the direction of attention from an individual's body posture. It turns out that 60% of cells within the STSa that are responsive to the face also process visual information about the body in addition to the head (Wachsmuth et al. 1994). Visual information arising from body cues appears to contribute to cell sensitivity in a way that is also consistent with the cells' role in analysing the direction of attention (Perrett et al. 1992). For example, Fig. 18.1 illustrates the responses of one cell to the left profile view of the head or the left profile view of the body. The cell responses could contribute to signalling that an individual is attending in a direction to the observer's left. This signalling function could operate in situations where either the head or the body is partially occluded from sight.
Fig. 18.2 Sensitivity to gaze direction, head, and body postures indicative of attention directed down. Upper parts of (a) and (b): Schematic illustrations of the real 3-D stimuli used for testing. Lower parts of (a) and (b): Mean (+/− 1 SE) response of one cell to the stimuli. (a) Sensitivity to head and eye gaze directed down. The cell responded more to a view of the head in which the face was rotated towards the ground than to full-face views (p < 0.002, each comparison, Newman–Keuls). With the full-face view the cell responded more when the gaze was directed down than when the gaze was directed at the camera (viewer) or averted upwards, or than spontaneous activity (SA; p < 0.005, each comparison). [Overall effect of conditions F(4, 20) = 29.6, p < 0.0005.] (b) Sensitivity of the same cell to body posture. With the head covered, the cell responded more to the quadrupedal posture than to the bipedal posture (p < 0.0005). With the head visible the cell responded more when the head was pointing at the ground than when it was level (p < 0.0005). [Overall effect of conditions F(4, 36) = 20.5, p < 0.0005.] (Adapted from Perrett et al. 1992.)
Figure 18.2 illustrates a further example of the responses of a cell sensitive to head view and body posture. The sensitivity of this cell to head down and gaze down has already been discussed. Some of the cells with this type of sensitivity were also found to be responsive to the sight of the body in a quadrupedal but not a bipedal posture (Perrett et al. 1990c, 1992). For the cell illustrated in Fig. 18.2, sensitivity to the quadrupedal posture was found with the head occluded from sight. These results indicate that three independent types of visual cue, arising from the eyes, head, and body, all impact on the cell. Moreover, the visual information from the head appears to take some priority over the visual cues from the body, since the ineffective bipedal body posture becomes effective when the head is visible and oriented downward, and the effective quadrupedal body posture becomes ineffective when the head is visible but oriented level rather than downwards. The visual sensitivities of this type of cell to the eyes, head, and body posture are each consistent with a role in coding the sight of an individual 'attending down' (Perrett et al. 1992).

Generally, the conjoint sensitivity to gaze direction, head view, and body posture indicates the cells' ability to utilize very different types of visual information, yet all of the information to which the cells are sensitive is compatible with the same conceptual interpretation of another individual's direction of attention. These cells appear to signal where in the world someone else is looking. One can imagine that separate visual descriptions would be built initially for the appearance of the eyes, head, and body and that, at later stages in the analysis, the outputs of the appropriate versions of these lower-level descriptions would be combined hierarchically to establish selectivity for multiple components of the body. This hierarchical processing scheme fits the range of neural sensitivities observed, with some cells sensitive to component facial cues and other cells sensitive to combinations of gaze, facial, and bodily cues (Perrett et al. 1992). Such a hierarchical scheme is analogous to that proposed for the flow of information from cell populations with view-specific response sensitivity to cell populations with view-general sensitivity (Logothetis, Pauls, and Poggio 1995; Perrett et al. 1984, 1985a, 1989, 1991, 1992; Riesenhuber and Poggio 2000; Seibert and Waxman 1992). Some evidence for such a hierarchical organization, with specific tuning being combined to establish more conceptually abstract tuning, is apparent from analysis of cell response latencies: cells with more abstract properties show onset latencies that are longer than those of cells with more simple properties (Perrett et al. 1992).
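The priority of head cues over body cues described above can be caricatured as a simple gating rule. The following sketch is a deliberately crude, binary abstraction of the single cell in Fig. 18.2, not a model proposed by the authors:

    def attending_down(gaze_down, head_down, head_visible, body_quadrupedal):
        # Toy combination rule: head (and gaze) cues take priority; the
        # body posture cue matters only when the head is occluded from sight.
        if head_visible:
            return head_down or gaze_down
        return body_quadrupedal

    # The quadrupedal posture drives the unit only with the head occluded;
    # a visible, level head vetoes the otherwise effective posture.
    print(attending_down(False, False, head_visible=False, body_quadrupedal=True))  # True
    print(attending_down(False, False, head_visible=True,  body_quadrupedal=True))  # False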
18.3 Integration of form and motion

18.3.1 Time for action

The neuronal mechanisms described so far have involved the processing of static visual stimuli. Area STSa receives visual information about the motion of 'animate' objects from posterior motion-processing areas (Boussaoud, Ungerleider, and Desimone 1990). Comprehension of what other individuals are 'doing' may depend on the combined analysis of the individuals' appearance (their form) and the way they move. Within the STSa there are many cell populations which appear specialized for the coding of specific limb or whole body movements (Perrett et al. 1985b). These cells reflect combined sensitivity to both the form of the body and the nature of its movements (Oram and Perrett 1996). Collectively the cells can be thought of as providing a visual encyclopaedia of recognizable actions of the face and body. Again, the cell types can be arranged in a conceptual hierarchy, starting with cells sensitive to the simple movements of single limbs and ending with cells sensitive to the complex patterns of
articulation characteristic of whole body movements and cells sensitive to actions in which the body movements are related to objects or goals within the environment (Jellema et al. 2000a; Perrett et al. 1989). Cells sensitive to individual limb movements code articulations in specific directions towards or away from the observer, or code changes in limb elevation up or down. Some cells are responsive solely to particular head movements, others to leg or arm movements, and others to movements of smaller body or facial components (fingers, eyes, or mouth). All of these cell types display form sensitivity in that they respond specifically to one body part moving but not to equivalent movements involving a different body part (e.g. arm movements but not leg movements). Most cells are unresponsive to control stimuli constructed to resemble the effective body part in size and shape. For some cells, however, the form sensitivity is sometimes reduced such that stick figures, which articulate in the same way as the whole body or a specific limb, can evoke cell responses (Oram and Perrett 1994; Perrett et al. 1990a). Such sensitivity to patterns of articulation allows the cells to respond to a variety of body types with different colouration and patterning of fur or of clothes (in the case of humans).
18.3.2 Good intentions

One key to the coding of intention in actions is sensitivity to information about the direction of attention of the individual performing the action. Usually an agent will attend to the goal of an action that is intended; by contrast, the agent’s attention may be elsewhere when an action is unintentional or accidental. We recently studied a population of STSa cells which seems to combine information about the direction of attention of an agent with the action performed by that agent (Jellema et al. 2000a,b); this enables the cells to be sensitive to the intentionality of the action. These cells typically responded when an agent performing a reaching action focused attention on the intended target-site of the reach. When the agent performed an identical reaching action but directed attention 90 degrees away from the position at which the reaching was aimed, the cells did not respond. These cells can be thought of as combining the outputs from cells specifically responsive to arm reaching with the outputs of cells specifically responsive to the direction of attention (as conveyed by the face, gaze, and body orientation, e.g. Figs. 18.1, 18.2). The presence of a specific object at the target position of the reaching, such as a banana located on a tray, did not affect the responses of these cells. The goal of the reaching in these cases appeared to be a position rather than an object.
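The logic of these responses can be summarized as a congruence test between the reach and the agent’s attention. The following sketch is ours; the angular tolerance and rates are invented for illustration:

    # Sketch of a cell combining 'arm reaching' with the agent's direction of
    # attention (conveyed by face, gaze, and body orientation).
    def intentional_reach_response(reaching, reach_dir_deg, attention_dir_deg,
                                   tolerance_deg=30.0):
        """Fire only when the agent attends to (roughly) where it reaches.
        The presence or absence of an object at the target is ignored."""
        if not reaching:
            return 0.0
        d = abs(reach_dir_deg - attention_dir_deg) % 360.0
        misalignment = min(d, 360.0 - d)
        return 20.0 if misalignment <= tolerance_deg else 0.0

    print(intentional_reach_response(True, 0.0, 0.0))   # attended reach: responds
    print(intentional_reach_response(True, 0.0, 90.0))  # attention 90 deg away: silent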
18.3.3 Stepping out

There is a whole menagerie of cell types located within the STSa. One type that occurs in large numbers is responsive to the sight of whole body movements that are witnessed during walking (Perrett et al. 1985b). Most cells coding whole body movements within the STSa appear to use a ‘viewer-centred’ frame of analysis. That is, changing the observer’s view of the moving body changes the cell’s response. For example, one cell might respond to the right profile view of the body moving to the observer’s right, but not to the left profile view moving in the same direction. Different cells are sensitive to whole body movements in different directions (left, right, towards, away, up, and down). This is illustrated in Fig. 18.3 for one cell that responds to the front view of the body approaching the observer. Note that changing the view of the approaching body, to left or right profile or to the back of the body, eliminates the cell’s response. Likewise, movements maintaining the front view of the body but directed to the left, right, or away from the observer fail to
provoke the cell’s response. Here, then, both the body view and the movement direction must be specified correctly with respect to the observer before the cell responds. Such behaviour is typical of 95% of STSa cells responsive to whole body movements made during walking (Oram and Perrett 1996; Perrett et al. 1985b). The cells differ in the choice of direction and view of the body coded; some respond to walking to the observer’s left, some to the observer’s right. A small number (5%) of STSa cells have been studied that are capable of responding in more abstract terms to whole body movements and behave as if the vantage point of the observer was irrelevant (Figs. 18.4–6). Some of these cells respond to ‘walking forwards’ (e.g. Fig. 18.4), others to ‘walking backwards’ (e.g. Figs. 18.5 and 18.6). The former cells respond best to movements away from the observer when the back view of the body is seen moving; yet for movements towards the observer the same cells respond best to the front view of the body; for movements to the observer’s left the left profile view is the optimal one, and for movements to the right it is the right profile view (e.g. Fig. 18.4). For each direction of movement the body view is critical; equivalently, for each view of the body the direction of motion is critical (e.g. Fig. 18.5).
Fig. 18.3 Viewer-centred coding for walking towards the observer. Upper: Schematic representation of the view and type of movement. Lower: Mean (+/−1SE) of responses of one cell. The cell responds to the experimenter walking towards the observer so long as the body faces the direction of movement (i.e. walks forwards). The front view of the body seen moving left, right, or away from the observer fails to provoke a response. The experimenter approaching with different body views (left or right profile, or back of the body) also produces less response, as do a body-sized control object approaching and the static front view of the body. The cell requires a particular combination of body view and direction of motion. [1-way ANOVA: main effect of test condition, F(8, 57) = 4.95, p < 0.0002; front view approaching greater than all other conditions, p < 0.001.]
Such descriptions have ‘object-centred’ properties because the description ‘walking forwards’ does not depend on the observer’s vantage point (Hasselmo et al. 1989; Marr and Nishihara 1978; Oram and Perrett 1996; Perrett et al. 1985a,b, 1989). It is easy to speculate that such abstract descriptions of walking forwards and backwards are built hierarchically by combining the outputs of several view-specific descriptions of motion.
Fig. 18.4 Object-centred coding of forward walking or forward translation. Mean (+/−1SE) of responses of one cell to video clips of walking towards, left, right, or away from the observer so long as the body faces the direction of movement (i.e. walks forward). A schematic representation of the view and type of movement is shown adjacent to each of the responses. Responses to compatible walking are higher than to incompatible walking (where the body is facing away from the direction of motion). [2-way ANOVA: overall effect of compatibility of walking (forwards, backwards), F(1, 15) = 75.8, p < 0.000005; direction of motion (towards, away, to the right, to the left), F(3, 15) = 0.8, p = 0.5; interaction, F(3, 15) = 2.1, p = 0.15.] The cell responses continue to discriminate compatible and incompatible body movements for video film of an actor moved on a trolley without articulation. [2-way ANOVA: main effect of compatibility, F(1, 14) = 50.5, p < 0.000005; direction of motion, F(3, 14) = 1.3, p = 0.3; interaction, F(3, 14) = 0.3, p = 0.82.]
[Figure panels: cell responses (spikes/sec) for incompatible, compatible, and control conditions of movement towards, left, away, and right of the observer (Fig. 18.4); and responses (spikes/sec) as a function of angle of view, 0–360°, for movement to the left and right (Fig. 18.5).]
Fig. 18.5 Coding of walking backwards to the right and left. Upper: Schematic representation of the view and type of movement. Lower: Mean (+/−1SE) of responses of one cell to 8 different views of the body moving to the left or right. Curves display the best-fitting second-order cardioid function (see Perrett et al. 1991). The cell shows different view tuning for the two directions of motion: for each direction the optimal view is the one in which the body is oriented the opposite way to the motion.
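For readers wishing to reproduce this kind of tuning-curve fit, a rough equivalent can be set up with a truncated cosine series; this approximates the idea rather than reproducing the exact second-order cardioid of Perrett et al. (1991), and the response values below are invented:

    # Illustrative fit of smooth view tuning to responses at 8 tested views.
    import numpy as np
    from scipy.optimize import curve_fit

    def view_tuning(theta_deg, a, b1, b2, pref_deg):
        t = np.deg2rad(theta_deg - pref_deg)
        return a + b1 * np.cos(t) + b2 * np.cos(2.0 * t)

    views = np.arange(0, 360, 45)                      # 8 views, 45 deg apart
    rates = np.array([4.0, 6, 12, 22, 28, 21, 11, 5])  # made-up spike rates
    params, _ = curve_fit(view_tuning, views, rates, p0=[15, 10, 3, 180])
    print("fitted preferred view: %.0f deg" % (params[3] % 360))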
For example, walking forwards independent of view can be manufactured by combining the outputs of viewer-centred cells sensitive to compatible motion directed towards, away, to the left, and to the right of the observer (Perrett et al. 1985b, 1989, 1990a,b). There are two observations consistent with this speculation.
Fig. 18.6 Cell coding upright and inverted backward walking. Mean (+/−1SE) of responses of one cell to walking towards, left, right, and away from the observer, so long as the body faces opposite to the direction of movement (i.e. walks backwards). [2-way ANOVA: main effect of compatibility, F(1, 38) = 38.4, p < 0.000005; direction of motion, F(3, 38) = 3.7, p = 0.02; interaction, F(3, 38) = 3.66, p = 0.02.] The cell responses continue to discriminate compatible and incompatible body movements for inverted film of a person walking away from the camera/observer [F(1, 11) = 29.9, p < 0.0002].
First, latencies of cells responding to whole body movements in a viewer-centred manner tend to be shorter than latencies of cells responding in the object-centred manner. A similar difference in latencies is apparent for cells responsive to view-specific and view-general static information about the head and body (Perrett et al. 1992). Second, one can find cells that show sensitivity to more than one but not all directions of movement. When this occurs we find that the cells show selectivity for one type of view compatible with the movement directions: we do not find cells showing selectivity for compatible forward motion in one direction and incompatible backward motion in another direction. Thus, the cells’ view and direction selectivities seem to reflect logical rather than random combinations. This is illustrated in Fig. 18.5 for a cell responsive to motion to the left and right of the observer. This cell failed to respond to directions towards or away from the observer. For movements directed left and right, tests were made comparing responses to video films showing eight views of the body moving. It can be seen that for both directions the cell is tuned for the view of the body consistent with it walking backwards.
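A sketch of this pooling scheme (our illustration; the labels are arbitrary) makes the ‘logical combination’ constraint concrete: the object-centred unit draws only on viewer-centred units whose view is compatible with their direction of motion.

    # Viewer-centred units each code one (view, direction) conjunction; an
    # object-centred 'walking forwards' unit pools the four compatible pairs.
    COMPATIBLE = {("front", "towards"), ("back", "away"),
                  ("left profile", "left"), ("right profile", "right")}

    def viewer_centred_unit(view, direction, pref_view, pref_direction):
        return 1.0 if (view, direction) == (pref_view, pref_direction) else 0.0

    def walking_forwards_unit(view, direction):
        # Logical OR (max) over the compatible conjunction units only.
        return max(viewer_centred_unit(view, direction, v, d)
                   for v, d in COMPATIBLE)

    print(walking_forwards_unit("front", "towards"))  # compatible: responds
    print(walking_forwards_unit("front", "away"))     # incompatible (walking backwards): silent

Because the pooled unit waits on its inputs, it will necessarily respond later than the viewer-centred units feeding it, which is consistent with the latency difference noted above.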
18.3.4 Articulation that doubles you up

Rather than responding to the net displacement of bodies through space, a different type of cell codes for the articulation of the body, where one limb or multiple components of the body move with respect to other components. Cells sensitive to articulation come in different ‘flavours’: one apparent division of labour between the cells is whether they code for vertical or horizontal rotation. The articulatory movements of the body can again be described relative to the observer (viewer-centred) or relative to some other component of the body itself (object-centred: Hasselmo et al. 1989; Perrett et al. 1985b, 1990a,b). When referenced to the observer, horizontal rotations can be specified by the view they bring to confront the observer: rotation towards the observer
bringing the head and body to a front view; rotation away taking the face away and presenting a profile or rear view (Perrett et al. 1985b, 1990b). Vertical rotations may move the observed body or some component of it with respect to gravity, for example, lowering the face or chest towards the ground or raising it towards the sky.

Figure 18.7 illustrates one cell typical of those sensitive to vertical articulation. For the sight of a human body seated (normally) in a chair, the cell responds to motion increasing the angle of flexion at the hips and resulting in the chest and face taking on a more skyward orientation. Note that the angle of flexion through which the cell is responsive is specific: it is only when the angle between upper torso and legs exceeds 90° that the cell responds. The body can flex in four ways, symbolized by different arrows in Fig. 18.7. We have studied 38 cells of this type responsive to vertical body flexion. The majority (94%) of these cells were sensitive to bending motion through one of the four quadrants illustrated. Examples are illustrated for bending backwards from upright (Fig. 18.7), and for bending forwards from upright (Fig. 18.8). Thus, the cells code a specific type of flexion or extension (cf. Hasselmo et al. 1989).

For this type of cell, we have studied the importance of different component body movements, flexing just the head, or the head and upper torso. Most of these cells did not require the torso to move and responded to the head movement alone. For a small number of cells (e.g. Fig. 18.7C), the response to the head movement alone was present but reduced compared with the combined head and torso motion, implying the importance of the torso movement. Thus, like the cells coding static information about body posture, some of the cells coding body movement also appear to integrate information from multiple components of the body. Such integration of visual cues from head and torso motion is indicated in Fig. 18.7. Here the head and torso either move in the same direction or twist in different directions. When the movements of the head and upper torso occur in different directions, the movement of the face downwards can suppress the response to the chest moving upwards.
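To make the response criterion explicit, the cell of Fig. 18.7 can be caricatured as a threshold on the hip angle, gated by the direction of articulation and modulated by simultaneous head movement; the thresholds, rates, and suppression factor below are our assumptions:

    # Sketch of a cell tuned to backward flexion of the body at the hips.
    def flexion_response(hip_angle_deg, angle_increasing, head_motion=None):
        """hip_angle_deg: angle between upper torso and legs.
        head_motion: None, 'up', or 'down' (simultaneous head rotation)."""
        driving = angle_increasing and hip_angle_deg > 90.0
        rate = 24.0 if driving else 3.0
        if driving and head_motion == "down":
            rate *= 0.4   # the face moving down suppresses the chest-up response
        return rate

    print(flexion_response(110.0, True))                      # preferred articulation
    print(flexion_response(80.0, True))                       # below the 90 deg criterion
    print(flexion_response(110.0, True, head_motion="down"))  # suppressed by head cue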
18.4 Generalization across similar actions

18.4.1 Turning the world upside down

Further dramatic indications that cell coding can generalize to conceptually similar movements come from tests in which the video of the walking person was vertically inverted. This is illustrated in Fig. 18.6 for one cell that is responsive to walking backwards when witnessed in the normal upright orientation. The cell remains capable of discriminating compatible and incompatible movement for inverted videos of walking. Similar generalization across viewing orientation is found in cells tuned to articulation of the body and head (see Fig. 18.8). One immediately thinks that no subject has seen inverted walking: so why are the subjects’ cells capable of generalizing to such unusual movements, and how do their inputs allow the cells to respond? Monkeys and apes (including humans, particularly in their youth) see their companions moving in all sorts of ways: climbing up and down, scrambling forwards and backwards while suspended upside down. Observers, too, will occasionally hang upside down while watching the movements of others in their gravitationally normal lives. So retinal stimulation by movements will occur in a variety of orientations. Temporal cortex appears sensitive to viewing conditions: cells in this area develop selectivity for the appearance of faces and bodies in those conditions in which they are experienced (Ashbridge, Perrett, Oram, and Jellema 2000). Tuning also appears to reflect the duration of experience (Perrett et al. 1998).
Fig. 18.7 Coding of the body bending backwards independent of view. Upper left: Schematics depicting the test views of the experimenter relative to the subject. The direction of body articulation is indicated by arrows (numbered 1–4). A: Mean (+/−1SE) of response to different articulations. Articulations at the hips that bent the body backwards, increasing the distance between the chest and face and the knees, produced larger responses than other articulations (movement number 2; dark arrow and filled histogram bars). The cell responded vigorously to this action when seen from the front, back, right, and left side. [2-way ANOVA: main effect of articulation type (movements 1–4), F(3, 128) = 154.1, p < 0.000005; view, F(3, 128) = 3.67, p = 0.014; no interaction, F(9, 128) = 1.42, p = 0.18.] B: Independent sensitivity to head movement. Rotation of the head backwards, starting from the upright position, produced more activity from the cell than rotating the head forwards along the same trajectory and ending in the upright position [F(1, 15) = 20.6, p < 0.0004]. Head movements 1 and 4 did not excite the cell (not shown). C: Interaction of head and body articulation. The responses of the cell to upper body movements were modulated by rotation of the head. Head movements are indicated by small arrows. The data shown represent the averaged responses across front, back, right, and left views. The response to the backward bending of the torso was reduced when the head simultaneously turned downwards (p < 0.0002), whereas the response to the forward bending of the torso was increased when the head simultaneously turned upwards (p < 0.004). [ANOVA: type of bending, F(3, 93) = 88.9, p < 0.00005.]
Fig. 18.8 Coding of the body bending forwards independent of view and orientation. Upper: Schematics depicting the test views and orientations of the experimenter relative to the subject. The directions of body articulations are indicated by arrows (numbered 1–4). Lower: Mean (+/−1SE) of response to different articulations. With the body in an upright orientation, articulations at the hips that bent the body forwards and brought the chest and face closer to the knees (movement number 4; dark arrow and filled bars) produced larger responses than other articulations. [2-way ANOVA: main effect of articulation type (movements 1–4), F(3, 106) = 147.8, p < 0.00005; view, F(3, 106) = 3.51, p = 0.018; interaction, F(9, 106) = 2.1, p = 0.034.] With the body oriented horizontally the cell continued to respond selectively to the same type of articulation. [2-way ANOVA: main effect of articulation type (movement 1 vs. 4), F(1, 35) = 70.2, p < 0.00005; view, F(1, 35) = 0.018, p = 0.89; interaction, F(1, 35) = 1.30, p = 0.26.]
18.4.2 Gliding along

Perhaps the real clue to the cells’ capacity to generalize across orientation comes from consideration of the mechanisms by which cells respond selectively to body movements. There are two
main ways; some of the cells code specific patterns of articulation, others code the combination of the form available at each instant plus the direction of displacement. The former cells respond to ‘biological motion’ displays where only a few points of light attached to the body need be visible (Johansson 1973; Oram and Perrett 1994; Perrett et al. 1990a). The latter cell type, which is much more numerous, requires only the form of the face and/or body to be seen translating or changing in scale. This is shown in Fig. 18.4 for one cell, which responded in an object-centred manner to walking forwards in all directions. The cell continued to discriminate forward from backward motion for videotape stimuli in which the body was moved without articulation (or effort) on a mobile trolley, although responsiveness was less than that observed for a real human walking. Indeed, for many cells the movement required can be further simplified. For example, for cells selective for a body walking towards the observer, simply zooming a slide of a face to increase its magnification can be sufficient to evoke responses. Likewise, cells responsive to walking right can respond to a slide projection of the right profile of a human body made to drift to the right across a projection screen. Some cells are a little more sophisticated and require translation of the face/body relative to background elements (Hietanen and Perrett 1996a,b; Oram and Perrett 1996).

Cells within the STSa that are sensitive to body movements appear to combine two different signals, one available from an STSa cell population that specifies direction of motion but lacks form selectivity (Oram, Perrett, and Hietanen 1993) and a second from an STSa population that specifies the form of the face and body independent of its movement (Perrett et al. 1984). These two sources of information can be seen to arrive separately (i.e. at different times) at individual cells conjointly sensitive to form and motion (Oram and Perrett 1996). Given that the cells selective for whole body movements can combine these two types of input, it follows that, since about 20% of the cells specifying face and body form generalize over orientation (Ashbridge et al. 2000), some of the cells sensitive to walking will inherit this orientation tolerance and display the trick of coping with upside-down walkers.

For cells generalizing to articulations across different orientations, generalization may depend on different mechanisms. For these cells, generalization could depend on coding the relative separation between two parts of the body (Perrett et al. 1990a; e.g. the top of the head moving closer to the knees), or the relative speed of motion of different parts of the body towards a further part of the body (e.g. the forehead moving towards the hips faster than the chin).
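The relative-separation idea can be stated in a few lines: if the cell computes only distances between body parts, its output is automatically unchanged by translating, rotating, or inverting the whole display. A sketch under these assumptions (the coordinates and firing rule are invented):

    # Orientation-tolerant articulation coding from relative separations.
    import math

    def bending_response(head_positions, knee_positions, shrink=0.8):
        """Respond if the head approaches the knees over the sequence."""
        d_start = math.dist(head_positions[0], knee_positions[0])
        d_end = math.dist(head_positions[-1], knee_positions[-1])
        return 20.0 if d_end < shrink * d_start else 0.0

    def flip(points):
        # Invert the whole display; relative separations are unchanged.
        return [(x, -y) for x, y in points]

    head = [(0.0, 1.7), (0.3, 1.2)]     # head moves towards the knees
    knees = [(0.0, 0.5), (0.0, 0.5)]
    print(bending_response(head, knees))               # responds
    print(bending_response(flip(head), flip(knees)))   # identical response when inverted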
18.5 Cortical visual pathways and the location of actions

18.5.1 What, where, how, and why?

Our understanding of visual processing has been dominated by the ‘what’ versus ‘where’ dichotomy proposed by Ungerleider and Mishkin (1982). Their model envisages a separation of visual processing into two distinct cortical streams: a dorsal stream, from V1 into the inferior parietal cortex, which deals with the spatial relationships of objects (the ‘where’ stream), and a ventral stream, extending from V1 into the inferior temporal cortex, dealing with the shape and identity of objects (the ‘what’ stream) (e.g. Desimone and Ungerleider 1989; Haxby et al. 1991, 1993; Köhler, Kapur, Moscovitch, Winocur, and Houle 1995).
Where in this dichotomy do the cells described here fit? The abundance of cells within STSa that code the visual appearance of the face and body, and their apparent lack of sensitivity to retinal position, orientation, luminance, and colour (e.g. Ashbridge et al. 2000; Ito, Tamura, Fujita, and Tanaka 1995; Lueschow, Miller, and Desimone 1994; Perrett et al. 1984, 1989; Rolls and Baylis 1986), suggest a role in object recognition and allocate STSa to the ventral stream. In fact the STS runs between dorsal and ventral streams; the assignation of the STS to the dorsal–ventral anatomical dichotomy is therefore ambiguous. The posterior sections of the STS include the motion processing areas (V5/MT and MST) and are traditionally thought of as belonging to the dorsal stream, since they project heavily to the parietal cortex. The anterior sections of the STS lie in the temporal lobe and therefore belong anatomically to the ventral division. The functions of STSa cells in coding the direction of motion of animate objects and the direction of attention of other animals have a decidedly spatial flavour, which could be allied with dorsal operations. Indeed, such visual information could be sent to the dorsal cortical systems to facilitate control of the observer’s own attention (Hoffman and Haxby 2000; Lorincz, Baker, and Perrett 1999) via the dense anatomical projections from the STSa to the parietal cortex (Harries and Perrett 1991).

It is more important to consider the possible functions of STSa cells than to try to fit their properties into circumscribed functions already assigned to dorsal and ventral processing streams. Attempts to shoe-horn cell types into one or other system blind us to functions that the STSa cells may serve that are independent of those currently associated with dorsal and ventral streams. If, as we have speculated, STSa cells play a role in social cognition, then they may utilize all sorts of information (including spatial position) that has previously been allocated to the dorsal or to the ventral cortical streams.

A number of findings at the neuropsychological level challenge the strict ‘what–where’ dorsal–ventral dichotomy and indicate that object properties may also be coded dorsally (e.g. Goodale, Milner, Jakobson, and Carey 1991; Murphy, Carey, and Goodale 1998). These are substantiated by reports at the cellular level: for example, cells in parietal cortex code for the size and orientation of objects that have to be grasped (Murata, Gallese, Kaseda, and Sakata 1996; Sakata et al. 1998). Sereno and Maunsell (1998) found cell selectivity for passively viewed two-dimensional shapes in the lateral intraparietal cortex. Conversely, there have been recent reports of spatial coding within the ventral stream. Dobbins, Jeo, Fiser, and Allman (1998) reported a high proportion (> 60%) of cells in area V4 displaying changes in response with viewing distance, independent of retinal image size. Even cells in V1 code for certain volumes of space (Trotter and Celebrini 1999). This evidence suggests that both form and spatial cues may be processed in each of the two cortical visual streams. Milner and Goodale (Goodale and Milner 1992; Milner and Goodale 1993, 1995) have reformulated the function of dorsal and ventral visual streams and emphasize the visuomotor nature of processing within parietal areas (‘how’ to deal with objects). One major implication of this revised type of model is that form and space are processed in both pathways but for different purposes.
The ventral stream is thought to serve visual ‘perception’, that is, object and scene recognition (cf. Marr and Nishihara 1978) and recognition of ‘why’ an action is occurring (Walsh and Perrett 1994). In the ventral stream, object representations are thought to benefit from allocentric spatial coding to represent the enduring characteristics of objects and relationships between object components. By contrast, the dorsal stream is thought to serve the visual control of ‘action’, and to utilize egocentric spatial coding for short-lived representation of views of an object that are essential for guiding visuomotor interactions with objects. The functions of ventral and dorsal streams thus emphasize vision-for-perception versus vision-for-action.
18.5.2 Using your position

Lately, we have become aware that spatial position is integrated with form and movement cues to support the comprehension of animals and their actions within the temporal cortex (e.g. Baker et al. 2000, 2001; Jellema and Perrett 1999). Our working hypothesis with respect to the functional significance of spatial coding in STSa is that it plays a role in the visual analysis of the intentions and goals of others’ actions, that is, in social cognition (cf. Abell et al. 1999; Baron-Cohen 1994, 1996; Brothers 1995; Emery and Perrett 1994, 1999; Jellema et al. 2000a). The spatial locations that individuals occupy are especially relevant in hierarchically organized primate societies. Our preliminary results suggest that spatial coding may indeed be widespread in STSa. The reason why previous studies did not observe it is probably that, given the predominant view of the functions of the dorsal and ventral visual streams, most studies on the ventral stream have been biased towards investigating object processing, neglecting possible effects of position. A convergence of information about the motion and form of objects within STSa has been confirmed at the cellular level (Oram and Perrett 1994, 1996; Tanaka et al. 1999), but a possible spatial influence on STSa cells has not been studied systematically.

Anatomical studies reveal that STSa receives an abundance of projections which could provide the position information: from the parahippocampal cortex (Seltzer and Pandya 1994), the entorhinal cortex (Good and Morrison 1995), the posterior parietal cortex (Baizer, Ungerleider, and Desimone 1991; Seltzer and Pandya 1984), the anterior part of IT (Baizer et al. 1991; Boussaoud et al. 1990; Morel and Bullier 1990), and posterior regions of STS (Boussaoud et al. 1990). For instance, the distance sensitivity observed in cells in area V4 (Dobbins et al. 1998) may well extend into STSa, since V4 forms the main visual input to IT, and IT projects heavily onto STSa. The hippocampus and/or parahippocampal gyrus may provide the spatial input to STSa via its projection onto the perirhinal cortex (Seltzer and Pandya 1994). Thus, information about space has a profound influence in the temporal lobe, but its utilization in the visual processing of complex objects and actions is only just becoming apparent.

Baker et al. (2001) described substantial numbers of cells with responses that were selective for static views of the body and which were additionally sensitive to the distance of the body from the observer. It now appears that the cells in the STSa tuned to body movement can also possess spatial selectivity. Based on preliminary data, we find that sensitivity in STSa to walking depends on the combination of three factors (form, motion direction, and spatial position). Changing just one factor can abolish the response, while no single factor is sufficient to evoke it: spatial cues are necessary but not sufficient to produce a response. Preliminary data show that, for cells sensitive to approaching or retreating movements of the experimenter, there is a tendency for maximal responses at the ‘near’ location for cells responsive to compatible walking towards the subject, and at the ‘far’ location for cells responsive to compatible walking away from the subject (Jellema and Perrett 1999). This suggests that the cells favour certain combinations of location, form, and direction of motion above others.
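In code, the observed dependence reads as a three-way conjunction; changing any one factor abolishes the response, and no factor alone is sufficient. The labels and rates are our illustrative assumptions:

    # Sketch: walking response requiring form, direction, and position together.
    def walking_cell_response(form, direction, position,
                              preferred=("whole body", "away", "far")):
        if (form, direction, position) == preferred:
            return 22.0      # all three cues agree
        return 2.0           # changing any single cue abolishes the response

    print(walking_cell_response("whole body", "away", "far"))     # responds
    print(walking_cell_response("whole body", "away", "near"))    # position wrong: silent
    print(walking_cell_response("whole body", "towards", "far"))  # direction wrong: silent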
Such spatial sensitivity appears to be present in at least some cells selective for walking directed left or right. This is illustrated for one cell in the upper half of Fig. 18.9. The cell responded more to the sight of walking to the right than to the sight of walking to the left. The cell shows spatial sensitivity, with greater responses to the experimenter when he was in positions on the right-hand side of the room compared with the left-hand side. The spatial selectivity of this cell is relatively weak compared with that exhibited by other STSa cells sensitive to movement in depth towards and away from the observer.
Fig. 18.9 Coding visible and hidden movement. Responses (mean +/−1SE) of one cell to the sight of the experimenter moving (A) in and (B) out of sight. The monkey’s view of the lab is illustrated on the right. Grey rectangles denote the position of occluding screens in the middle and on the right of the experimental room, 4 m from the subject. Solid arrows indicate the position and direction of walking while the experimenter was visible (a–c). Dotted arrows indicate the position of the experimenter when hidden from sight and the direction of movement prior to occlusion (d–f). [2-way ANOVA: visibility of movement (in sight vs. out of sight), F(1, 24) = 15.7, p = 0.0006; direction and position of movement (3 levels: left position move right, right position move left, right position move right), F(2, 24) = 57.8, p < 0.00005; interaction, F(2, 24) = 2.4, p = 0.12; n = 5 for each condition.] The response to the experimenter out of sight on the right (f) was greater than in all other conditions (a–e), p < 0.004 for each comparison. Walking towards the right screen (c) evoked a greater response than walking towards the middle screen (a and b) [c vs. b, p < 0.0005; c vs. a, p < 0.05]. Walking towards the middle screen produced a larger response when starting from the left (a) than when starting from the right (b) [p < 0.002]. (From Jellema et al. 2002.)
18.6 Actions as events extending over time

18.6.1 Now you see me, now you don’t

Actions often become partially or completely hidden from view. Since the predictability of impending sensory stimuli has a pervasive influence on STSa responses to tactile and motion stimuli (Hietanen and Perrett 1993a,b, 1996a,b; Mistlin and Perrett 1990), we have investigated how cells respond when actions such as walking become hidden from sight. These studies revealed a population of visual cells in STSa that respond maximally when individuals are seen to ‘hide’ behind an occluding screen. Of particular relevance to the discussion of spatial coding was the finding that all of the cells studied in this population (n = 30) were sensitive to the location of the hidden person (Baker et al.
2001). Thus these cells responded maximally after the individual had moved out of sight at a particular location in the lab. For example, in Fig. 18.9 the cell illustrated responded more in the 3 s following disappearance from sight behind screens than in the prior 3 s when the experimenter was visible and moving towards the screens. Many cells have no detectable response to visible movements but start responding 1–4 s after the moving person has become completely hidden. The cell illustrated in Fig. 18.9 responded maximally when the experimenter was hidden behind a screen located at the far right side of the experimental room (the response to condition f is greater than to all other conditions a–e). Hiding behind a screen located in the middle of the room at the same distance from the subject (d and e) produced less response. The cell’s responses to the experimenter walking in sight were consistent with the out-of-sight responses, in that a larger response was evoked when the experimenter walked towards the right screen (c) than towards the middle screen (a and b). Additionally, walking towards the middle screen produced a larger response when starting from the left (a) than when starting from the right (b). Thus, the maximal out-of-sight response was obtained when the experimenter hid behind the right-hand screen, and the maximal in-sight response was obtained when the direction of walking was towards the right-hand screen and when the site of walking was closest to the right-hand screen.

These in-sight and out-of-sight responses are consistent with the idea that this cell codes not only for the presence of the experimenter behind the right screen, but also for the intention of the experimenter to go behind that screen. For this interpretation, we need only assume that walking towards the right screen reflects the intention to move behind that screen.

The use of spatial information in temporal cortex contrasts sharply with (1) the spatial sensitivity of the hippocampal ‘view’ cells (Rolls et al. 1997, 1998), which do not require an object at the optimal position or view, and (2) spatial sensitivity in premotor cortex (Graziano, Yap, and Gross 1994; Graziano, Hu, and Gross 1997), which applies to any object that happens to be in near space. The speculation here is that the position sensitivity of STSa cells may help the interpretation of current and impending actions, providing indications as to what is likely to happen next.
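The in-sight and out-of-sight behaviour of the Fig. 18.9 cell can be caricatured as a unit with a preferred hiding place, whose in-sight response anticipates that place. The event labels and rates below are our assumptions:

    # Sketch of an 'out of sight' cell with a preferred hiding location.
    def hidden_agent_response(visible, location, heading, preferred="right screen"):
        if not visible and location == preferred:
            return 28.0      # maximal: agent hidden at the preferred screen
        if visible and heading == preferred:
            return 12.0      # walking towards the preferred screen
        return 3.0

    print(hidden_agent_response(False, "right screen", None))       # hidden there
    print(hidden_agent_response(True, "open room", "right screen")) # approaching it
    print(hidden_agent_response(False, "middle screen", None))      # hidden elsewhere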
18.6.2 What happens next?

A final property exhibited by some cells within the STSa is intriguing in this context. We have recently found that the perceptual history, the sequence of events, is critical in determining the response to the current scene. To some extent this is apparent for the cells that respond when animate objects become hidden from sight. For example, in Fig. 18.9, once the person has walked out of sight the scene is identical across conditions, yet the cell’s response depends critically on what was last seen: where the experimenter was, and in which direction he was moving, before disappearing. More dramatically, we find that cell responses to static views of the head and body in a particular posture can depend entirely on the preceding movement (Jellema, Baker, Wicker, and Perrett 2000b). We have studied 31 cells for which responses occur when one particular movement leads to the posture, but are absent if the same posture is presented from behind a shutter with no perceptual history, or is presented after a different preceding movement. Actions and behaviour are complex: much of this complexity arises because actions are composed of sequences of movements extending over time and developing in different ways. What we are beginning to see within the STSa is that cells are sensitive to elementary sequences of two events. It is quite likely that comprehension of complex and lengthy action sequences can be stitched together, hierarchically, from sensitivity to these elementary sequences.
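Sensitivity to an elementary two-event sequence can be written as a lookup on the (preceding event, current posture) pair; chains of such pairwise detectors could then cover longer sequences, as the text speculates. The encoding below is our hypothetical illustration:

    # Sketch of perceptual-history dependence: a static posture drives the
    # cell only when one particular movement led up to it.
    def posture_after_movement_response(previous_event, current_posture,
                                        required=("bend forward", "bent posture")):
        return 18.0 if (previous_event, current_posture) == required else 0.0

    print(posture_after_movement_response("bend forward", "bent posture"))   # responds
    print(posture_after_movement_response("shutter opens", "bent posture"))  # no history: silent
    print(posture_after_movement_response("bend backward", "bent posture"))  # wrong history: silent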
18.7 Discussion

18.7.1 Bondage without pain

The cells reviewed here require two or more visual attributes to fire: for example, motion to the left and the left profile view of the body (Oram and Perrett 1996). In essence, the cells require the two features to be ‘bound together’, that is, to arise from one object rather than from two independent objects (one object moving left, and a second object with the left profile form; Oram and Perrett 1996). Philosophers, psychologists, and neuroscientists continue to be fascinated by the ability of the nervous system to detect such conjunctions and solve the ‘binding problem’. It is therefore appropriate to discuss how the process may be achieved. Indeed, it need be no mystery; the essence of a mechanism to solve the binding problem without the use of spooky codes has been available for 30 years (Wickelgren 1969; for review see Perrett and Oram 1998).

Binding happens whenever simple features are combined to allow a more complex pattern to be detected. To understand binding we therefore need to understand how hierarchical feature processing works, for example, how a diamond shape is detected and differentiated from an X, which has many of the same features but in a different order. The same problem exists for cells tuned to face patterns, which may require the visibility of several eye-like and mouth-like features to fire but can remain unresponsive when the same features are presented in a jumbled configuration. Details of the way temporal cortex binds information together to detect facial patterns are given elsewhere (Perrett and Oram 1998). Here we can restrict ourselves to simpler examples, but it should be clear how such examples extend to more complex properties.

Consider the detection of a diamond shape (◊). The activity in feature detectors tuned for individual orientations does not reveal the diamond pattern explicitly, nor does activity in a collection of detectors sensitive to pairs of oriented elements such as angular corners, e.g. >, <, ∧, ∨, or parallel edges //, \\ (even if the activity in such detectors is synchronized to a 40-Hz wave). A pattern-sensitive cell (of the type described by Tanaka et al. 1991) that requires inputs from several such hypothetical V4 feature detectors (e.g. ∧, ∨, //, \\) would, however, be sensitive to a diamond pattern and unresponsive to other shapes that have some but not all of the same features (e.g. X shapes). The solution to the detection problem comes simply from the fact that alternative patterns do not have as many of the required features and will therefore produce only suboptimal activation in the cells that are tuned to diamond shapes. Of course, such cells sensitive to diamond patterns could be made up with a different set of input features, and collectively the ability of many such cells to detect diamond patterns would be superior to the performance of individual cells.

Here the operations seem straightforward; we know and love such angle and parallelism detectors, since they have been described in early visual cortex. We should, however, not overlook the fact that the binding problem is being solved within these operations: one feature (e.g. ∧) is being bound to a second (e.g. //). If we consider a slightly more abstract problem of, say, detecting a blue diamond shape, the process need be no more complicated: the only additional requirement is that one or more of the input features is already paired with blue (e.g. ∧, ∨, //, and blue \\).
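The diamond argument can be run directly in code. In the sketch below (ours; the feature labels are arbitrary stand-ins for the hypothetical V4 detectors), each input already conjoins a pair of oriented elements with a colour, and the pattern unit merely counts how many of its required inputs are active:

    # Binding by hierarchical conjunction, after the diamond example.
    DIAMOND_PARTS = {"corner_up", "corner_down", "parallels_nw", "parallels_ne"}

    def diamond_response(active):
        """Fraction of required feature-pair inputs active; 'active' is a set
        of (element, colour) conjunctions present in the image."""
        elements = {element for element, colour in active}
        return len(DIAMOND_PARTS & elements) / len(DIAMOND_PARTS)

    def blue_diamond_response(active):
        # Additionally require one diamond feature already bound to 'blue'.
        has_blue_part = any(e in DIAMOND_PARTS and c == "blue" for e, c in active)
        return diamond_response(active) if has_blue_part else 0.0

    diamond = {(part, "black") for part in DIAMOND_PARTS}
    x_shape = {("corner_up", "black"), ("crossing", "black")}  # shares few conjunctions
    print(diamond_response(diamond))   # 1.0: optimal activation
    print(diamond_response(x_shape))   # 0.25: suboptimal, read as 'not a diamond'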
In V4 and inferior temporal cortex there are plenty of cells tuned to colour and to edge orientation or shape. At least some psychologists do not trust such simple operations by the little grey cells. Psychologists ponder the binding problem and worry that, when the visual system is faced with a black and white diamond and a blue square, the observer will suffer an illusory conjunction of blueness and diamond shape and therefore see a blue diamond.
The hypothetical blue diamond detector described above will not suffer illusory detection, because blue is linked to diagonal edges, which are absent from the image containing a black and white diamond and a blue square. In general, a mechanism for solving the binding problem relies on utilization of input features that are already conjointly sensitive to the dimensions to be bound but do not yet make explicit the specific configuration that needs to be detected.

Taking on a more interesting problem, we can consider the detection of a rightward moving diamond. To detect a blue diamond we needed blue to be bound with diamond shape, and we required blue to be bound at the input stage with one or more of the elementary angular or parallel features. Similarly, detecting a rightward moving diamond can be performed by utilizing the same input features (e.g. ∧, ∨, and //) with the additional specification that one or more of these shape-sensitive feature inputs requires movement to the right. Posterior inferior temporal cortex contains cells that are selective for elementary shapes and movement direction (Gross, Rocha-Miranda, and Bender 1972; Tanaka et al. 1991), so more elaborate detectors for complex shapes can inherit the property of directionality from these earlier feature detectors. Such a progression of information processing could allow the detection of particular shapes moving in particular ways. This mechanism may account for the conjoint sensitivity of STSa cells to the translation of face and body shapes that has been described here.

While the schemes outlined above consider the detection of simple shapes and the conjunction of simple attributes, processing underlying the detection of more complex patterns can utilize the same principles. For example, if one utilizes inputs tuned to letter pairs (e.g. _m, me, ea, at, t_), one can build specific detectors tuned to specific words (e.g. meat) that are insensitive to other words with combinations of the same letters (e.g. mate, tame, team). Designing such a process to detect facial patterns and ignore patterns that are jumbled facial features is straightforward under this scheme too (see Perrett and Oram 1998). Note that this scheme can work even if the feature detectors themselves have large receptive fields, though, to avoid confusion over which elements belong to which object, there may be a trade-off between the size of the receptive field and the complexity of the features required (Fukushima 1980; Perrett and Oram 1998).
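The letter-pair scheme is equally easy to demonstrate. This sketch (ours) pads words with an underscore for the boundary pairs and requires all of the ‘meat’ pairs to be present:

    # Word detection from letter-pair inputs, after the 'meat' example.
    def letter_pairs(word):
        padded = "_" + word + "_"
        return {padded[i:i + 2] for i in range(len(padded) - 1)}

    MEAT_INPUTS = letter_pairs("meat")    # {'_m', 'me', 'ea', 'at', 't_'}

    def meat_detector(word):
        return MEAT_INPUTS <= letter_pairs(word)

    for w in ["meat", "mate", "tame", "team"]:
        print(w, meat_detector(w))        # True only for 'meat'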
18.7.2 Action coding in humans

Many aspects of the neural processing of social signals reviewed here, which were discovered first in the monkey brain, have since been found to apply to the human STS. This includes ‘biological motion’ (Bonda, Petrides, Ostry, and Evans 1996), hand actions (Grafton, Arbib, Fadiga, and Rizzolatti 1996; Iacoboni et al. 1999; Rizzolatti et al. 1996), facial movements (Puce et al. 1998), gaze direction (Hoffman and Haxby 2000; Wicker, Michel, Henaff, and Decety 1998), and meaningful actions (Decety et al. 1997), though each of these appears to be processed at relatively posterior sites in the human STS (Allison, Puce, and McCarthy 2000). Indeed, the human temporal lobe is a vast structure, which may mean that our understanding of behaviour relies on visual processing of cues that go on increasing in complexity and subtlety at more anterior temporal lobe sites. From neurophysiological studies of cells in frontal (di Pellegrino et al. 1992; Jeannerod, Arbib, Rizzolatti, and Sakata 1995) and parietal cortex (Gallese et al., this volume, Chapter 17) it is now abundantly clear that the visual coding of actions involves widespread neural systems, though how these systems interact is unknown.
Acknowledgements

This work was supported by the H.F.S.P. We thank C. Baker, B. Wicker, and C. Keysers for their help in parts of the work described here.
References

Abell, F., Krams, M., Ashburner, J., Passingham, R., Friston, K., Frackowiak, R., Happe, F., Frith, C., and Frith, U. (1999). The neuroanatomy of autism: A voxel-based whole brain analysis of structural scans. Neuroreport, 10, 1647–1651.
Allison, T., Puce, A., and McCarthy, G. (2000). Social perception from visual cues: The role of the STS region. Trends in Cognitive Science, 4, 267–278.
Ashbridge, E., Perrett, D.I., Oram, M.W., and Jellema, T. (2000). Effect of image orientation and size on object recognition: Responses of single units in the macaque monkey temporal cortex. Cognitive Neuropsychology, 17, 13–34.
Baizer, J.S., Ungerleider, L.G., and Desimone, R. (1991). Organization of visual inputs to the inferior temporal and posterior parietal cortex in macaques. Journal of Neuroscience, 11, 168–190.
Baker, C.I., Keysers, C., Jellema, T., and Perrett, D.I. (2000). Coding of spatial position in the superior temporal sulcus of the macaque. Current Psychology Letters: Brain, Behaviour and Cognition, 1, 71–87.
Baker, C.I., Keysers, C., Jellema, T., Wicker, B., and Perrett, D.I. (2001). Neuronal representation of disappearing and hidden objects in temporal cortex. Experimental Brain Research, 140, 375–381.
Baron-Cohen, S. (1994). How to build a baby that reads minds: Cognitive mechanisms in mindreading. Current Psychology of Cognition, 13, 513–552.
Baron-Cohen, S. (1996). Mindblindness: An essay on autism and theory of mind. Cambridge, MA: MIT Press.
Bonda, E., Petrides, M., Ostry, D., and Evans, A. (1996). Specific involvement of human parietal systems and the amygdala in the perception of biological motion. Journal of Neuroscience, 16, 3737–3744.
Boussaoud, D., Ungerleider, L.G., and Desimone, R. (1990). Pathways for motion analysis: Cortical connections of the medial superior temporal and fundus of the superior temporal visual areas in the macaque. Journal of Comparative Neurology, 296, 462–495.
Brothers, L. (1995). The neurophysiology of the perception of intentions by primates. In M. Gazzaniga (Ed.), The cognitive neurosciences. Cambridge, MA: MIT Press.
Decety, J., Grèzes, J., Costes, N., Perani, D., Jeannerod, M., Procyk, E., Grassi, F., and Fazio, F. (1997). Brain activity during observation of actions: Influence of action content and subject’s strategy. Brain, 120, 1763–1777.
Desimone, R. and Ungerleider, L.G. (1989). Neural mechanisms of visual processing in monkeys. In F. Boller and J. Grafman (Eds.), Handbook of neuropsychology, Vol. 2, pp. 267–299. Amsterdam: Elsevier.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., and Rizzolatti, G. (1992). Understanding motor events: A neurophysiological study. Experimental Brain Research, 91, 176–180.
Dobbins, A.C., Jeo, R.M., Fiser, J., and Allman, J.M. (1998). Distance modulation of neural activity in the visual cortex. Science, 281, 552–555.
Emery, N.J. and Perrett, D.I. (1994). Understanding the intentions of others from visual signals: Neurophysiological evidence. Current Psychology of Cognition, 13, 683–694.
Emery, N.J. and Perrett, D.I. (1999). How can studies of monkey brain help us understand ‘theory of mind’ and autism in humans? In S. Baron-Cohen, H. Tager-Flusberg, and D.J. Cohen (Eds.), Understanding other minds (2nd edn): Perspectives from autism and cognitive neuroscience, pp. 279–310. Oxford: Oxford University Press.
Fukushima, K. (1980). Neocognitron: A self-organising neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36, 193–202.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. This volume, Chapter 17.
Good, P.F. and Morrison, J.H. (1995). Morphology and kainate-receptor immunoreactivity of identified neurons within the entorhinal cortex projecting to superior temporal sulcus in the cynomolgus monkey. Journal of Comparative Neurology, 357, 25–35.
Goodale, M.A. and Milner, A.D. (1992). Separate visual pathways for perception and action. Trends in Neuroscience, 15, 20–25.
Goodale, M.A., Milner, A.D., Jakobson, L.S., and Carey, D.P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349, 154–156.
Grafton, S.T., Arbib, M.A., Fadiga, L., and Rizzolatti, G. (1996). Localization of grasp representations in humans by positron emission tomography: 2. Observation compared with imagination. Experimental Brain Research, 112, 103–111.
Graziano, M.S.A., Yap, G.S., and Gross, C.G. (1994). Coding of visual space by premotor neurons. Science, 266, 1054–1057.
Graziano, M.S.A., Hu, X., and Gross, C.G. (1997). Coding the locations of objects in the dark. Science, 277, 239–241.
Gross, C.G., Rocha-Miranda, C.E., and Bender, D.B. (1972). Visual properties of neurons in inferotemporal cortex of the macaque. Journal of Neurophysiology, 35, 96–111.
Harries, M.H. and Perrett, D.I. (1991). Modular organization of face processing in temporal cortex: Physiological evidence and possible anatomical correlates. Journal of Cognitive Neuroscience, 3, 9–24.
Hasselmo, M.E., Rolls, E.T., Baylis, G.C., and Nalwa, V. (1989). Object-centered encoding by face-selective neurons in the cortex in the superior temporal sulcus of the monkey. Experimental Brain Research, 75, 417–429.
Haxby, J.V., Grady, C.L., Horwitz, B., Ungerleider, L.G., Mishkin, M., Carson, R.E., Herscovitch, P., Schapiro, M.B., and Rapoport, S.I. (1991). Dissociation of object and spatial visual processing pathways in human extrastriate cortex. Proceedings of the National Academy of Sciences USA, 88, 1621–1625.
Haxby, J.V., Grady, C.L., Horwitz, B., Salerno, J., Ungerleider, L.G., and Mishkin, M. (1993). Dissociation of object and spatial visual processing pathways in human extrastriate cortex. In B. Gulyás, D. Ottoson, and P.E. Roland (Eds.), Functional organisation of the human visual cortex, pp. 329–340. Oxford: Pergamon Press.
Hietanen, J.K. and Perrett, D.I. (1993a). Motion sensitive cells in the macaque superior temporal polysensory area: I. Lack of response to the sight of the monkey’s own hand. Experimental Brain Research, 93, 117–128.
Hietanen, J.K. and Perrett, D.I. (1993b). The role of expectation in visual and tactile processing within temporal cortex. In T. Ono et al. (Eds.), Brain mechanisms for perception and memory: From neuron to behaviour, pp. 83–103. Oxford: Oxford University Press.
Hietanen, J.K. and Perrett, D.I. (1996a). Motion sensitive cells in the macaque superior temporal polysensory area: Response discrimination between self- and externally generated pattern motion. Behavioural Brain Research, 76, 155–167.
Hietanen, J.K. and Perrett, D.I. (1996b). A comparison of visual responses to object- and ego-motion in the macaque superior temporal polysensory area. Experimental Brain Research, 108, 341–345.
Hoffman, E.A. and Haxby, J.V. (2000). Distinct representations of eye gaze and identity in the distributed human neural system for face perception. Nature Neuroscience, 3, 80–84.
Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J.C., and Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286, 2526–2528.
Ito, M., Tamura, H., Fujita, I., and Tanaka, K. (1995). Size and position invariance of neuronal responses in monkey inferotemporal cortex. Journal of Neurophysiology, 73, 218–226.
Jeannerod, M., Arbib, M.A., Rizzolatti, G., and Sakata, H. (1995). Grasping objects: The cortical mechanisms of visuomotor transformation. Trends in Neuroscience, 18, 314–320.
Jellema, T. and Perrett, D.I. (1999). Coding of object position in the banks of the superior temporal sulcus of the macaque. Society for Neuroscience Abstracts, 25, 919.
Jellema, T., Baker, C.I., Wicker, B., and Perrett, D.I. (2000a). Neural representation for the perception of the intentionality of actions. Brain and Cognition, 44, 280–302.
Jellema, T., Baker, C.I., Perrett, D., and Wicker, B. (2000b). Neural representation for the perception of the intentionality of actions. International Journal of Psychology, 35, 205.
Jellema, T., Baker, C.I., Oram, M.W., and Perrett, D.I. (2002). Cell populations in the banks of the superior temporal sulcus of the macaque and imitation. In A. Meltzoff and W. Prinz (Eds.), The imitative mind: Development, evolution and brain bases, pp. 267–290. Cambridge: Cambridge University Press.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception and Psychophysics, 14, 201–211.
Köhler, S., Kapur, S., Moscovitch, M., Winocur, G., and Houle, S. (1995). Dissociation of pathways for object and spatial vision: A PET study in humans. NeuroReport, 6, 1865–1868.
Langton, S.R.H., Watt, R.J., and Bruce, V. (2000). Do the eyes have it? Cues to the direction of social attention. Trends in Cognitive Science, 4, 50–59.
Logothetis, N.K., Pauls, J., and Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5, 552–563.
Lorincz, E.N., Baker, C.I., and Perrett, D.I. (1999). Visual cues for attention following in rhesus monkeys. Current Psychology of Cognition, 18, 973–1001.
Lueschow, A., Miller, E.K., and Desimone, R. (1994). Inferior temporal mechanisms for invariant object recognition. Cerebral Cortex, 5, 523–531.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco, CA: Freeman.
Marr, D. and Nishihara, H.K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London: Series B, 200, 269–294.
Milner, A.D. and Goodale, M.A. (1993). Visual pathways to perception and action. Progress in Brain Research, 95, 317–337.
Milner, A.D. and Goodale, M.A. (1995). The visual brain in action. Oxford: Oxford University Press.
Mistlin, A.J. and Perrett, D.I. (1990). Visual and somatosensory processing in the macaque temporal cortex: The role of expectation. Experimental Brain Research, 82, 437–450.
Morel, A. and Bullier, J. (1990). Anatomical segregation of two cortical visual pathways in the macaque monkey. Visual Neuroscience, 4, 555–578.
Murata, A., Gallese, V., Kaseda, M., and Sakata, H. (1996). Parietal neurons related to memory-guided hand manipulation. Journal of Neurophysiology, 75, 2180–2186.
Murphy, K.J., Carey, D.P., and Goodale, M.A. (1998). The perception of spatial relations in a patient with visual form agnosia. Cognitive Neuropsychology, 15, 705–722.
Oram, M.W. and Perrett, D.I. (1994). Responses of anterior superior temporal polysensory (STPa) neurons to ‘biological motion’ stimuli. Journal of Cognitive Neuroscience, 6, 99–116.
Oram, M.W. and Perrett, D.I. (1996). Integration of form and motion in the anterior superior temporal polysensory area (STPa) of the macaque monkey. Journal of Neurophysiology, 76, 109–129.
Oram, M.W., Perrett, D.I., and Hietanen, J.K. (1993). Directional tuning of motion sensitive cells in the anterior superior temporal polysensory area (STPa) of the macaque. Experimental Brain Research, 97, 274–294.
Perrett, D.I. and Oram, M.W. (1998). Visual recognition based on temporal cortex cells: Viewer-centred processing of pattern configuration. Zeitschrift für Naturforschung, C53, 518–541.
Perrett, D.I., Smith, P.A.J., Potter, D.D., Mistlin, A.J., Head, A.S., Milner, A.D., and Jeeves, M.A. (1984). Neurones responsive to faces in the temporal cortex: Studies of functional organization, sensitivity to identity and relation to perception. Human Neurobiology, 3, 197–208.
Perrett, D.I., Smith, P.A.J., Potter, D.D., Mistlin, A.J., Head, A.S., Milner, A.D., and Jeeves, M.A. (1985a). Visual cells in the temporal cortex sensitive to face view and gaze direction. Proceedings of the Royal Society of London: Series B, 223, 293–317.
Perrett, D.I., Smith, P.A.J., Mistlin, A.J., Chitty, A.J., Head, A.S., Potter, D.D., Broennimann, R., Milner, A.D., and Jeeves, M.A. (1985b). Visual analysis of body movements by neurons in the temporal cortex of the macaque monkey: A preliminary report. Behavioural Brain Research, 16, 153–170.
Perrett, D.I., Harries, M.H., Bevan, R., Thomas, S., Benson, P.J., Mistlin, A.J., Chitty, A.J., Hietanen, J.K., and Ortega, J.E. (1989). Frameworks of analysis for the neural representation of animate objects and actions. Journal of Experimental Biology, 146, 87–113.
Perrett, D.I., Harries, M.H., Benson, P.J., Chitty, A.J., and Mistlin, A.J. (1990a). Retrieval of structure from rigid and biological motion: An analysis of the visual response of neurons in the macaque temporal cortex. In T. Troscianko and A. Blake (Eds.), AI and the eye, pp. 181–201. Chichester: Wiley.
Perrett, D.I., Harries, M.H., Chitty, A.J., and Mistlin, A.J. (1990b). Three stages in the classification of body movements by visual neurons. In H.B. Barlow, C. Blakemore, and M. Weston-Smith (Eds.), Images and understanding, pp. 94–108. Cambridge: Cambridge University Press.
Perrett, D.I., Harries, M.H., Mistlin, A.J., Hietanen, J.K., Benson, P.J., Bevan, R., Thomas, S., Ortega, J., Oram, M.W., and Brierly, K. (1990c). Social signals analysed at the single cell level: Someone’s looking at me, something touched me, something moved. International Journal of Comparative Psychology, 4, 25–50.
Perrett, D.I., Oram, M.W., Harries, M.H., Bevan, R., Hietanen, J.K., Benson, P.J., and Thomas, S. (1991). Viewer-centred and object-centred coding of heads in the macaque temporal cortex. Experimental Brain Research, 86, 159–173.
Perrett, D.I., Hietanen, J.K., Oram, M.W., and Benson, P.J. (1992). Organization and functions of cells responsive to faces in the temporal cortex. Philosophical Transactions of the Royal Society of London: Series B, 335, 23–30. Perrett, D.I., Oram, M.W., and Ashbridge, E. (1998). Evidence accumulation in cell populations responsive to faces: An account of generalisation of recognition without mental transformations. Cognition, 67, 111–145. Puce, A., Allison, T., Bentin, S., Gore, J.C., and McCarthy, G. (1998). Temporal cortex activation in humans viewing eye and mouth movements. Journal of Neuroscience, 18, 2188–2199. Riesenhuber, M. and Poggio, T. (2000). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025. Rizzolatti, G., Fadiga, L., Matelli, M., Bettinardi, V., Paulesu, E., Perani, D., and Fazio, F. (1996). Localization of grasp representations in human by PET. 1. Observation versus execution. Experimental Brain Research, 111, 246–252. Rolls, E.T. and Baylis, G.C. (1986). Size and contrast have only small effects on the responses to faces of neurons in the cortex of the superior temporal sulcus of the monkey. Experimental Brain Research, 65, 38–48. Rolls, E.T., Robertson, R.G., and Georges-François, P. (1997). Spatial view cells in the primate hippocampus. European Journal of Neuroscience, 9, 1789–1794. Rolls, E.T., Treves, A., Robertson, R.G., Georges-François, P., and Panzeri, S. (1998). Information about spatial view in an ensemble of primate hippocampal cells. Journal of Neurophysiology, 79, 1797–1813. Sakata, H., Taira, M., Kusunoki, M., Murata, A., Tanaka, Y., and Tsutsui, K.-I. (1998). Neural coding of 3D features of objects for hand action in the parietal cortex of the monkey. Philosophical Transactions of the Royal Society of London: Series B, 353, 1363–1373. Seibert, M. and Waxman, A.M. (1992). Adaptive 3D object recognition from multiple views. IEEE-PAMI, 14, 107–124. Seltzer, B. and Pandya, D.N. (1984). Further observations on parieto-temporal connections in the rhesus monkey. Experimental Brain Research, 55, 301–312. Seltzer, B. and Pandya, D.N. (1994). Parietal, temporal and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: a retrograde tracer study. Journal of Comparative Neurology, 243, 445–463. Sereno, A.B. and Maunsell, J.H.R. (1998). Shape selectivity in primate lateral intraparietal cortex. Nature, 395, 500–503. Tanaka, K., Saito, H.-A., Fukada, Y., and Moriya, M. (1991). Coding visual images of objects in the inferotemporal cortex of the macaque monkey. Journal of Neurophysiology, 66, 170–189. Tanaka, Y.Z., Koyama, T., and Mikami, A. (1999). Neurons in the temporal cortex changed their preferred direction of motion dependent on shape. NeuroReport, 10, 393–397. Trotter, Y. and Celebrini, S. (1999). Gaze direction controls response gain in primary visual-cortex neurons. Nature, 398, 239–242. Ungerleider, L.G. and Mishkin, M. (1982). Two cortical visual systems. In D.J. Ingle, M.A. Goodale, and R.J.W. Mansfield (Eds.), Analysis of visual behavior, pp. 549–586. Cambridge, MA: MIT Press. Wachsmuth, E., Oram, M.W., and Perrett, D.I. (1994). Recognition of objects and their component parts: Responses of single units in the temporal cortex of the macaque. Cerebral Cortex, 5, 509–522. Walsh, V. and Perrett, D.I. (1994). Visual attention in the occipitotemporal processing stream of the macaque. Cognitive Neuropsychology, 11, 243–263. Wicker, B., Michel, F., Henaff, M.A., and Decety, J. (1998).
Cerebral structures involved in the perception of gaze: A PET study. Neuroimage, 8, 221–227. Wickelgren, W.A. (1969). Context-sensitive coding, associative memory and serial order in (speech) behaviour. Psychological Review, 76, 1–15.
19 The visual analysis of bodily motion Maggie Shiffrar and Jeannine Pinto Abstract. Does the visual system process human movement differently from object movement? If so, what are the criteria that the visual system uses to discriminate between human and non-human motions? A series of psychophysical studies was conducted to address these questions by examining the conditions under which the visual perception of human and object motions appears to rely on similar and different mechanisms. To determine whether motion integration across space is similar for human and non-human movements, observers viewed moving stimuli through a set of spatially disconnected apertures. Under these conditions, motion integration across space was found to differ significantly for human and object movements as long as the human movement was upright and consistent with normal locomotion. An apparent-motion paradigm was used to investigate motion integration across time. It was found that human and object movements are similarly perceived at brief temporal intervals. However, important differences arise at slower display rates. Finally, recent PET data indicate motor-system activity during the perception of possible, but not impossible, human movements. When considered together, these results support the hypothesis that the visual analysis of human movement does differ from the visual analysis of a wide variety of non-human movements whenever visual motion signals are consistent with an observer’s internal representation of possible human movements.
19.1 Introduction Over the past thirty years, researchers have repeatedly noted that human observers demonstrate an exquisite visual sensitivity to the movements of other people. Such statements imply that our visual sensitivity to human movement must differ from, or even be superior to, our visual sensitivity to the movements of non-human objects. Does the visual analysis of human movement actually differ from other motion analyses? If so, under what conditions? The identification of significant differences between the visual analysis of human and non-human movements would be important because it would challenge the commonly held assumption that the visual system is a general-purpose processor that analyzes all visual stimuli in the same manner. An alternative to the general-purpose processor approach is the proposal that visual analyses may be best understood in relation to the motor outputs they subserve (e.g. Bridgeman 1992, this volume, Chapter 5; Goodale and Milner 1992; Milner and Goodale 1995; Prinz 1997; Rossetti and Pisella, this volume, Chapter 4). It should be fairly obvious that the visual perception of a waving friend and that of a wind-blown tree are normally associated with different motor responses on the part of the observer. I might wave back to a friend, but I would never wave back to a swaying tree. Thus, the intimate connection between human social behavior and human movement perception may necessitate some separation between human and non-human motion analyses. Increasingly, neurophysiological evidence supports the existence of neural mechanisms dedicated to the analysis of primate movement. For example, single-cell recordings in the anterior superior temporal polysensory area (area STPa) of the monkey have repeatedly identified cells that are conjointly sensitive to particular primate forms and motions (e.g. Bruce, Desimone, and Gross 1981;
Jellema and Perrett, this volume, Chapter 18; Oram and Perrett 1994; Perrett, Harries, Mistlin, and Chitty 1990). Consistent with this, human brain-imaging studies have revealed that neural responses in this region are tied to the perception of simplified displays of human dance (Bonda, Petrides, Ostry, and Evans 1996). Other single-cell recordings have resulted in the identification of ‘mirror neurons’ in the premotor cortex that respond selectively when a monkey performs some action and when that monkey watches another monkey or a human perform the same action (Gallese et al., this volume, Chapter 17; Rizzolatti, Fadiga, Gallese, and Fogassi 1996). Again, in the human, when subjects passively observe the actions of another human so that they can later imitate those actions, selective PET activity is found in premotor cortex (Decety et al. 1997). These findings are particularly intriguing because they suggest that the visual perception of human action may involve a functional linkage between the perception and production of human motor activity that is absent from the perception of object movement (Viviani, this volume, Chapter 21; Viviani, Baud-Bovy, and Redolfi 1997). We shall return to this point later. Thus far, we have been very careful to draw a simple distinction between human movement and non-human movement. Given the findings from the above single-cell recording experiments in which monkey STPa and mirror neurons respond to both human and monkey movement, this human–non-human dichotomy is probably misleading. Numerous potential dichotomies present themselves as possible alternatives, including primate versus non-primate, animal versus non-animal, animate versus inanimate, living versus non-living, etc. Obviously, much more data are needed before one can confidently posit the optimal dichotomy, assuming that one even exists. Studies currently underway in our laboratory address the theoretical utility of these and other possibilities. However, for the purposes of this paper, we prefer to start at the beginning by comparing human movement with the movement of inanimate, non-living objects. If the visual system treats human motion differently from other motions, this difference should be most easily identified with the human–object comparison. Only after processing differences between these two non-overlapping event categories have been established would it make sense to try to develop more fine-grained definitions. Moreover, to maintain a clean separation between these two categories, we initiate our studies with the use of human movements that do not involve objects, either their physical or implied (pantomimed) presence. Thus, for the purposes of this paper, we use the terms ‘human movement’ and ‘action’ interchangeably to refer to non-goal-directed movements of the human limbs about an intact human body. In sum, the goal of this paper is to determine whether the visual perception of human movement differs from the visual perception of moving objects. Are there conditions under which the visual system treats human and object motions similarly? By addressing these issues, we hope to develop a deeper understanding of the cues that the visual system uses to define a stimulus event as a human movement. Our investigation is structured as follows. We begin with a very brief review of some of the data that demonstrate that motion processes are necessary for the representation of human displacement.
Given that motion appears to be a defining characteristic of human action, we systematically examine several fundamental properties of complex motion perception. This begins with a review of the need for the integration of local motion signals across space. We then examine whether motion integration across space is similar during the visual analysis of object and human movement. Motion integration over time is then addressed within the context of apparent motion of human and object movements. Finally, recent neurophysiological data concerning the visual perception of human and object movement are discussed.
19.1.1 Is motion needed for the representation of the human body? It has been asserted that the visual system constructs different representations for action and recognition (e.g. Milner and Goodale 1995). One of the ways in which these representations are thought to differ is in their generalization across viewpoint changes. Two general classes of representations are used to explain how we recognize objects in novel orientations. In egocentric representations, stimuli are represented in specific orientations relative to the observer. Such representations are therefore known as view-dependent. Perception for action is thought to rely upon these viewpoint-dependent, egocentric representations (Milner and Goodale 1995). Representations can also be object-centered when stimuli are represented as structural descriptions that are independent of the stimulus’ orientation relative to the observer. Such viewpoint-independent, object-centered representations are thought to underlie perception for recognition (Milner and Goodale 1995). However, recent perceptual-priming experiments call into question this strict dichotomy between representational formats (Kourtzi and Shiffrar 1997, 1999b). More specifically, when observers view static presentations of a human actor, their visual systems appear to represent these different views of the actor in an egocentric manner. However, when these same views are presented under conditions of apparent motion, object-centered representations of the human body result (Kourtzi and Shiffrar 1999a). These object-centered representations only appear when views of a human actor are consistent with the biomechanical limitations of the human body. Two conclusions can be drawn from these results. First, the visual representation of human movement may depend upon the movement limitations of the human body. We will return to this point repeatedly in subsequent studies. Second, motion processes appear to play a necessary and fundamental role in the perception and representation of the dynamic human body. Do motion processes play the same role in the perception and representation of moving objects? This question is discussed in the following section.
19.1.2 Motion integration and segmentation While sipping a cup of tea at an outdoor café, I visually scan my environment. In doing so, I observe trees gently bending with the wind, my rotating tea-cup, cars zipping down the street, and people rushing by. To make sense of these different motions, my visual system must simultaneously perform two apparently conflicting tasks. On one hand, it must separate motion signals belonging to different objects. It would be an error for me to confuse the motion of a car with the motion of a pedestrian. On the other hand, my visual system must also combine motion cues belonging to the same object. While a pedestrian’s swinging arms usually move in opposing directions, these motion signals must be combined before I can visually interpret the movements of an entire person. These simultaneous processes of integration and segmentation are what allow us to interpret moving stimuli (see Shiffrar 2001, for review). The accurate integration and segmentation of motion information poses a challenge to the visual system as a result of some of the structural characteristics of the neurons that make early motion measurements (e.g. Hubel and Wiesel 1968; Movshon, Thompson, and Tolhurst 1978). First, neurons in the early stages of the visual system have relatively small receptive fields and as such can only respond to changes within very small regions of an observer’s field of view. As a result, in order to interpret the motion of an object or animal, motion information must be combined across much larger regions of retinal space. Second, early motion-sensitive neurons are conjointly sensitive to direction and orientation. This combined sensitivity means that directionally sensitive neurons with
Fig. 19.1 The aperture problem. (A) On the left, a diagonal line translates upward. Each line segment shows the position of the translating line at a different time. On the right, the vertically translating line is viewed through a small window or aperture. Such apertures can be used to represent the receptive field of a neuron. (B) On the left, a diagonal line translates rightward. Again, each line segment illustrates the position of the translating line at a different time. On the right, the rightwardly translating line is viewed through an aperture. Note that the upward (A) and rightward (B) motions appear to be identical when they are viewed through an aperture that hides the end points of the line. This so-called aperture problem refers to the fact that the motion of a translating line or grating is ambiguous. This ambiguity arises from the fact that the component of translation parallel to a line’s orientation cannot be measured unless the real ends of the lines are visible.
small receptive fields will sometimes give the same response to very different motions. Thus, the activity of any particular neuron provides only ambiguous motion information. This ambiguity, illustrated in Fig. 19.1 (A and B), is commonly referred to as the aperture problem. The aperture problem can arise whenever the motion of a continuous luminance edge must be estimated from the activity of a receptor having a small receptive field. To understand this problem from a spatial perspective, first consider that the motion of any line can be decomposed into the portion that is parallel to the line and the portion that is perpendicular to the line. Because a neuron cannot track or ‘see’ the ends of the line if those ends fall outside of its receptive field, the neuron cannot measure any of the motion that is parallel to the line’s orientation (that is, along the length of the line). As a result, a neuron can only detect the perpendicular component of the line’s motion. Because only this perpendicular component of motion can be measured, all motions having the same perpendicular motion will appear to be identical even if these motions differ significantly in their parallel component. Thus, a neuron will give the same response to many different motions. Because all known visual systems, whether biological or computational, have neurons with receptive fields that are limited in size, this measurement ambiguity has been extensively studied (e.g. Hildreth 1984; Wallach 1935). How can observers interpret the motions of objects or humans when early motion measures are inherently ambiguous? While the interpretation of a single translating line is ambiguous, the possible interpretations of its motion are limited to a large family of related motions. All of the members
Fig. 19.2 The intersection of constraints solution to the aperture problem. Because of the aperture problem, the true motion of a line or grating viewed within an aperture could be any one of an infinitely large family of different motions defined by its constraint line (shown here as a dashed line). The visual system can overcome this ambiguity by considering the motion measurements from two or more differently oriented lines. That is, while the measured motion of a single translating line is consistent with infinitely many interpretations, measurements of differently oriented lines can be combined to uniquely interpret the line motion. This unique solution is defined by the point of intersection of two different constraint lines (shown on the right) and is known as the intersection of constraints or IOC solution. of this family differ only in the component of translation that is parallel to the line’s orientation. Members of two hypothetical families are illustrated by the groups of three arrows in Fig. 19.2. The visual system can solve the aperture problem by taking advantage of this regularity in possible motions. To do so, individually ambiguous motion estimates from two differently oriented lines must be combined. As long as two differently oriented lines are rigidly connected to each other, and actually moving in the same direction, their corresponding constraint lines will intersect at a single point. This point, known as the intersection of constraints or IOC, defines the only possible motion interpretation that is shared by the two rigidly connected translating lines. Thus, when the visual system is correct in assuming that two lines are rigidly connected to each other, then the motion of the stimulus defined by those lines can be uniquely interpreted. Experimental support for this IOC approach comes from studies examining the visual perception of and neural response to spatially overlapping edges and gratings. In their influential behavioral experiments, Adelson and Movshon (1982) asked subjects to report whether superimposed sinusoidal gratings (illustrated on the right side of Fig. 19.2) appeared to move as a coherent whole. When the luminance contrast and the spatial frequency of the two gratings were similar, subjects perceived a single plaid pattern translating in the direction of the IOC solution. On the other hand, when the two gratings differed significantly in their spatial frequency or contrast, subjects reported the perception of two independently translating gratings that slid over one another. These results suggest that when overlapping stimuli are structurally similar, the visual system assumes that they belong to the same object and, as a result, combines their component motions.
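To make the IOC computation concrete, it can be written as a small linear system; the notation below is an illustrative formalization of our own rather than one taken from the experiments just described. A line with unit normal vector $\mathbf{n}$ translating with true velocity $\mathbf{v}$ yields only the normal component of its motion, so a single measurement constrains $\mathbf{v}$ to the constraint line

$$\mathbf{n} \cdot \mathbf{v} = c,$$

where $c$ is the measured speed perpendicular to the line’s orientation; every velocity on that line is an equally valid interpretation. Two rigidly connected, differently oriented lines supply two such constraints,

$$\mathbf{n}_1 \cdot \mathbf{v} = c_1, \qquad \mathbf{n}_2 \cdot \mathbf{v} = c_2,$$

a pair of linear equations with a unique solution whenever $\mathbf{n}_1$ and $\mathbf{n}_2$ are not parallel; that solution is the IOC velocity. For example, measurements $c_1 = c_2 = 1$ along normals $\mathbf{n}_1 = (1, 0)$ and $\mathbf{n}_2 = (0, 1)$ intersect at $\mathbf{v} = (1, 1)$, a single oblique translation consistent with both edge measurements.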
19.2 Motion integration across space: objects The above results provide just one example of how the visual system might solve the aperture problem for superimposed gratings presented within a single receptive field or region of visual space. How does the visual system link motion signals across space? Previous theories assumed that non-overlapping, moving edges would be analyzed and perceived in the same way as overlapping edges. However,
subsequent behavioral tests have not supported this hypothesis. When subjects view differently oriented edges through disconnected apertures, they experience systematic difficulties in their ability to link motion signals across the disconnected edges (Shiffrar and Pavel 1991; Lorenceau and Shiffrar 1992). For example, when viewing a simple, rigidly rotating polygon through a set of apertures, subjects cannot combine motion measurements accurately across the polygon’s edges (Shiffrar and Pavel 1991). Instead, subjects perceive non-rigid movement. Even when they know that they are viewing a square rigidly rotating behind four stationary apertures, they still perceive either disconnected rotating line segments or a pulsating elastic figure. Thus, although theories of motion perception are based on the assumption that the visual system overcomes the ambiguity of individual motion measurements by combining those measurements, observers are often unable to perform this crucial task. What is the cause of subjects’ inability to integrate velocity estimates across the different sides of a rotating object? In classic motion-integration studies (e.g. Adelson and Movshon 1982), edges undergo translation rather than rotation (Shiffrar and Pavel 1991). Therefore, to ensure that the differences in motion integration within and across spatial locations suggested by the above study did not result from differences in the type of motion used, motion integration across translating edges was examined (Lorenceau and Shiffrar 1992). To that end, subjects in another series of experiments viewed a diamond figure rigidly translating behind a set of spatially separated apertures, as shown in Fig. 19.3(A). In a two-alternative forced-choice procedure, subjects performed a direction discrimination task constructed so that the diamond’s direction of translation could only be determined from an integration of the motion measurements across the diamond’s visible edges. When the translating diamond was centrally presented at high luminance contrast, subjects performed at chance levels in the direction discrimination task. That is, even though subjects knew they were viewing a translating diamond, they could not link motion signals across the diamond’s sides and determine its direction of motion. Instead, under these conditions, the visual system interpreted the display as four independently translating object fragments.
Fig. 19.3 (A) A diamond translates rightward behind four rectangular windows. The four visible line segments appear to move in different directions. (B) However, if the shape of the window edges is changed so that positional noise is added to the visible line endings, the same four edges now appear to move coherently.
When considered together, the results of the above studies clearly suggest that the integration of motion signals within a single region differs from the integration of motion signals across disconnected spatial locations. So how does the visual system control the integration of motion information across different spatial locations? Outside of the laboratory, visual scenes usually contain multiple objects. To identify dynamic actions and objects in natural scenes, the visual system must integrate motion measurements originating from the same physical unit while segmenting motion measurements arising from different units. Because the ends of lines (or terminators) and the ends of surfaces (or corners) are simple form cues that signal object boundaries, such discontinuities may determine when motion measurements are linked across edges. This hypothesis was tested in the following studies. If contour discontinuities determine whether motion integration or segmentation occurs, then manipulations of discontinuity visibility should significantly alter the visual interpretation of dynamic images. In the previously described translating diamond display, four stationary apertures were positioned so that only one segment of each of the diamond’s four sides could be viewed. The apertures were rectangular so that the visible length of each segment remained constant as the diamond moved. This created eight (two per segment) high-contrast terminators that smoothly translated back and forth along the obliquely oriented aperture sides. We manipulated the visibility of these terminators in three very different manners: luminance contrast, positional noise (as indicated in Fig. 19.3(B)), or peripheral presentation. In every case, when terminator visibility was low (because terminators were presented at low luminance contrast, with added noise, or in peripheral vision), performance was high, since motion integration across the visible diamond segments was facilitated. Conversely, whenever the terminators became more visible, performance dropped. Since accurate performance requires motion integration, this performance decrease suggests that motion segmentation increased with terminator visibility. This pattern of results strongly suggests that terminators determine whether motion information is integrated or segmented; that is, whether the visual system interprets moving objects as coherent or fragmented (Lorenceau and Shiffrar 1992).
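The display geometry at issue can be sketched in a few lines of code. The Python fragment below is our own illustration, not the stimulus software used in these experiments; the circular aperture, the sampling scheme, and the function names are simplifying assumptions. It shows how viewing an edge through an aperture manufactures terminators at the aperture boundary, the very features whose visibility was manipulated above.

    # A sketch (ours, not the authors' stimulus code) of the display geometry:
    # the portion of an edge visible through a circular aperture. The visible
    # segment's endpoints ('terminators') are created by the aperture boundary,
    # not by the object, which is why classifying them as real or temporary
    # matters for motion integration.
    import numpy as np

    def visible_segment(p0, p1, center, radius, n_samples=500):
        """Endpoints of the part of edge p0->p1 lying inside a circular
        aperture, or None if the edge misses the aperture entirely."""
        t = np.linspace(0.0, 1.0, n_samples)[:, None]
        pts = (1 - t) * p0 + t * p1               # sample points along the edge
        inside = np.linalg.norm(pts - center, axis=1) <= radius
        if not inside.any():
            return None
        first, last = np.flatnonzero(inside)[[0, -1]]
        return pts[first], pts[last]              # aperture-induced terminators

    # As the diamond translates, the edge moves while the aperture stays fixed,
    # so the terminators slide along the aperture boundary.
    ends = visible_segment(np.array([-2.0, 1.0]), np.array([2.0, -1.0]),
                           center=np.array([0.0, 0.0]), radius=1.0)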
19.2.1 Motion integration across space: action The previous studies indexed some of the information that the visual system uses to interpret the motion of simple shapes. Are the cortical mechanisms that were tapped during these studies also involved in the analysis of human movement? If the perception of human action truly differs from the perception of moving objects, then these two perceptual analyses may differ in their spatial constraints. Is the integration of human movement across space different from the spatial integration of object movements? This question was addressed with an adaptation of the polygon-moving-behind-apertures display described above. In this experiment, the moving polygon was replaced with a translating car, opening and closing scissors, or a walking human, as indicated in Fig. 19.4(A). Subjects simply viewed one of these three items moving behind a set of specially constructed apertures (see Fig. 19.4(B)) for six seconds and reported what they observed (Shiffrar, Lichtey, and Heptulla-Chatterjee 1997). The apertures were constructed so that only straight edges (that is, no corners or end points) were visible. The subjects verbally reported what they saw. An audiotape of their descriptions was given to a naïve scorer who categorized the subjects’ descriptions as either reflecting local segmentation (independently moving line segments) or global integration (a whole object or being). If human movement is analyzed over a greater spatial extent than object motion, then observers should be more likely to identify the walking human figures than the inanimate objects moving behind apertures. In a control condition, subjects viewed static versions of each of the three displays.
Fig. 19.4 (A) Examples of the walking figure, car, and scissors stimuli used in the multiple aperture studies. (B) The apertures through which each of the stimuli was viewed. Here the apertures are visible, as in the control condition. In the experimental condition, the apertures were statically invisible. This example depicts one static frame of the walker viewed through the apertures.
The results of a series of experiments clearly suggest a fundamental difference between the visual integration of object and human motion across space. First, no subjects in the control conditions were able to recognize the partially occluded car, scissors, or human when they were presented statically. This result adds further support for the hypothesis that stimulus motion is necessary for the visual perception of human action. In the motion condition, a large split was found in subjects’ ability to identify the human and objects. When a walking human figure was viewed through apertures, all subjects readily and accurately identified the walker. Typical responses to the walker stimulus included: ‘a walker,’ ‘a man walking,’ and ‘someone moving.’ On the other hand, when observers viewed moving objects (the scissors or cars) through the same apertures under the same conditions, they were unable to recognize these objects. Instead, subjects described these object displays as sets of line segments moving incoherently. Typical descriptions of the moving car and scissors stimuli included: ‘lines moving,’ ‘birds,’ ‘worm-like things that got longer,’ ‘undulating lines,’ and ‘a bunch of lines.’ Such descriptions suggest that subjects could not group motion information across the apertures, just as with the rotating square and translating diamond displays described in the previous section. Yet subjects were readily able to group and interpret motion signals when those signals were consistent with the interpretation of a walking human. This pattern of results suggests that action perception and object
perception may depend upon different motion integration mechanisms. This appears to hold true even when both are performed for the purpose of recognition. However, an alternative explanation of subjects’ inability to recognize the partially occluded car and scissors is simply that these figures were not recognizable. The walker, car, and scissors represent different classes of stimuli that vary significantly along several dimensions. Did subjects recognize the walker because it was sufficiently complex and descriptive while the car and scissors stimuli were not? To address this concern, we modified the displays so as to facilitate motion integration. Occluded objects have two different types of surface boundaries: real boundaries and temporary boundaries created by the occluding surface. Accurate object recognition requires that the visual system rely on real boundaries and discount temporary boundaries (Kanizsa 1979). One of the ways that the visual system distinguishes real from temporary boundaries involves the use of occlusion cues. When depth cues suggest that contour terminators are the temporary result of another occluding surface, those terminators do not influence image interpretation (Shimojo, Silverman, and Nakayama 1989). On the other hand, in the absence of compelling occlusion cues, terminators play a defining role in image interpretation. In the previous experiment, no depth cues were present since the apertures were invisible. In the present experiment, we added occlusion cues to the multiple aperture display so that the terminators would be correctly classified as temporary and subsequently discounted from the motion analysis. To this end, we simply increased the luminance of the area surrounding each aperture so that T-junctions would be created where the lines intersected the now visible apertures. This manipulation should eliminate terminator-based interpretations of the moving lines and thereby facilitate motion integration across space. When subjects viewed the walker, car, and scissors through these new visible apertures, they correctly recognized all three objects. Thus, the car and scissors displays were recognizable and therefore could have been interpreted in the same global manner as the walker. Does the type of displayed locomotion influence an observer’s ability to integrate motion cues across space? Or are all human movements analyzed in the same manner? To answer these questions, we modified the walker-behind-apertures stimulus described above so that the walker’s locomotion fell either within or outside the spatial and temporal parameters corresponding to realistic walking speeds (Barclay, Cutting, and Kozlowski 1978). Subjects were individually presented with the walking stick-figure behind invisible apertures at one of six possible walking speeds. As before, they were simply requested to describe the display to a tape recorder. Subjects’ responses were categorized by a naïve scorer as either correct, global interpretations or local, disconnected descriptions. The results showed ceiling levels of recognition performance at those walking speeds falling within the range of normal walking. Performance dropped significantly at display rates above and below this spatial–temporal range. Thus, non-normative actions may not be analyzed in the same manner as commonly performed actions. This possibility will be addressed again in subsequent sections. Is the integration of human motion signals over space always different from motion integration with non-human objects?
Previous research suggests that spatial orientation may be an important factor in the perception of human action (e.g. Bertenthal and Pinto 1994; Dittrich 1993). To test whether stimulus orientation influences motion integration, we presented the same displays as either upright, upside down, or rotated by 90 degrees. All figures were viewed through the original set of invisible apertures. Once again, subjects were simply asked to verbally describe one of the nine stimulus displays (walker, car, or scissors at 0, 90, or 180 degree orientations). The results indicate that recognition of the walker was strongly influenced by stimulus orientation. While all subjects correctly identified the upright walker, only 30% recognized the horizontally oriented walker and only
10% recognized the inverted walker. Incorrect responses in the 90 and 180 degree walker conditions included such descriptions as: ‘intersecting lines,’ ‘birds flying,’ ‘two sets of lines making circular motion,’ and ‘little dotted lines.’ Correct responses included: ‘someone walking,’ ‘a person,’ ‘a guy walking,’ and ‘RuPaul.’ None of the subjects in the scissors conditions correctly identified that figure in any orientation. The car stimulus was only correctly identified once in its canonical orientation. As before, incorrect responses to the car and scissors stimuli involved various descriptions of independently moving line segments. These results are consistent with the orientation specificity of human motion analyses. That is, only upright human movement appears to be analyzed differently from the motion of inanimate objects. Impossible human movements, such as those shown in our 90 and 180 degree orientation conditions or at improbable walking speeds, appear to be analyzed in the same spatially local manner as objects. These results suggest that the mere presence of the human form in motion is not sufficient to trigger those neural mechanisms that may be dedicated to the analysis of human movement.
19.3 Motion integration across time: actions and objects Psychophysical researchers commonly use the phenomenon of apparent motion to investigate the temporal nature of motion processes. In classic demonstrations of apparent motion, two stationary dots are presented sequentially. Under appropriate spatio-temporal conditions, the two stationary dots are perceived as a single moving dot. While there are an infinite number of possible paths connecting the two dots, observers almost always perceive motion along the shortest path (e.g. Burt and Sperling 1981). Researchers have concluded that an object’s identity does not influence the perception of its movement since observers perceive the shortest path of apparent motion even when that path requires a significant shape deformation (e.g. Shepard 1984). If the perception of human movement differs from the perception of object movement, will the presentation of human movement influence the visual perception of apparent motion? When humans move, their limbs tend to follow curved rather than straight trajectories. Given the visual system’s shortest-path bias, will observers of human movement be more likely to perceive apparent motion paths that are consistent with the movement limitations of the human body or paths that traverse the shortest possible distance? This hypothesis was tested with stimuli consisting of
Fig. 19.5 A sample apparent-motion stimulus from Shiffrar and Freyd (1990). When these two photographs are shown sequentially, subjects perceive the hand moving through the woman’s head at short SOAs. As SOA increases, subjects increasingly report the perception of the woman’s hand moving around her head.
photographs of a human model in different positions created so that the biomechanically possible paths of motion conflicted with the shortest paths. For example, one stimulus, shown in Fig. 19.5, consisted of two photographs in which the first displayed a standing woman with her right hand positioned on one side of her head while the second photograph showed this same hand positioned on the other side of her head. The shortest path connecting these two hand positions would require the hand to move straight through the head while a biomechanically plausible path would entail the hand moving around the head. Which path of motion do people see? To answer this question, we created many different picture pairs of a human model who oriented one of her limbs in two different positions. For each set of poses, the shortest path of human movement connecting the two limb positions required one of two possible violations of normal human movement. In one case, which we refer to as a violation of body solidity, a limb would have to pass through some other physically solid part of the model’s body. In the second case, involving a violation of the joint constraint, the shortest path of motion would require the breakage of one of the model’s joints. These two types of stimuli were randomly organized and presented to subjects in a tachistoscope. On every trial, participants simply viewed an alternating picture pair for as many presentation cycles as they liked and then described the path or paths of apparent motion that they perceived. Across trials, we varied the SOA, or the amount of time between the onset of one photograph and the onset of the next photograph. When participants viewed these alternating picture pairs, their perceived paths of apparent motion changed with the SOA. At short SOAs (less than approximately 200 ms), subjects consistently reported seeing the shortest, physically impossible paths of human movement. For example, under these conditions of rapid picture alternation, subjects viewing the picture pair shown in Fig. 19.5 reported clearly seeing the woman’s hand move straight through her head. Yet, with increasing SOAs, perceived motion paths changed and observers increasingly saw apparent-motion paths that are consistent with normal human movement (Shiffrar and Freyd 1990). So, in the case of Fig. 19.5, as the rate of alternation slowed, subjects increasingly reported that they saw the woman’s hand move around her head. In a second study, we found that when subjects viewed a different set of human model photographs created so that the shortest movement path was a biomechanically plausible path and longer movement paths were physically impossible (i.e. the reverse of that described above), observers always reported seeing the shortest path (Shiffrar and Freyd 1993). Thus, subjects do not simply report the perception of longer paths with longer presentation times. Instead, the perception of normal human movement, per se, becomes increasingly likely over extended temporal intervals.
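The timing variable at issue here can be made concrete with a schematic trial loop. The sketch below is our own illustration, not the authors’ experimental code; the display routines, frame duration, and file names are placeholders. It simply makes explicit that the SOA is an onset-to-onset interval, so that at short SOAs the blank gap between the two pictures shrinks toward zero.

    # Schematic sketch (ours, not the authors' code) of two-frame apparent-motion
    # trial timing. SOA is the onset-to-onset interval between the two pictures;
    # the display calls and timing values below are illustrative placeholders.
    import time

    def show(frame):             # placeholder for the actual display call
        print(f"display {frame}")

    def clear():                 # placeholder for blanking the screen
        print("blank")

    def wait(ms):                # coarse timing; real experiments synchronize
        time.sleep(ms / 1000.0)  # to the display refresh rather than sleeping

    def run_trial(frame_a, frame_b, soa_ms, frame_ms=150, cycles=5):
        """Alternate two photographs at a given SOA. The inter-stimulus
        interval is whatever remains of the SOA after each picture's
        duration; short SOAs leave little or no blank gap."""
        isi_ms = max(soa_ms - frame_ms, 0)
        for _ in range(cycles):
            show(frame_a); wait(frame_ms)
            clear();       wait(isi_ms)
            show(frame_b); wait(frame_ms)
            clear();       wait(isi_ms)

    # e.g. a rapid alternation versus a slow one
    run_trial("pose_left.jpg", "pose_right.jpg", soa_ms=150)
    run_trial("pose_left.jpg", "pose_right.jpg", soa_ms=500)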
Thus, these objects were positioned so that the shortest paths of apparent motion required the perception of some physically impossible motion such as a stapler passing through an eraser. As before, naïve subjects viewed these picture pairs in a tachistoscope and reported their perceived paths of apparent motion across variations in SOA. When viewing these photographs of inanimate objects, subjects consistently reported perceiving the shortest possible path of apparent motion across all SOAs (Shiffrar and Freyd 1990). That is, they perceived objects passing through one another at all display rates. There was no tendency to report the perception of physically possible events, such as one object moving
around another object, at slower rates of picture alternation. This pattern of results suggests two conclusions. First, during short temporal intervals, both objects and actions are interpreted as following the shortest path of apparent motion even when that path is physically impossible. This finding suggests that under some temporal conditions, human actions and moving objects may be similarly analyzed. On the other hand, when temporal display rates are extended, the perception of human action and moving objects differ, since observers perceive physically possible human movements but physically impossible object movements. These results support the hypothesis that human movement may be analyzed by processes that differ from those underlying the analysis of object movement. In the above experiments, contextually isolated human movements were examined. In the outside world, humans frequently move about objects. Does motion integration over time proceed in the same manner for human movements about the human body as it does for human movements about inanimate objects? If sensitivity to human movements extends to the perception of human movement relative to an object-filled outside world, then observers might perceive paths of apparent motion that are consistent with normal human movements about inanimate objects. On the other hand, if our sensitivity to human motion reflects the activity of an isolated, actor-centered system, then observers might only perceive paths of apparent motion consistent with the ways in which humans move relative to themselves. The above question was addressed with a near replication of the previously described apparent-motion experiments, again involving paired still photographs of a human model in different poses. In the control condition, the poses always depicted the model moving one limb about either side of some part of her body. In the experimental condition, each picture pair from the control condition was modified by a replacement of the part of the body about which the limb moved with a similarly positioned, inanimate object having roughly the same size and orientation. In both experimental and control conditions, subjects simply reported the perceived path of apparent motion as the SOA varied. The results showed that nearly identical patterns of apparent motion were reported in both conditions. Specifically, at short SOAs, subjects perceived the human model’s displaced limb to pass through another body part as well as through an inanimate object. At long SOAs, the model’s limb now appeared to move around both objects and body parts. This pattern of results indicates that the tendency to see biomechanically consistent paths of apparent motion with increasing temporal duration is not limited to the movement of human limbs about the human body. Instead, our visual sensitivity to human movement appears to be general and incorporates how human bodies move with respect to inanimate objects (Heptulla-Chatterjee, Freyd, and Shiffrar 1996). Is the presentation of realistic images of the human body necessary for the perception of apparent human motion? In other words, do any particular physical cues to the human body, such as skin, eyes, hair, or body shape, trigger the processes underlying the visual analysis of human movement? To answer this question, we created a new set of stimuli depicting the global structure of the human body out of a non-human material.
Stimuli were created by videotaping a wooden mannequin posed in approximately the same positions as the human model in our previous work (Shiffrar and Freyd 1990, 1993). In each picture pair, one of the mannequin’s limbs was positioned on either side of some part of its body. Thus, while the human body, per se, was absent from these stimuli, the global form of the human body was preserved. If the perception of human movement requires the presentation of stimuli with textural or facial cues to the human body, then this mannequin should appear to move in a manner that violates human movement constraints at slow display rates. When subjects viewed these simpli1ed, wooden renditions of the human body in apparent motion, their perceived paths of apparent motion changed with the temporal display rate. At long
SOAs, subjects still reported paths of apparent motion consistent with the human body even though a human body was not actually present. At short SOAs, they reported the perception of the shortest, physically impossible paths of apparent motion. Thus, under the conditions in which we tested, apparent-motion perception with images of wooden mannequins is indistinguishable from apparent-motion perception with realistic images of a human body. These results indicate that sensitivity to human motion can be evoked by form cues that are not inherently ‘animate’ or ‘biological’ in nature. In other words, as demonstrated by Graziano and Botvinick (this volume, Chapter 6), the cortical mechanisms responsible for body schema can be ‘fooled’ by volumetric shapes having roughly the same size and location as possible body limbs. What form cues are needed for the perception of human movement? Structure from motion studies using point-light walker displays suggest that pairs of corresponding limbs (i.e. two arms or two legs) are sufficient for the detection of human locomotion within a mask (Pinto and Shiffrar 1999). Given such results, we wondered whether the presentation of a pair of body parts in isolation would be sufficient for the perception of biomechanically plausible paths of apparent human movement. To that end, we presented subjects with partial body stimuli that depicted only one displaced limb and the occluding body part about which the limb was displaced. This manipulation maintained the curvature of the occluding surface as well as the position, orientation, and apparent solidity of the occluding and displaced body parts. However, these subsections eliminated the global hierarchy of limb orientations and positions of the human body. The purpose of this experiment was to determine whether a main effect of SOA would prevail in the absence of a whole human body. Once again the stimuli consisted of picture pairs displayed in an apparent-motion paradigm. As before, subjects were asked to describe their perceived paths of apparent motion across variations in SOA. Interestingly, the results of this experiment differed from those of the previous experiments in which subjects viewed whole human bodies. In the current experiment, subjects consistently reported the perception of the shortest, physically impossible paths of apparent motion. That is, they reported the perception of a limb passing through another limb or a torso at all SOAs. When considered together with the results of the previous apparent-motion studies, these results suggest that a hierarchy of limb position and orientation cues consistent with a complete human form, or possibly the upper or lower half of the human body, may be necessary for the integrated analysis of human movement (Heptulla-Chatterjee et al. 1996). This result is consistent with the findings of Rizzolatti and his colleagues (1996; see also Gallese et al., this volume, Chapter 17) and Perrett and his colleagues (Jellema and Perrett, this volume, Chapter 18; Perrett et al. 1990) that mirror and STPa neurons in the monkey fire when a monkey observes the movements of another monkey or a human. Both the monkey and human share the same hierarchy of limb positions and orientations. As such, both forms are sufficient to drive these cells. In summary, the above experiments suggest several advances in our understanding of the visual analysis of human movement.
First, the visual perception of human movement does differ in some fundamental ways from the visual perception of moving objects. The results of a series of multiple-aperture studies indicated that motion integration over space differs for human and object-based stimuli. Under identical psychophysical conditions, the human visual system selects spatially global interpretations of human movement and spatially local interpretations of object movement. Apparent-motion studies were used to assay the processes underlying motion integration over time. The results of these studies demonstrated important differences between the integration of human movement and object movement over time. Specifically, perceived paths of apparent motion respect the physical limitations of human movement but not of object movement under those conditions
requiring integration over extended temporal intervals. When considered together, these results strongly support the hypothesis that the processes underlying human-movement perception differ from those underlying object-motion perception. The results of these studies also shed light on some of the cues that the visual system uses to discriminate between object and human movement. First, display orientation may determine whether human motion is interpreted as an action or an object. When human locomotion is presented upside-down or sideways, object-like motion-integration processes appear to dominate perception. Thus far, it appears that the perception of human movement only differs from the perception of object movement when the gravitational constraints on human movement are respected. Second, the presence of textural cues, such as skin or hair, and facial cues, such as eyes or a mouth, to the human body are not necessary for the perception of human movement. A wooden mannequin and a point-light walker (Thornton, Pinto, and Shiffrar 1999), both of which lack such information, can be seen to move in the same manner as a naturalistic human form. However, the presentation of a single limb moving relative to a stationary body part does not lead to the same visual movement analyses as those triggered by a whole body. In other words, at least under some conditions, the visual system treats an isolated limb as an object. Thus, the presence of some portion of the human form, in and of itself, does not appear to be sufficient for human movement processing. Finally, timing appears to play a critical role in the visual system’s categorization of human and object movements. In multiple-aperture experiments, recognition performance plummeted when participants observed locomotor displays in which the temporal parameters fell outside of the range of normal human walking. In apparent-motion experiments, participants reported the perception of physically impossible paths of human movement at speeded temporal intervals. Thus, when display timing is incompatible with normal human movement, the visual system appears to interpret the display as a non-human object rather than as a human action even when the display consists of realistic photographs of a human actor.
19.4 What defines an action? Why might the visual system use spatial orientation and movement timing to determine whether a stimulus should be analyzed as a human action or as a non-human object? To answer this question, it may be important to recall that human movement is the only movement that we can both produce and perceive. Might not the human visual system take advantage of the pool of human movement information available in the motor system to assist it in its analysis of the movements of other humans? If the motor system provides assistance during the visual analysis of human movement, then one might expect that motor-system activity should be triggered during the visual perception of those human actions with which an observer has some motor experience (e.g. Decety et al. 1997; Prinz 1997; Viviani and Stucchi 1992). That is, motor-system activity during the perception of human movement may depend upon whether the observer is physically able to perform the observed action. To test this hypothesis, we conducted a brain-imaging study in which PET activity was recorded while subjects viewed two-frame apparent-motion sequences of human and object movement (Stevens, Fonlupt, Shiffrar, and Decety 2000). These displays replicated those used by Shiffrar and Freyd (1990, 1993). As before, there were two types of picture pairs. The human body picture pairs showed a human model in different positions such that the biomechanically possible paths of her movement conflicted with the shortest, physically impossible paths. The second set of picture pairs consisted of non-living objects positioned so that the perception of the shortest path of apparent
motion would require one solid object to pass through another solid object. When the human picture pairs were presented slowly (with SOAs of 400 ms or more), subjects perceived biomechanically possible paths of apparent human motion. Under these conditions, PET scans indicated significant bilateral activity in the primary motor cortex and cerebellum. However, when these same picture pairs were presented more rapidly (with SOAs less than 300 ms), subjects then perceived impossible paths of human movement, and selective activity in the motor system was no longer found (Stevens et al. 2000). On the other hand, when the pictures of non-living objects were presented at either fast or slow SOAs, no selective motor-system activation was indicated. Importantly, subjects in this experiment were never given instructions to imitate the observed actions, either during or after the experiment. Instead, subjects remained stationary and simply viewed two-frame apparent-motion sequences. Thus, selective motor-system activity was not associated with an overt or instructed preparation to act. When considered together, these results suggest that the visual perception of human movement may benefit from disambiguating motor-system input as long as one is physically capable of performing the observed actions. Thus, we may understand the actions of others in terms of our own motor system (Viviani, this volume, Chapter 21).
19.4.1 Motor experience per se? What type of motor experience might be required for the visual perception of human movement? The above studies cannot answer this question, since the impossible actions, which included movements such as a hand passing through a torso and a foot passing through a leg, could not have been performed by the observers for two different reasons. First, the observers had no personal experience of performing these actions. Second, and relatedly, the actions were physically impossible for any human to perform. This distinction is important because, if the actions were physically possible human movements, then an observer might have some basic internal schema for those actions even if he or she had never actually performed them (Castiello et al., this volume, Chapter 16; Graziano and Botvinick, this volume, Chapter 6). For example, while I have never performed a back flip, I may have developed some internal representation of that action either from visual experience (such as watching gymnasts in competition) or from some innate body schema that includes information about the range of possible human movements (Berlucchi and Aglioti 1997). Does visual experience, motor experience, or some innate schema of possible bodily movements define which movements the visual system analyzes as actions? A recent study by Brugger and his colleagues (Brugger et al. 2000a) suggests that the neural mechanisms underlying action perception may not require limb-specific motor experience. These studies consisted of a series of behavioral, imaging, and neurophysiological investigations of a woman born without legs or forearms. The results converged to suggest, quite convincingly, that body parts can be represented in sensory and motor cortical areas even when they have never been physically present. Furthermore, when presented with the same two-frame apparent-motion displays described above, this woman perceives SOA-dependent paths of apparent human limb movement that follow the same pattern found with observers born with arms and legs (Brugger et al. 2000b). This individual thus has functional neural representations of the movements of limbs that she has never had, and motor experience, per se, does not appear to be necessary for the visual analysis of human movement. Consistent with this, studies of early development suggest that motor actions may be represented even though they have never been executed. Infants show a selective sensitivity to biomechanically
correct human gait before they can walk (Bertenthal 1993, 1996; Fox and McDaniel 1982). Interpretation of such findings is as complex as it is intriguing. Very young infants exhibit a rhythmic alternation of their legs when they are supported upright (Thelen, Fisher, and Ridley-Johnson 1984). This spontaneous movement pattern suggests that some actions, such as walking, may be subserved by innate mechanisms or representations. These representations may also underlie infants' visual sensitivity to human movement. Additional evidence for such a hypothesis can be derived from studies of imitation in neonates (Bekkering and Wohlschläger, this volume, Chapter 15). Meltzoff and Moore (1983) have demonstrated that newborn infants are capable of imitating the facial gestures of adult models. Imitation requires that infants map the seen adult gesture onto their own (unseen) facial musculature. The presence of this capacity in neonates suggests that an innate body representation is accessible to visual processes. Thus, visual observation of human movement and innate body representations may be sufficient for the visual analysis and perception of human movement. If so, this might serve as a critical connection in a bi-directional perception–action linkage.
19.5 Neural bases of the visual perception of human movement The brain-imaging results described above most likely reflect only one component of the neural circuit responsible for the visual analysis of human movement. The above PET data suggest that the human primary motor cortex probably plays an important role in the visual interpretation of another person's movements. This conclusion extends that provided by earlier magnetoencephalographic (MEG) data (Hari et al. 1998). The primary motor cortex is reciprocally connected with the premotor cortex (Rizzolatti, Luppino, and Matelli 1998; Wise, Boussaoud, Johnson, and Caminiti 1997). An extensive series of single-cell recording studies suggests that mirror neurons in the ventral premotor cortex also make important contributions to the visual perception of primate movement (Gallese et al., this volume, Chapter 17). Like neurons in the primary motor cortex, mirror neurons respond both when a monkey performs a particular action and when that monkey observes another primate performing the same action. Since neurons in the ventral premotor cortex only contain representations of hands, arms, and faces, mirror neurons may be dedicated to the interpretation of manual and facial gestures (Gallese et al., this volume, Chapter 17). As such, they may play a pivotal role in the visual interpretation of the arm and facial movements subserving communication. This conjecture is strongly supported by brain-imaging data demonstrating that Broca's area, the human equivalent of the monkey ventral premotor cortex and normally considered to be a critical language area, is selectively activated during the observation of finger tapping (Iacoboni et al. 1999). The premotor cortex is indirectly connected with the anterior superior temporal polysensory area (Gallese et al., this volume, Chapter 17). Single-cell recordings in this area have repeatedly identified cells that are selective for monkey and human bodies in motion (e.g. Bruce et al. 1981; Oram and Perrett 1994; Perrett et al. 1990). Importantly, these neurons appear to differ from premotor mirror neurons (Rizzolatti et al. 1996): while STPa neurons selectively respond to the actions of others, they remain unresponsive to the monkey's own actions (Hietanen and Perrett 1993). Thus, STPa neurons may play a role in helping us to avoid confusing our own actions with the actions of others (Carey, Perrett, and Oram 1997), or may be dedicated to the interpretation of complex social behavior (Jellema and Perrett, this volume, Chapter 18) and even empathy (Brothers 1989). It is important to note, however, that we cannot yet specify with any degree of certainty how these different areas actually contribute to the visual analysis of human movement. The fundamental
stumbling block is that each laboratory uses a different set of tasks and a different class of stimuli to study a different area. Obviously, the next necessary step is to examine the selectivity of each of the above cortical areas to the stimuli and tasks used in the examination of the other areas in this circuit.
19.6 Conclusions In conclusion, human movement differs from all other movements in that it is the only movement that human observers both produce and perceive. In a series of experiments, we found that the visual analysis of human movement can be fundamentally different from the visual analysis of moving, non-living objects. Behavioral and brain-imaging studies suggest that this difference may reflect innate body schemas of, and/or visual–motor experience with, normal human movement. Thus, the perception of human movement appears to be constrained by 'knowledge' of human motor limitations (e.g. Prinz 1997; Rizzolatti et al. 1996; Shiffrar 1994; Viviani, this volume, Chapter 21). The visual system appears to take advantage of this perception–action linkage to define which visual movement signals result from human action and which result from moving objects. Possibly as a consequence of this linkage, action perception may not be as straightforward as initially thought. Some of these complexities are discussed below. The presentation of human movement, in and of itself, may not be sufficient to trigger the mechanisms underlying the visual perception of human movement. Instead, as indicated by the studies discussed here, a human action must be consistent with an observer's internal body schema of possible actions before action-specific perception is demonstrated. Requirements for such a model match include, but are certainly not limited to, the following factors. First, display orientation is critical. If an action is presented upside-down, the action's motion signals appear to be interpreted by the 'object perception' system. Second, if the temporal characteristics of a human action are incompatible with human movement dynamics, the action is interpreted as an object. This holds true for motion integration across both space and time. Moreover, in an apparent-motion paradigm, displays containing only two portions of a human body are not sufficient for the perception of normative human movement. Thus, the presentation of an action outside the context of a human body may not always be defined as an action by the visual system. Finally, imaging data suggest that the motor system may be involved in the visual analysis of possible, but not impossible, human movements. Taken together, these results suggest that only those movements that are consistent with an observer's internal model of possible human body movements are analyzed by the mechanisms that underlie action perception. Thus, in the visual perception of human action, all actions are not treated equally.
Acknowledgment This research was supported by NIH grant EY12300.
References Adelson, E.H. and Movshon, J.A. (1982). Phenomenal coherence of moving visual patterns. Nature, 300, 523–525. Barclay, C., Cutting, J., and Kozlowski, L. (1978). Temporal and spatial factors in gait perception that influence gender recognition. Perception and Psychophysics, 23, 145–152.
Berlucchi, G. and Aglioti, S. (1997). The body in the brain: neural bases of corporal awareness. Trends in Neuroscience, 20, 560–564. Bertenthal, B.I. (1993). Perception of biomechanical motions by infants: Intrinsic image and knowledge-based constraints. In C. Granrud (Ed.), Carnegie symposium on cognition: Visual perception and cognition in infancy, pp. 175–214. Hillsdale, NJ: Erlbaum. Bertenthal, B.I. (1996). Origins and early development of perception, action, and representation. Annual Review of Psychology, 47, 431–459. Bertenthal, B.I. and Pinto, J. (1994). Global processing of biological motions. Psychological Science, 5, 221–225. Bonda, E., Petrides, M., Ostry, D., and Evans, A. (1996). Specific involvement of human parietal systems and the amygdala in the perception of biological motion. Journal of Neuroscience, 16, 3737–3744. Bridgeman, B. (1992). Conscious versus unconscious processes: The case of vision. Theory and Psychology, 2, 73–88. Brothers, L. (1989). A biological perspective on empathy. American Journal of Psychiatry, 146, 10–19. Bruce, C., Desimone, R., and Gross, C.G. (1981). Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. Journal of Neurophysiology, 46, 369–384. Brugger, P., Kollias, S., Muri, R., Crelier, G., and Hepp-Reymond, M.C. (2000a). Beyond re-membering: Phantom sensations of congenitally absent limbs. Proceedings of the National Academy of Sciences, 97, 6167–6172. Brugger, P., Regard, M., and Shiffrar, M. (2000b). Hand movement observation in a person born without hands: Is body scheme innate? Meeting of the Swiss Neurological Society, London, England, September 2000. Burt, P. and Sperling, G. (1981). Time, distance, and feature trade-offs in visual apparent motion. Psychological Review, 88, 171–195. Carey, D., Perrett, D., and Oram, M. (1997). Recognizing, understanding, and reproducing action. In F. Boller and J. Grafman (Eds.), Handbook of neuropsychology. Amsterdam: Elsevier Science. Decety, J., Grèzes, J., Costes, N., Perani, D., Jeannerod, M., Procyk, E., Grassi, F., and Fazio, F. (1997). Brain activity during observation of actions: Influence of action content and subject's strategy. Brain, 120, 1763–1777. Dittrich, W.H. (1993). Action categories and the perception of biological motion. Perception, 22, 15–22. Fox, R. and McDaniel, C. (1982). The perception of biological motion by human infants. Science, 218, 486–487. Goodale, M.A. and Milner, A.D. (1992). Separate visual pathways for perception and action. Trends in Neuroscience, 15, 20–25. Hari, R., Forss, N., Avikainen, S., Kirveskari, E., Salenius, S., and Rizzolatti, G. (1998). Activation of human primary-motor cortex during action observation: A neuromagnetic study. Proceedings of the National Academy of Sciences, 95, 15061–15065. Heptulla-Chatterjee, S., Freyd, J., and Shiffrar, M. (1996). Configural processing in the perception of apparent biological motion. Journal of Experimental Psychology: Human Perception and Performance, 22, 916–929. Hietanen, J. and Perrett, D. (1993). Motion sensitive cells in the macaque superior temporal polysensory area I. Lack of response to the sight of the animal's own limb movement. Experimental Brain Research, 93, 117–128. Hildreth, E. (1984). The measurement of visual motion. Cambridge, MA: MIT Press. Hubel, D. and Wiesel, T. (1968). Receptive fields and functional architecture of the monkey striate cortex. Journal of Physiology, 195, 215–243.
Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J.C., and Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286, 2526–2528. Kanizsa, G. (1979). Organization in vision: Essays on Gestalt perception. New York: Praeger. Kourtzi, Z. and Shiffrar, M. (1997). One-shot view invariance in a moving world. Psychological Science, 8, 461–466. Kourtzi, Z. and Shiffrar, M. (1999a). Dynamic representations of human body movement. Perception, 28, 49–62. Kourtzi, Z. and Shiffrar, M. (1999b). The visual representation of three-dimensional, rotating objects. Acta Psychologica, 102, 265–292. Lorençeau, J. and Shiffrar, M. (1992). The role of terminators in motion integration across contours. Vision Research, 32, 263–273.
Meltzoff, A.N. and Moore, M.K. (1983). Newborn infants imitate adult facial gestures. Child Development, 54, 702–709. Milner, A.D. and Goodale, M.A. (1995). The visual brain in action. Oxford: Oxford University Press. Movshon, J.A., Thompson, I.D., and Tolhurst, D.J. (1978). Receptive field organization of complex cells in the cat's striate cortex. Journal of Physiology, 283, 79–99. Oram, M. and Perrett, D. (1994). Responses of anterior superior temporal polysensory (STPa) neurons to 'biological motion' stimuli. Journal of Cognitive Neuroscience, 6, 99–116. Perrett, D., Harries, M., Mistlin, A.J., and Chitty, A.J. (1990). Three stages in the classification of body movements by visual neurons. In H.B. Barlow, C. Blakemore, and M. Weston-Smith (Eds.), Images and understanding, pp. 94–107. Cambridge, England: Cambridge University Press. Pinto, J. and Shiffrar, M. (1999). Specificity in the perception of biological motion displays. Acta Psychologica, 102, 293–318. Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9, 129–154. Rizzolatti, G., Fadiga, L., Gallese, V., and Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131–141. Rizzolatti, G., Luppino, G., and Matelli, M. (1998). The organization of the cortical motor system: New concepts. Electroencephalography and Clinical Neurophysiology, 106, 283–296. Shepard, R.N. (1984). Ecological constraints on internal representation: Resonant kinematics of perceiving, imagining, thinking, and dreaming. Psychological Review, 91, 417–447. Shiffrar, M. (1994). When what meets where. Current Directions in Psychological Science, 3, 96–100. Shiffrar, M. (2001). Movement and event perception. In B. Goldstein (Ed.), The Blackwell handbook of perception, pp. 237–272. Oxford: Blackwell Publishers. Shiffrar, M. and Freyd, J.J. (1990). Apparent motion of the human body. Psychological Science, 1, 257–264. Shiffrar, M. and Freyd, J.J. (1993). Timing and apparent-motion path choice with human body photographs. Psychological Science, 4, 379–384. Shiffrar, M. and Pavel, M. (1991). Percepts of rigid motion within and across apertures. Journal of Experimental Psychology: Human Perception and Performance, 17, 749–761. Shiffrar, M., Lichtey, L., and Heptulla-Chatterjee, S. (1997). The perception of biological motion across apertures. Perception and Psychophysics, 59, 51–59. Shimojo, S., Silverman, G., and Nakayama, K. (1989). Occlusion and the solution to the aperture problem for motion. Vision Research, 29, 619–626. Stevens, J., Fonlupt, P., Shiffrar, M., and Decety, J. (2000). New aspects of motion perception: Selective neural encoding for apparent human movements. Neuroreport, 11, 109–115. Thelen, E., Fisher, D.M., and Ridley-Johnson, R. (1984). The relationship between physical growth and a newborn reflex. Infant Behavior and Development, 7, 479–493. Thornton, I., Pinto, J., and Shiffrar, M. (1999). The visual perception of human locomotion. Cognitive Neuropsychology, 15, 535–552. Viviani, P. and Stucchi, N. (1992). Biological movements look constant: Evidence of motor–perceptual interactions. Journal of Experimental Psychology: Human Perception and Performance, 18, 603–623. Viviani, P., Baud-Bovy, G., and Redolfi, M. (1997). Perceiving and tracking kinesthetic stimuli: Further evidence of motor–perceptual interactions. Journal of Experimental Psychology: Human Perception and Performance, 23, 1232–1252. Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung.
Psychologische Forschung, 20, 325–380. Wise, S.P., Boussaoud, D., Johnson, P., and Caminiti, R. (1997). Premotor and parietal cortex: Corticocortical connectivity and combinatorial computations. Annual Review of Neuroscience, 20, 25–42.
IV Content-specific interactions between perception and action
20 Content-specific interactions between perception and action Introduction to Section IV Martin Eimer
Of the numerous factors involved in the control of perceptuo-motor links, perhaps the most widely studied is the similarity (or compatibility) between specific stimuli and responses. S–R compatibility has important effects on the quality of behavioural performance. Some stimulus–response mappings are more compatible than other mappings, and variations in S–R compatibility affect both the speed and the accuracy of responses. Such compatibility effects are content-specific, because they do not reflect general processing limitations in perceptuo-motor interactions, but are determined by particular features of stimuli and responses, and by specific stimulus–response mappings. Recent research has provided numerous insights into the mechanisms underlying S–R compatibility effects. Detailed hypotheses have been put forward about ways in which stimulus and response properties can be compatible or incompatible, and about the nature of the perceptuo-motor links that are affected by S–R compatibility (see Kornblum and Stevens, this volume, Chapter 2). Two important issues that have recently emerged from this research are investigated and discussed in the contributions assembled in this section. First, one idea central to most current explanations of S–R compatibility (the hypothesis that automatic response activation is critically involved in compatibility effects) has been seriously questioned. Second, there is new evidence not only that content-specific perceptuo-motor interactions influence response-related stages, but that motor processes can also affect perception, thus suggesting that the distinction between perception and action may be more illusory than real.
20.1 S–R compatibility, dual-route models, and automatic response activation Most current accounts of S–R compatibility effects are dual-route models. Such models assume that response activation can occur automatically (direct route) as well as via controlled S–R translation processes (indirect route). When stimulus and response dimensions are similar, stimuli will automatically activate a compatible response via the direct route, regardless of whether the compatible dimension is task-relevant or not. Automatic response activation processes thus facilitate responding when the task-relevant S–R mapping is compatible and produce interference when this mapping is incompatible. Response selection occurs by way of the indirect route, through retrieval or generation of S–R translation rules. Dual-route models provide a coherent account of phenomena such as
the Simon effect. Here, an automatic activation of spatially compatible responses is assumed to be responsible for the fact that responses are faster and more accurate when stimulus and response positions correspond than when they are non-corresponding, even though spatial location is entirely irrelevant for response selection. In dual-route models, response activation processes mediated by the direct route are seen as automatic, in the sense that these processes are entirely stimulus driven, unavoidable, and not affected by strategic control. This notion of 'automaticity' has recently come under attack. The direct route is assumed to be based on long-term, hard-wired S–R associations that are either innate or the result of lifelong learning. Recent results (discussed by Proctor and Vu, Chapter 22) suggest that these associations are not nearly as immutable as previously thought, because they can be neutralized or even reversed by short-term learning. Activation of the direct route is also assumed to be entirely independent from strategic adjustments made as a consequence of specific task demands—such adjustments are the domain of controlled processes mediated by the indirect route. Proctor and Vu present experimental evidence demonstrating that this simple view is not correct. Spatial compatibility effects can be eliminated, magnified, or reversed almost at will under conditions where location-relevant and location-irrelevant trials are mixed. This demonstrates that task set and task demands play a major role in spatial S–R compatibility effects, and implies that such effects cannot just be accounted for in terms of an 'automatic' activation of compatible responses. Activation of the direct route is seen to be exclusively stimulus-driven, and thus independent from the situational context in which a stimulus is encountered. Valle-Inclán, Hackley, and de Labra (Chapter 23) provide behavioural and electrophysiological findings that challenge this assumption. They demonstrate that effects of irrelevant spatial S–R compatibility depend on the nature of the preceding trials: Simon effects disappear when an immediately preceding trial is incompatible. In terms of a dual-route model, this suggests that the direct route is inhibited whenever a previous trial requires a spatially incompatible response. This strategic inhibition of S–R links is further explored by Ridderinkhof (Chapter 24), who employs distributional analyses of performance measures to substantiate the claim that inhibitory control processes play a major role in S–R compatibility situations. His analyses suggest that selective inhibition can be flexibly adjusted as a function of response requirements on previous trials. Again, such short-term adjustments of inhibitory control seem inconsistent with the 'automatic response activation' account.
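To make the arithmetic of the dual-route account concrete, here is a minimal simulation of how it produces a Simon effect. All names and parameter values (the 400 ms baseline, the 30 ms direct-route head start) are invented for illustration; none of the contributors propose this particular model.

```python
# A minimal sketch of how a dual-route model yields a Simon effect. All
# parameter values are hypothetical; this is an illustration, not a model
# proposed in this volume.
import random

def simon_trial(stimulus_side, required_response_side, direct_gain=30.0):
    """Return a simulated response time (ms) for one trial.

    stimulus_side: 'left'/'right', the task-irrelevant location feature.
    required_response_side: the response selected by the controlled,
        indirect route from the task-relevant stimulus feature.
    """
    rt = 400.0 + random.gauss(0.0, 30.0)  # indirect-route S-R translation
    if stimulus_side == required_response_side:
        rt -= direct_gain  # direct route primes the corresponding (correct) response
    else:
        rt += direct_gain  # direct route primes the wrong response; resolving it costs time
    return rt

random.seed(1)
corresponding = [simon_trial('left', 'left') for _ in range(2000)]
noncorresponding = [simon_trial('left', 'right') for _ in range(2000)]
mean = lambda xs: sum(xs) / len(xs)
print(f"simulated Simon effect: {mean(noncorresponding) - mean(corresponding):.0f} ms")
```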
20.2 Bidirectional perceptuo-motor interactions and shared event-codes While there is abundant evidence for content-specific perceptuo-motor effects on response selection and execution, interactions between perception and action are not always unidirectional. Response processes can also have substantial effects on perception. Examples of such interactions are discussed by Viviani, who presents evidence demonstrating that motor knowledge augments perceptual information and thus improves performance in perceptual tasks. Based on this evidence, Viviani argues that the perception of dynamic events is mediated by perceptuo-motor interactions between perceptual information and implicitly represented motor competence. Along similar lines, recent experiments have demonstrated that the detection and identification of objects that share properties with a currently prepared action are facilitated or impaired in a content-specific way. Müsseler and Wühr (Chapter 25) show that stimulus identification performance is substantially
impaired under conditions where stimuli that have to be identified share features with an ongoing motor task. Further evidence for content-specific cross-talk between perception and action is presented by Stoet and Hommel (Chapter 26). Their experiments investigate the impact of feature overlap between visual–perceptual and response processes, and demonstrate that responses are strongly affected when they share properties with visual objects that are currently perceived or memorized. Such links between motor planning and visual encoding, and between visual processing and motor responses, illustrate direct content-specific interactions between perception and action. According to the event-coding hypothesis discussed by Müsseler and Wühr and by Stoet and Hommel, such interactions do not merely reflect close links between functionally and anatomically distinct perceptual and motor processes, but suggest that sensory and motor representations share common resources. It is argued that perceptual and response-related processes access identical codes within a common representational domain. In this view, content-specific interactions between perception and action are a result of the fact that perceptual and motor processes operate on a common level of central 'event codes'.
21 Motor competence in the perception of dynamic events: a tutorial Paolo Viviani Abstract. It is argued that tacit knowledge concerning the characteristic properties and constraints of bodily movements is taken into account by the process that turns sensory inputs relative to dynamical events into perceptual representations. First, to highlight the connection between the theme of the chapter and some older ideas, I provide an overview of the classical literature on the interaction between motoric and sensory information. The experimental support for the notion of motor–perceptual interaction is then presented in the main body of the chapter, which reviews recent experimental studies in visual perception and visuomotor coordination. The final section addresses a number of conceptual issues and offers a functional framework for interpreting motor–perceptual interactions.
The general point I would like to press in this chapter is that the perception of dynamic events arises from the interplay between the sensory data and the principles and constraints embodied in our motor competence. More specifically, I wish to argue that the way the human body moves provides the default frame of reference for representing dynamic events. The line of reasoning leading to this claim draws on a time-honored tradition. Thus, my first concern will be to provide an overview of the most relevant sources of inspiration (section 21.1). Inasmuch as they relate to the point of view defended here, this section also touches upon some more general issues in visual perception theory. Section 21.2 reviews a number of selected findings that are at least compatible with my claim. Past and current work by myself and my collaborators is given a somewhat privileged status. However, I also draw liberally—if not without biases—from the literature. The third and last section is more speculative. There, I address two questions, namely (1) How does the production–perception coupling come into being? and (2) How is it embodied?
21.1 Issues in the light of tradition As casual users of the senses, we take for granted that they are naturally suited for the purpose of providing an effective representation of the world. Indeed, not before some knowledge of the actual working of the sense organs was gained, did the awareness emerge that our ability to represent the world through these organs is actually problematic. Vision provides a particularly clear illustration of the problem. Perceived objects have width, height, and depth. Yet, as long as we construe the retina as some sort of projection screen, it is not obvious how the three dimensions can be recovered from two-dimensional images. Likewise, the size constancy of objects across varying viewing distances, or the stability of the visual scene when we move the eyes is incomprehensible within the projection-screen metaphor. Some perceptual puzzles dissolved when it was realized that certain functions are subserved by dedicated physiological mechanisms. Thus, for instance, the discovery of binocular disparity by
Wheatstone in 1838, and the subsequent identification of disparity-detecting cells in the visual cortex, have dispelled much of the mystery about depth perception. Other puzzling phenomena, however, still resist direct reduction. The recurring strategy for dealing with these phenomena has been to envisage the contribution of cognitive preconceptions and sources of information other than the senses. We admit, for example, that one factor responsible for the size constancy effect is our firm belief that humans, trees, automobiles, and so on only come in (roughly) one size. In the particular case of dynamic events, the most influential proposal along this strategic line is that the structuring of perception draws from information arising from actual or intended motor action. Many specific ideas, collectively known as the motor theory of perception, originate from this intuition. One of the earliest qualifications of the motor theory of perception is due to Berkeley (1709) who, long before Wheatstone's discovery, suggested that depth perception emerges through the interaction between size judgments and the feeling of strain associated with accommodation and convergence. More than a century later, the problem of how spatial attributes such as extent and relative positions could arise from pure sensations—which were believed to have only the attributes of intensity and quality—was still addressed in much the same fashion (cf. Scheerer 1984). A similar explanation was proposed to account for the connection that we manage to establish between muscular exertion (which is experienced directly) and the abstract notion of force by which objects are set in motion. In the first half of the nineteenth century it was widely believed that the connection was mediated by the reflexive consciousness of the overt effects of muscle contractions (Scheerer 1987). Early motor theories of perception emphasized mostly what we would call today the feedback route, carrying to the brain afferent signals arising from the actions associated with the act of perceiving. The accepted doctrine was that the spatial organization of visual sensations results from integrating visual inputs and muscular feelings originating from some 'muscle sense' (Lotze 1852). However, when anatomy provided the criteria for distinguishing afferent and efferent pathways from and to the muscles (Bell 1811), the idea began to emerge (e.g. Claparède 1902) that centrifugal information could also contribute to the genesis of perceptions, thus paving the way for a second generation of motor theories.
21.1.1 Efferent (outflow) theory The discovery that sensory and motor cortical structures are deeply intertwined, and the analysis of neurological symptoms (such as the 'ghost limb') that could not be understood on the basis of afferent information alone, laid the ground for the new conceptual model. Yet the main thrust towards establishing a coherent efferent theory (one that still has much currency) came from Helmholtz (1867/1962), who made the bold proposal that the main extra-visual contribution to the perception of spatial qualities is the motor command itself, rather than the reafferences elicited by its execution. The proposal was originally formulated to explain the perceptual consequences of moving the eyes, possibly because of the peculiar characteristics of the eye control system. The oculomotor plant behaves as a displacement generator (see later) because the forces that it generates act against a very small, constant inertia. Therefore, a reliable correspondence can be established between motor commands and the resulting displacements of the line of sight. Even before the movement is executed, the commands provide early directional information that can be processed in conjunction with current and expected sensory afferences. However, it is now clear that the conceptual import of the outflow theory (as Helmholtz's proposal came to be known) extends beyond the oculomotor domain. Granting the nervous system the capacity of comparing actual changes in the sensory
inflow with expectations based on outflowing commands invited a far deeper intertwining of motor, perceptual, and cognitive processes than previous theories based on 'muscular feelings' and 'muscle sense'. Helmholtz's analysis of the 'moon illusion' provides a striking illustration of this more intimate connection (cf. Grüsser 1986). Without a fixation point, the flight of the clouds would normally elicit optokinetic nystagmus. Nystagmus does not actually occur, because the relevant motor commands are blocked (somewhere downstream from their point of origin) by the will to fixate the moon. Yet a copy of the commands is still available to the brain, generating the wrong expectation that the entire visual field (including the moon) should partake of a common fate. A conflict would then arise between the expectations elicited by the motor outflow and the sensory inflow, which (veridically) signals that the moon moves with respect to the clouds. The apparent motion of the moon in the direction opposite to that of the clouds is explained by Helmholtz as a cunning trick by the perceptual system to allay this conflict. Helmholtz's account of the moon illusion foreshadows the general argument invoked much later to explain such diverse phenomena as size and weight illusions, postural reactions to vection, and motion sickness, all of which are now viewed as occasional, unwanted (yet illuminating) side-effects of the role of motor expectations in the generation of a stable, coherent representation of the world. The outflow theory, and its later versions (e.g. Wundt's (1893) Miterregung, von Holst and Mittelstaedt's (1950) Corollary discharge), gained widespread acceptance even beyond its intended scope of application. As early as 1878, Sechenov attempted to reconcile the apparent cleavage between the continuous nature of movements and the categorical nature of perception, by conjecturing that movements with an intrinsic rhythm are instrumental for segmenting the continuous flow of sensory information into discrete perceptual units. The extent of the consensus is also well exemplified by William James' celebrated doctrine that certain motor behaviors are the antecedents of the associated feelings rather than their effects ('We are afraid because we tremble' (James 1906)). Yet, long before behavioral (e.g. Mateef 1978; Matin 1986; Zinchenko and Vergiles 1972) and neurophysiological (e.g. Duhamel, Colby, and Goldberg 1992; Mays and Sparks 1981) experiments finally provided unequivocal support for Helmholtz's intuition, an even bolder idea began to emerge.
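The bookkeeping behind the outflow comparison can be stated in a few lines. The sketch below only illustrates the subtraction Helmholtz envisaged; the function name, units, signs, and numbers are all invented for the purpose.

```python
# A sketch of the bookkeeping behind Helmholtz's outflow comparison:
# motion attributed to the world = retinal motion minus the motion the
# efference copy predicts from one's own eye-movement command. Units and
# numbers below are hypothetical and purely illustrative.
def perceived_world_motion(retinal_slip, eye_command):
    """All quantities in deg/s along one axis; rightward is positive."""
    predicted_slip = -eye_command          # a rightward eye movement sweeps the
                                           # image leftward across the retina
    return retinal_slip - predicted_slip   # the residual is ascribed to the world

# Smooth pursuit of a stable scene: command 5 deg/s, retinal slip -5 deg/s,
# zero residual, so the world correctly appears stationary.
print(perceived_world_motion(retinal_slip=-5.0, eye_command=5.0))   # -> 0.0

# Moon illusion, schematically: the optokinetic command is blocked downstream
# (the eye stays still, so the moon's retinal slip is 0), but a copy of the
# command is still compared, leaving a nonzero residual ascribed to the moon.
print(perceived_world_motion(retinal_slip=0.0, eye_command=2.0))    # -> 2.0
```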
21.1.2 Motor–perceptual interactions without movement One of the first statements of the new idea appears in Stricker's Studien über die Bewegungsvorstellungen (1882), where one finds the following analysis of the genesis of motor images: 'When I imagine the flight of the clouds, I feel in the ocular muscles the same sensations as if I were actually following them; if I try to inhibit this muscular sensation, the image of the moving cloud stops immediately: the cloud appears still' (quoted by Soury 1892, p. 233). Here, introspection is suggesting a two-way interaction: not only are visual images able to generate motor sensations but, more importantly, even the mere intention to control the latter modifies the former. Shortly afterwards, Mach strikes exactly the same chord: 'If we [ . . . ] attempt to glance quickly to the right [ . . . ], the mere will [to do so] imparts to the images at certain points of the retina a larger "rightward value", as we may term it for brevity' (Mach 1885/1897, p. 59, Mach's emphasis). The notion that not only realized action, but also intended and potential action may play a role in the structuring of perception animated the ongoing epistemological debate on the nature of space perception. An example of this interest—and, at the same time, a good illustration of the Zeitgeist—is the analysis of the concept of 'representative space' offered by Poincaré in La science et l'hypothèse
(1905). Breaking away from the Kantian doctrine, Poincaré denies that the perception of the world's events is made possible by the a priori availability of a uniform, isotropic frame of reference: 'To us, it is as impossible to represent the outside objects in the geometric space as it would be to a painter to depict three-dimensional objects on a surface' (p. 75). In his view, representative (perceptual) space is constructed by the subject through the interaction of visual, tactile, and kinesthetic sensory qualia, and inherits the intrinsic structure of the corresponding nervous mechanisms. In particular, Poincaré posits a 'motor space' the properties of which derive directly from the permissible muscle synergies. Thus, motor space 'may have as many dimensions as we have muscles' (p. 73), and, ultimately, to localize an object in space 'simply means that we represent to ourselves the movements that we have to make to reach this object' (p. 75). In a similar vein, Mach (1885) had argued that the perceived symmetry of visual space is but a reflection of the symmetry of the oculomotor system. Direct evidence for such an abstract interaction was no more forthcoming than that in favor of the outflow theory, also because neither Gestalt psychology nor behaviorism, which dominated the scene in the first half of the last century, held the underlying concept of representation in much esteem. Leaving aside the case of apparent motion, to which I will return later, the first demonstration that perception takes concepts and schemata related to movement into account was offered by Michotte (1946), who showed that simple dynamic configurations (e.g. two simple shapes moving towards a head-on or a near-miss collision) elicit a richer phenomenal experience than is justified by the actual stimuli. The moving dots, observers report, tell a 'story' in which dynamic concepts such as inertia, attraction, repulsion, or even fear play a significant role. Shortly afterward, Johansson (1950), investigating the perception of point-lights moving against a uniform background, proposed the first principled analysis of the underlying mechanisms. He argued that the phenomenal experience elicited by such simple stimuli results from applying to the sensory data a hierarchy of representational schemata abstracted from the motion of real objects, and selecting in any given instance the simplest schema that can accommodate the sensory evidence. As far as possible, dynamic configurations are perceived as translations or rotations of a rigid object. If such an interpretation is not viable, projective schemata are envisaged next. If they too fail, the rigidity condition is relaxed and the configuration is perceived as an elastic body undergoing a deformation. In his insistence on the behavior of rigid objects as the source of inspiration for representational schemata, Johansson was adhering to a viewpoint, dating again from the previous century, which emphasizes the model role of mechanical motions. More than fifty years before, Hertz (1894/1956) had written that 'We form for ourselves images or symbols of external objects; and the form which we give them is such that the necessary consequents of the images in thought are always the images of the necessary consequents in nature of the things pictured' (p. 1). Hertz's intuition continued to be a source of inspiration even after Johansson's work.
In particular, it clearly motivated the first demonstrations of the phenomenon of representational momentum (Freyd and Finke 1984, 1985; Finke and Freyd 1985). The demonstration involves presenting a temporal sequence of still pictures describing the successive positions of a rotating object. At the end of the sequence, the subject selects, among several alternatives, the picture that best matches the last one in the sequence. It was found that the picture selected most often is not the last one, but one that is rotated further than the last, the advance being proportional to the velocity of rotation suggested by the rate of presentation of the sequence. Freyd and Finke interpreted the results by assuming that the memory trace is biased by a representational scheme endowing the pictures with inertia. Just as real bodies do not generally stop cold in mid-flight, so the image that best preserved the identity of the object was the one depicting a later stage of its rotation.
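A toy calculation makes the proportionality concrete. The gain parameter below is hypothetical; the point is only that a faster implied rotation yields a larger forward displacement of the remembered orientation.

```python
# A toy version of the representational-momentum result: the remembered
# final orientation is shifted forward in proportion to the implied angular
# velocity. The gain value is invented; only the proportionality matters.
def remembered_orientation(last_shown_deg, implied_velocity_deg_s, gain_s=0.05):
    """Orientation (deg) observers tend to accept as identical to the last frame."""
    return last_shown_deg + gain_s * implied_velocity_deg_s

for velocity in (30.0, 60.0, 120.0):                          # faster implied rotation ...
    print(velocity, remembered_orientation(90.0, velocity))   # ... larger forward shift
```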
Although Johansson considered rigid motions as the default representational model, neither Hertz in his original remark, nor Freyd in her later assessment of representational momentum (Freyd 1987) mentioned inanimate, rigid objects as being special. Yet Hertz’s remark was still quoted by Shepard (1984) to support the contention that rigid motions have a somewhat privileged status as a model for perception. Things were already changing, however.
21.1.3 Biological motion The turning point came when, in later work, the same Johansson demonstrated that movements of living organisms, particularly man, have a special status as far as the perception of dynamic events is concerned (for reviews, see Johansson 1975; Johansson, von Hofsten, and Jansson 1980). In a classic paper (Johansson 1973), he demonstrated that schemata relative to biological motion, such as walking and dancing, are surprisingly salient, and can be triggered even by sketchy visual cues. It is almost impossible to identify an arbitrary body posture if the only thing you see is a dozen or so point lights attached to the major body joints. However, as the body begins to move, a surprisingly realistic image pops out within 200 ms (Johansson 1976). The information afforded by these displays is so rich and detailed as to permit, for instance, the identification of the gender of the actor (Kozlowski and Cutting 1977). Moreover, the percept is robust, for it survives the masking effect of unrelated point targets (Cutting, Moore, and Morrison 1988), and it is generated even under apparent-motion conditions (Thornton, Pinto, and Shiffrar 1998). The effect is not limited to whole-body movements which, when projected onto a plane, possess projective invariants. In fact, even facial expressions, which involve elastic transformations, can be perceived from the movements of a few point-lights (Bassili 1978). Particularly relevant for the claim defended in this chapter is the fact that not only do the displays permit the identification of the walking pattern of close acquaintances (Cutting and Kozlowski 1977), but the identification of our own walking pattern is actually even more accurate than that of others (Beardsworth and Buckner 1981). Thus, in addition to a general scheme for walking, we seem to take stock of more specific motor information concerning our own idiosyncratic way of moving. Moreover, because there are many more opportunities of watching our acquaintances than ourselves, the relevant motor competence is not likely to be acquired through visual learning, leaving implicit knowledge as the only alternative. In fact, biological movements begin to be perceived as special so early in life (Bertenthal, Proffitt, and Cutting 1984; Bertenthal, Proffitt, Spetner, and Thomas 1985; Bertenthal, Proffitt, and Kramer 1987) that this knowledge may well be innate. It is debated whether the synthesis of a realistic percept from point-light displays is achieved via a bottom-up hierarchy of processing levels incorporating successively larger elements of local analysis (Cutting 1981; Hoffman and Flinchbaugh 1982; Mather, Radford, and West 1992), or via a global strategy that takes into account the entire display (Bertenthal and Pinto 1994; Shiffrar, Lichtey, and Heptulla Chatterjee 1997). Recent evidence—for instance, the fact that upside-down displays do not always give rise to clear percepts (Dittrich 1993; Sumi 1984)—proved damaging to the bottom-up hypothesis, but the question is not adjudicated yet. It should be stressed that Johansson himself interpreted his results on biological motion within the same framework he had laid out in earlier studies, with no emphasis on motor knowledge. Actually, in addressing the question of whether the perceptual grouping of the point lights into a human Gestalt was triggered by some holistic recognition of the walking pattern, he opined that 'the grouping is [instead] determined by general perceptual principles in visual motion perception' (Johansson 1973,
p. 204) and that 'the principles found in studies of perception of mechanical motions should also be revealed in the perceptual outcome from the complicated systems of biological motion' (p. 204). These principles, still couched in the vector language developed for the analysis of object motion, imply the decomposition of the global movement into translational and rotational components around the joints (Johansson 1976). In models of perception based on these principles (e.g. Hoffman and Flinchbaugh 1982), point-lights are interpreted as endpoints of a stick, and limbs are interpreted as sticks. The fact that recognition accuracy is only weakly affected when the identification of the stick endpoints is made impossible by placing the point lights in the middle of the joints (Dittrich 1993) poses a serious problem for these models, and so does the perception of upside-down patterns mentioned above. More generally, these results cast some doubt on the validity of the vector decomposition suggested by Johansson. This should not detract, however, from the importance of his demonstration, which ushered in a new domain of perceptual research. In particular, the accuracy of the reports concerning biological motion refueled the debate on the categorical nature of perception. The notion of category is intimately associated with the epistemological tradition which, by denying the possibility of acquiring knowledge through inductive extrapolation, emphasizes instead innate conceptual structures (cf. Fodor 1980). This tradition has long been popular in cognitive science, notably in linguistics, where innate conceptual structures and the cognate notion of domain-specific learning are at the heart of the Chomskyan approach. More generally, there is a growing consensus that cognitive development takes stock of the ability to identify the so-called 'natural kinds' (cf. Schwartz 1977), in other words, of the seemingly innate propensity to draw non-conventional categorical boundaries that 'cut nature at its joints'. For instance, all living organisms are reckoned as a natural kind, sharply distinct from the class of non-living entities; the categorical distinction between animate and inanimate entities may even correspond to a domain-specific knowledge system subserved by distinct neural mechanisms, as suggested by clinical cases in which the ability to identify living organisms is selectively disrupted (e.g. Caramazza and Shelton 1998; Farah, McMullen, and Meyer 1991; Warrington and Shallice 1984). The role of innate categories in perception is more controversial. The case for perceptual categories is generally stated as follows. In dealing with stimuli that differ by the value of one or more physical parameters, both discrimination and identification functions are expected to be monotonic with respect to the physical scale. In some cases, instead, the stimuli seem to cluster in groups such that discrimination among members of each group is poor, whereas discrimination among members of different groups is sharp. In these cases it may be contended that perception imposes a qualitative (categorical) scale upon the physical continuum such that each scale value identifies a different group. The best-known instance is the perception of speech sounds, where this peculiar behavior is present as early as two weeks of age (Eimas, Siqueland, Jusczyk, and Vigorito 1971).
The fact that the same physical stimuli are treated categorically when the listener is cued into believing that they are speech sounds, and in the quantitative psychophysical fashion when they are instead construed as noises (cf. Jusczyk 1986), supports the view that speech perception is mediated by domain-specific mechanisms. By contrast, the same view is seriously questioned by the fact that discriminations based on voice onset time and place of articulation have been found to be categorical also in chinchillas (Kuhl and Miller 1978) and macaques (Kuhl and Padden 1983). The case for a speech-specific mode of sound processing, distinguished from a general-purpose acoustic mode, can still be made by showing that phenomena that are present very early in life, such as phonetic equivalence (Grieser and Kuhl 1989; Kuhl 1987; Marean, Werner, and Kuhl 1992), and the ability to detect cross-modal equivalents for speech (Kuhl and Meltzoff 1982), are not present in other species.
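The operational definition of categorical perception given above can be made concrete with a small sketch. The logistic labeling function has invented parameters, and the two-category approximation of discrimination accuracy, P(correct) = 0.5(1 + (p_i - p_j)^2), is a standard textbook idealization rather than data or a model from this chapter.

```python
# What 'categorical' means operationally: identification along a physical
# continuum follows a steep sigmoid, and discrimination of a fixed physical
# step peaks where the labels change. Parameters are invented; the ABX
# approximation P = 0.5 * (1 + (p_i - p_j)**2) is a textbook idealization.
import math

def p_label_A(x, boundary=5.0, slope=2.0):
    """Probability of labeling stimulus value x as category A."""
    return 1.0 / (1.0 + math.exp(slope * (x - boundary)))

for x in range(1, 9):
    p_i, p_j = p_label_A(x), p_label_A(x + 1)      # adjacent pair on the continuum
    discrimination = 0.5 * (1.0 + (p_i - p_j) ** 2)
    print(f"pair ({x},{x + 1}): labels {p_i:.2f} -> {p_j:.2f}, "
          f"predicted discrimination {discrimination:.2f}")
```

Running the loop shows discrimination hovering near chance (0.50) within each category and peaking at the boundary, which is exactly the clustering pattern described above.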
A step in the direction of establishing such species specificity has been taken recently by looking at the so-called perceptual magnet effect (Kuhl 1991). In vision, categorical perception—as defined above—has been demonstrated for such simple dimensions as hue (cf. Bornstein 1987) and flicker frequency (Pastore et al. 1977). As regards more complex stimuli, the fact that it is impossible to perceive simultaneously the two interpretations of ambiguous figures may be taken to suggest the presence of categorical boundaries. However, the clearest demonstration that slight quantitative changes in the stimulus may have qualitative perceptual consequences is again provided by Johansson's displays, for instance by the ability to discriminate male and female dynamic postures on the basis of such fairly subtle cues as the position of the (invisible) center of moment of the body (Cutting, Proffitt, and Kozlowski 1978). As in the case of language perception, such results strongly suggest the intervention of knowledge-based perceptual categories. Moreover, the poverty-of-the-stimulus argument often invoked in language studies to advertise inner sources of domain-specific information applies equally well to these surprisingly sharp discriminations. Thus, although Johansson himself never mentioned implicit motor competence as being an important ingredient of the act of perceiving, his work was instrumental in bringing this idea back to the forefront. In the next section, I will summarize a selection of recent experimental developments of this idea.
21.2 The contribution of implicit motor competence to perception Three main approaches have been pursued to demonstrate that knowing how we move influences the perception of dynamic events. The first strategy consists of showing that, faced with incomplete information, the perceptual system fills the gap by drawing on motor knowledge. The second strategy is to show that all sorts of strange things happen when the information being supplied is at variance with some motoric rule. The last, more direct, strategy is to demonstrate that certain perceptual tasks can be performed more efficiently with the help of motor knowledge. The three approaches will be dealt with in this order.
21.2.1 Illusions and delusions The term 'geometric illusions' is traditionally employed whenever visual perception does not provide a faithful account of the geometry of the scene. Sometimes, illusions are treated as puzzling phenomena calling for ad hoc explanations. Indeed, many diverse mechanisms have been invoked to explain famous illusions such as the Müller–Lyer double arrow (cf. Coren and Girgus 1978a,b). However, it is possible that a faithful rendering of angles and distances is neither necessary, nor in fact useful, in an ecologically meaningful context (Gibson 1966, 1979), and that certain perceptual delusions are simply the result of applying to artfully concocted 2D stimuli rules of interpretation that are well adapted for dealing with real 3D objects (Gillam 1971). Dynamic illusions are a more mixed bag insofar as they include both phenomena (such as the 'phi' and 'sigma-movement') that may be rooted in the physiology of the visual system, and configurational effects (Johansson 1975; Restle 1979) suggesting the intervention of cognitive 'hypothesis-making' processes. In both cases, however, the phenomenal experience is penetrable by preconceptions of a non-visual nature (Börjesson and von Hofsten 1972, 1973; Braunstein and Andersen 1984). Some of these preconceptions refer to the way the human body moves.
21.2.2 Probing mental representations through the apparent motion phenomenon The phenomenon of 'phi-movement', also known as apparent or stroboscopic motion (Korte 1915; Wertheimer 1912), affords a congenial tool for investigating the role of tacit motor knowledge in the structuring of perceptions. When two spatially separated objects are displayed sequentially at time intervals that are neither too short nor too long, people report seeing a movement of the first object towards the second. The interesting aspect of the phenomenon is that most of what is perceived is in the eye of the beholder. In the absence of any real motion, conscious experience seems to reflect the attempt by some inner representational principles to generate a coherent 'story' from the scanty information provided by the stimuli (Anstis and Ramachandran 1985). Many observations show that the perceived motion is not always unique, and that the selection among alternative 'stories' can be biased by manipulating the displays. Thus, if the objects are simply light points, the motion follows the shortest (straight) path between them. However, if a visible obstacle blocks the straight path, people perceive a curved motion around the obstacle, the shortest route being normally preferred. In the case of extended forms flashed in different orientations, a curved motion with a single center of rotation is again preferred to the straight path requiring a combination of rotation and translation (cf. Shepard and Cooper 1982). The interval between the first and the second stimulus (Stimulus Onset Asynchrony, SOA) also influences what is perceived, with shorter paths giving the strongest illusion at short SOAs, and longer paths at long SOAs (Korte 1915). The selection of one perceptual solution is also biased by higher-order constraints such as rigidity. For instance, a longer path is selected over a shorter one if the latter but not the former implies a non-rigid deformation of the objects (e.g. Kolers and Pomerantz 1971). All the observations above are compatible with the notion that conscious experience carries the imprint of a set of expectations and prejudices concerning the true physical phenomenon of which the stimuli provide only a glimpse. Thus, for instance, the relation between SOA and path length may reflect the implicit assumption that, ceteris paribus, it takes longer to go further. These expectations and prejudices are not likely to result from perceptual learning. Actually, they do not even need to mirror properties of the real world, if only because sometimes the stories that are generated are downright implausible. No one has ever seen an object being transmuted into another one in mid-flight. Yet this is what we perceive when objects of different shapes are flashed sequentially with appropriate SOAs (Kolers and Pomerantz 1971). Our exquisite sensitivity to biological motion must have given us some evolutionary advantage, and it is likely to involve specific, innate expectations and prejudices concerning bodily movements that are as deeply entrenched as those that modulate apparent motion. If so, this peculiar illusion should be well suited for demonstrating that the generation of percepts has access to high-level motor knowledge. Over the last years, Shiffrar, Freyd, and their collaborators have developed a line of research based on this intuition. In the first study (Shiffrar and Freyd 1990), observers were shown rapidly alternating sequences of two high-quality pictures showing a human body in different postures.
Fig. 21.1 Demonstration that apparent motion is penetrable by tacit knowledge of biomechanical constraints. Fast alternating displays of two arm postures (a) generate illusory rotations of the forearm. According to Korte's law, the perceived motion should always follow the short path (arrow). Instead, even at the smallest values of the Stimulus Onset Asynchrony (SOA), observers sometimes report the forearm to follow the long path (b). This perceptual solution becomes increasingly frequent with increasing SOA. Apparent motion generated by similar alternating displays involving the arm of a clock always follows the short path (c). Somehow, the knowledge that a short forearm rotation would result in a fractured elbow biases the choice of the perceptual solution (modified from Shiffrar and Freyd 1990).

The postures were selected in such a way that only one transition between them corresponded to a natural movement. The other transition either violated a solidity constraint, or was incompatible with the skeletal degrees of freedom (Fig. 21.1(a)). Compatible transitions always corresponded to longer paths than incompatible ones. In a control condition, body postures were replaced by images of objects in two configurations, such as a clock marking different hours. Again, the images suggested two possible transitions between configurations, following a long and a short path, respectively. However, both transitions were physically plausible. For SOAs ranging between 100 ms and 750 ms, the alternation elicited apparent motion in about 90% of the cases. In the case of body images, both transitions could be perceived, but not always with the same probability (Fig. 21.1(b)). At short SOAs the short (incompatible) transition
was more frequent. The opposite happened at long SOAs. With objects, instead, no trend emerged: the short path was always perceived much more frequently than the long one, independently of the SOA (Fig. 21.1(c)). The results suggest a competition between biasing factors. At short SOAs, the prevailing factor is the preference for shorter paths. At long SOAs body perception becomes penetrable by non-visual (biomechanical) factors such as solidity and skeletal constraints. A later study (Shiffrar and Freyd 1993) confirmed that perception of the longer, biomechanically compatible patterns of motion is not simply a consequence of the tendency mentioned above to prefer long paths with increasing SOAs. In the pictures used in this experiment both compatible and incompatible paths could be either short or long. As in the first study, when the compatible path was the long one, it was perceived increasingly often with increasing SOA, whereas compatible short paths were almost always preferred, whatever the SOA.

The apparent motion paradigm permits one to identify the structural components of the display that are most directly responsible for the perception of biological motion. Light point displays à la Johansson contain enough information to detect the contact interaction of a body with unseen objects (as, for example, in hammering or stirring; Dittrich 1993), and even to estimate quantitatively the weight of objects being lifted (Runeson and Frykholm 1981). Apparently, we are also implicitly conscious of the rules that legislate how the body can move relative to external objects without contact. A recent study involving displays of a body part moving about inanimate objects (Heptulla Chatterjee, Freyd, and Shiffrar 1996) has shown that biomechanically permissible motions (getting around) are again preferred over impossible ones (passing through). Moreover, objects moving around a still body part are also more often perceived as following the permissible path. This last observation may be taken to indicate a generic awareness that objects cannot pass through each other. This is not so, however. In about two cases out of three, when watching stroboscopic displays of objects moving about other objects, perception no longer cares about the solidity constraint and takes the short path anyway. Thus, the presence of a body part, or of a credible approximation to a body part, is a necessary structural component for triggering the implicit constraints that bias the selection of biologically permissible paths.
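The competition between biasing factors described above lends itself to a compact numerical illustration. The following sketch is purely illustrative and is not a model proposed by Shiffrar and Freyd: it assumes a logistic dependence on SOA, with invented midpoint and slope parameters, merely to show how the probability of reporting the long, biomechanically compatible path could rise with SOA for body displays while staying flat for object displays.

    import math

    def p_long_path(soa_ms, is_body, midpoint_ms=300.0, slope_ms=80.0):
        """Probability of perceiving the long (biomechanically compatible) path.
        Toy logistic model with invented parameters, summarizing the reported
        pattern: bodies shift toward the long path as SOA grows; objects do not."""
        if not is_body:
            return 0.05          # objects: the short path dominates at every SOA
        return 1.0 / (1.0 + math.exp(-(soa_ms - midpoint_ms) / slope_ms))

    for soa in (100, 300, 500, 750):
        print(f"SOA {soa:3d} ms: body {p_long_path(soa, True):.2f}, "
              f"object {p_long_path(soa, False):.2f}")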
21.2.3 Other preconceptions

I have summarized evidence that the illusory path of a phi-movement is biased by tacit knowledge about the properties of bodies and objects. Next, I will consider two experiments (Viviani and Stucchi 1989, 1992) that demonstrate a peculiar distortion of both the linear dimensions and the velocity of 2D light point displays. The conditions under which these distortions occur suggest a spontaneous tendency by the observer to assume the display to be the projection of a biological motion. Before turning to the experiments, however, I have to describe a characteristic property of biological motion that provides the basis for these and other studies to be considered later.

The movement of massive objects always carries the imprint of the external force field that is acting on them. However, the imprint is more or less clear depending on the relative importance of the inertial, frictional, and elastic forces that are also present. For instance, it is not obvious that the same agent (a gravitational field) is responsible both for the revolution of the planets around the sun (huge inertia, no friction) and for the erratic fall of a feather (small inertia, large friction). When the active force is comparatively large, the system behaves as a displacement generator, and the induced movement is a rather direct expression of this force. Eye movements are a good example of such a simple situation. Limb movements are more disparate because the load that muscles are acting against
can be extremely variable. There is just no way of bringing under one description the tossing of a timber and the gestures of an orchestra conductor, the imprint of the motor plan being much more evident in the latter than in the former case. By studying gestures performed against relatively weak loads, motor psychologists have discovered a number of peculiarities of the underlying motor plans. One of these peculiarities concerns the kinematics of the movement. Trajectory and velocity of a moving object are always functionally related. The relationship is generally complex, and depends on the instantaneous value of the active force as well. In contrast, in the case of voluntary free gestures, i.e. when the inertial, viscous, and elastic resistive forces are only those of the limb itself, the velocity–trajectory relationship is surprisingly simple, in spite of the fact that the active torques are time-varying and are generated by many muscles. It has been shown (Viviani and Schneider 1991) that the instantaneous velocity V(t) and the radius of curvature R(t) of the trajectory are related by the expression:

V(t) = K(t) · [R(t)/(1 + α·R(t))]^(1−β)    (1)
In this equation the parameters α and β are constant throughout the movement. The first ranges between 0 and 0.1, depending on the average velocity. The second has a value very close to 2/3 in adults and slightly higher values in young children (Viviani and Schneider 1991). The parameter K (called the velocity gain factor) is constant over relatively long segments of the trajectory. Because it was first proposed in the approximate form A(t) = K·C(t)^(2/3), involving angular velocity A and curvature C (Lacquaniti, Terzuolo, and Viviani 1983; Viviani and Terzuolo 1982), eqn (1) was dubbed the Two-thirds Power Law and is still known by this name.

There are reasons to believe that the power law reflects certain aspects of the neuronal dynamics of the motor control system. Neurophysiological recordings from the primary motor cortex of the monkey suggested that movement direction is coded by recruiting a selected population of neurons with different directional selectivity, the selection criterion being that the vector sum of the individual contributions coincides with the required direction (Massey, Lurito, Pellizzer, and Georgopoulos 1992). Direction and velocity changes during a movement correspond to a continuous independent updating of both the selected population and the level of activation of the recruited neurons. The velocity–curvature covariation captured by the power law reflects at the level of overt behavior the relation between the two updating processes at the neuronal level (Schwartz 1994). A theoretical analysis (Pellizzer 1997) indicated that, ultimately, the power law expresses an intrinsic limitation in the rate at which these processes can be jointly controlled. Finally, it has been shown (Viviani and Flash 1995) that movements that comply with the power law with β = 2/3 display a peculiar form of smoothness in that they minimize the average jerk (the derivative of the acceleration). Because minimum-jerk movements can be controlled in a most effective way (Flash 1990), it may be tempting to attribute a functional significance to the neural dynamics described above, and to the corresponding power law.

For our purposes, it is sufficient to stress that the power law is violated only by one-degree-of-freedom movements (such as the arm swing during walking) where gravity and biomechanical constraints are dominant factors. Attempts to generate free hand movements that violate the law fail even after extensive training under visual guidance (Viviani and Mounoud 1990). Conversely, mechanical movements (e.g. those of robotic arms) do not generally obey the V–R relation described by eqn (1). Therefore, the power law is a kind of signature that can be used to tell rather reliably what is biological and what is not.
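Equation (1) is easy to explore numerically. The sketch below is a minimal illustration, assuming only the parameter ranges quoted above (β = 2/3, α small, K constant); the ellipse used as a test trajectory and all numeric values are illustrative choices, not taken from the experiments.

    import numpy as np

    def power_law_velocity(R, K=1.0, alpha=0.05, beta=2/3):
        """Eqn (1): V(t) = K * (R / (1 + alpha*R)) ** (1 - beta).
        R : radius of curvature along the trajectory (scalar or array).
        K : velocity gain factor; alpha in [0, 0.1]; beta ~ 2/3 in adults."""
        R = np.asarray(R, dtype=float)
        return K * (R / (1.0 + alpha * R)) ** (1.0 - beta)

    # Radius of curvature of an ellipse x = a*cos(t), y = b*sin(t)
    t = np.linspace(0.0, 2.0 * np.pi, 1000)
    a, b = 2.0, 1.0                    # horizontal ellipse; set a == b for a circle
    R = (a**2 * np.sin(t)**2 + b**2 * np.cos(t)**2) ** 1.5 / (a * b)

    V = power_law_velocity(R)
    print(f"velocity range on the ellipse: {V.min():.3f} .. {V.max():.3f}")
    # With a == b (a circle) R is constant, so V is constant, as the law predicts.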
In one experiment (Viviani and Stucchi 1989) subjects were shown a light point repeatedly tracing an elliptic trajectory. After watching ten movement cycles, they had to indicate the orientation of the major axis of the ellipse, which could be either vertical or horizontal. The first trajectory was very elongated and the task was easy. In all subsequent trials the eccentricity changed according to a response-dependent rule. After a correct answer, the task was made more difficult by decreasing the eccentricity of the ellipse. After a mistake, the eccentricity was increased. Using a double-staircase method with a forced-choice response rule, the trajectory perceived as a circle was identified as the one for which responses were at chance level. The entire procedure was repeated under three cinematic conditions. In the first session, the velocity of the light point was constant; in the second one, the velocity profile at each point of the trajectory was made equal to that of a biological motion tracing an ellipse with eccentricity 0.7 and a horizontal major axis; in the last session, the velocity was that of a biological motion tracing an ellipse with eccentricity 0.7 and a vertical major axis.

The rationale for these manipulations is the following. The Two-thirds Power Law predicts (and experiments confirm) that circles, and only circles, are traced at constant velocity. Thus, in the first condition, stimuli with large eccentricities were at variance with the biological model. However, the discrepancy decreased as subjects approached the point of subjective equality (trajectories perceived as circles). In the second condition, upright ellipses with large eccentricities were even more deviant with respect to the biological model, whereas horizontal stimuli were roughly in agreement with it. The situation was reversed in the third condition. Thus, in the last two cases circular trajectories were viewed under cinematic conditions that were incompatible with the biological model. The gist of the experiment, then, was to show that the perception of the aspect ratio (vertical axis/horizontal axis) is biased when the stimuli are not compatible with the biological model.

The results confirmed this expectation. In the first condition, there was essentially no bias in the perception of the aspect ratio. In the second one, subjects perceived as circles trajectories that were actually quite elongated in the vertical direction. Moreover, the differential limen (JND) was twice as large as in the constant-velocity condition. No systematic bias emerged in the third condition; however, the differential limen was again significantly larger than in the first condition. In short, the results suggest an interaction between form and kinematics in which the decisive factor is whether or not the velocity–curvature relation is similar to that found in human limb movements. In particular, the large bias in the second condition is compatible with the hypothesis that subjects were always inclined to fit the stimuli within the biological model. When the fit was poor, they attempted to allay the discrepancy by deforming the geometry in the direction dictated by the Two-thirds Power Law. Indeed, perceiving a vertical ellipse as a circle implies a compression of the vertical extent, i.e. a flattening of the portions of the trajectory where velocity was higher.

The study of the interaction between form and kinematics was pursued in a second experiment (Viviani and Stucchi 1992). In one condition, subjects were shown a light point continuously tracing a closed random trajectory. In the course of the observation, they could control the way the velocity of the point varied along the trajectory.
Subjects were informed that the control acted step by step on the value of just one parameter. At high values of the parameter, velocity increased at points of high curvature and decreased at points of low curvature. The opposite occurred at low values of the parameter. They were also told that there was just one middle value for which velocity was constant throughout, and that their task was to find this value. During the search, the trajectory remained unchanged. The game was fair insofar as (unknown to the subjects) all possible velocity distributions were computed according to the general equation (1), with the control acting on the exponent β. Thus, when 1 − β = 0 velocity was indeed constant. For 1 − β < 0 the movement was grossly at odds with the biological model, for it decelerated when it should have accelerated and vice versa. For 0 < 1 − β < 1/3 velocity decreased with increasing curvature, but less than it does in biological movements. For 1 − β > 1/3 the biological trend was exaggerated. Finally, the movement was truly biological only for 1 − β = 1/3. There were five different trajectories, each of which was presented 12 times with a different initial distribution of velocity determined by selecting β at random among 12 values.
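How such stimuli can be constructed follows directly from eqn (1): the fixed trajectory supplies a radius of curvature at every sample, eqn (1) with the chosen exponent turns it into a speed, and integrating arc length over speed yields the time at which the light point must reach each sample. The sketch below is an illustrative reconstruction under these assumptions, not the authors' stimulus-generation code; the test trajectory is a placeholder for the closed random shapes actually used.

    import numpy as np

    def reparametrize(x, y, one_minus_beta, K=1.0, alpha=0.05):
        """Return time stamps that make a point trace (x, y) with the velocity
        profile of eqn (1) for a given exponent 1 - beta.
        1 - beta = 1/3 gives the biological profile; 0 gives constant velocity."""
        dx, dy = np.gradient(x), np.gradient(y)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        # Radius of curvature of the planar curve at each sample
        R = (dx**2 + dy**2) ** 1.5 / np.abs(dx * ddy - dy * ddx)
        v = K * (R / (1.0 + alpha * R)) ** one_minus_beta   # eqn (1)
        ds = np.hypot(np.diff(x), np.diff(y))               # arc-length steps
        dt = ds / v[:-1]                                    # time per step
        return np.concatenate(([0.0], np.cumsum(dt)))

    # Illustrative closed trajectory (an ellipse standing in for the random shapes)
    t = np.linspace(0.0, 2.0 * np.pi, 2000)
    x, y = 2.0 * np.cos(t), 1.0 * np.sin(t)
    for e in (-1/6, 0.0, 1/3, 1/2):   # a few settings of 1 - beta
        T = reparametrize(x, y, e)
        print(f"1-beta = {e:+.3f}: cycle duration = {T[-1]:.2f} (arbitrary units)")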
Fig. 21.2 Demonstration that biological movements are perceived as constant. Results from one typical subject. Panels (a) to (e): Each staircase trace describes the successive adjustments made by the subject attempting to identify the constant-velocity profile. Each step corresponds to a value of the exponent of the power law that relates the velocity to the curvature of the (invariable) trajectory. A true constant velocity obtains when the exponent is zero (continuous horizontal lines). The twelve traces superimposed in each panel correspond to different initial values of the exponent. Quite independently of the initial value, the selected exponent is always close to that observed in biological movements. (f): Averages and standard deviations across all subjects and all initial values of the selected exponent for five different trajectories (SC1 to SC5). The velocities corresponding to these averages vary by as much as 250%, yet these changes are not perceived. Note that in all panels the quantity plotted in ordinates actually corresponds to 1 − β in eqn (1) of the text (from Viviani and Stucchi 1992).

Figure 21.2 shows
the results for a typical subject. Panels (a) to (e) (one for each trajectory) show the progressive convergence of the control parameter toward the value for which velocity appeared constant. Panel (f) shows the mean and standard deviation of the selected β for each trajectory. The outcome of the experiment was quite clear. All subjects consistently and accurately selected the biological movement (i.e. β = 2/3) as the best approximation to a constant-velocity stimulus. Moreover, when the initial β was positive, nobody, while searching for the appropriate setting of the parameter, ever approached the value β = 1 corresponding to true constant velocity. Similar results were obtained both when the random trajectory was replaced by ellipses of various eccentricities, and in several control conditions. The size of the illusion revealed by these experiments is surprisingly large: velocity variations as large as 250% that occurred in the biological case were not detected. After the experiment, we showed the subjects what a true constant-velocity movement looked like; we also showed a point moving on a straight path with the velocity profile that had been perceived as constant. Both demonstrations were always received with the utmost skepticism.

The results described above lead to two distinct (albeit related) conclusions. On the one hand, in formulating velocity judgements, we have access to some implicit knowledge of the motor rule expressed by the Two-thirds Power Law. On the other hand, movements that comply with this rule are perceived as uniform. It is worth stressing that the first conclusion does not entail the second. We could 'resonate'—as Gibson (1966) would put it—to biological movements while acknowledging that they are not uniform. Thus, the reason why we disregard the large velocity variations present in our stimuli remains to be explained. Runeson (1974), noting that certain motions arising in nature are similar to that of a mass that is acted upon by a continuing force in the presence of viscous friction, argued that the mass–spring–dashpot model is implicitly assumed as a model for all (one-dimensional) displays, and concluded that 'only natural motions look constant' (p. 11). Along similar lines, one may surmise that biological movements are important enough from the ecological point of view to justify their role of default model for many dynamic displays. If so, one can make the further assumption that perceiving such displays involves a comparison with the best-fitting model prediction, and that accelerations and decelerations are perceived only when the actual velocity deviates from the predicted one.

Are motor–perceptual interactions confined to vision? The fact that perceptual phenomena like apparent motion have their counterpart in other sensory modalities (Sherrick and Rogers 1966; Perrot 1974) suggests that these interactions may be more pervasive. A demonstration that this is in fact the case was provided by a recent experiment on kinesthetic perception (Viviani, Baud-Bovy, and Redolfi 1997). The logic of the experiment and the stimuli were very similar to those of the Viviani and Stucchi (1989) study described above. The key difference was that the dynamic elliptic stimuli were not presented visually, but fed into a computerized robotic arm which drove the passive right arm of the blindfolded subject (Fig. 21.3). The movement continued until the subject identified the orientation of the major axis of the ellipse, which he knew to be either vertical or horizontal.
As in the visual experiment, the eccentricity in the first trials was so large that the orientation was easily detected, but it decreased after each correct response, making the task increasingly difficult. We also tested the same three cinematic conditions described above. In the first condition velocity was constant. In the other two conditions the velocity profiles would be biological if the trajectory were an ellipse (eccentricity: 0.7) with a horizontal or a vertical major axis, respectively. The results were even clearer than in the case of vision (Fig. 21.4). When the biological model fit the kinesthetic inflow well near the point of objective equality (constant velocity for quasi-circular trajectories), the aspect ratio of the stimulus was perceived with a small constant error (CE) and a small differential limen (JND). Large, systematic CEs and large JNDs were measured in the other two conditions, where the modulation of velocity was inconsistent with the quasi-constant curvature of the trajectory. When the movement decelerated at the right and left extremes of the trajectory, the subjective circle was in fact a vertical ellipse. The opposite bias was present in the other condition.

The fact that kinesthetic estimates of vertical and horizontal extent are biased just like visual estimates invites an obvious inference: because two sensory channels that have little in common display the same sensitivity to the relation between form and velocity, the site of the interaction between perception and implicit motor competence must be upstream of the primary sensory mechanisms, perhaps at a level where stimuli are represented in some amodal format.
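Both the visual and the kinesthetic versions of the task rest on the double-staircase procedure with a forced-choice rule. The sketch below is a minimal illustration of that logic, not the authors' code: the simulated observer, the step size, the sequential (rather than randomly interleaved) series, and the stopping criterion of 15 reversals with the point of subjective equality taken from the last 10 are assumptions modeled on the description in the text and in the caption of Fig. 21.4.

    import random

    def staircase_series(respond, start_ecc, step=0.05, max_reversals=15):
        """One series of the double staircase. respond(ecc) -> True when the
        orientation judgement is correct. A correct answer shrinks |ecc| (harder),
        a mistake grows it (easier); changes of direction are logged as reversals."""
        ecc = start_ecc
        sign = 1 if start_ecc >= 0 else -1       # horizontal vs vertical series
        reversals, last_move = [], None
        while len(reversals) < max_reversals:
            move = -1 if respond(ecc) else +1    # -1: toward 0, +1: away from 0
            if last_move is not None and move != last_move:
                reversals.append(ecc)
            last_move = move
            ecc += move * sign * step
        return reversals[-10:]                   # keep the last 10 inversions

    # Purely illustrative observer: stimuli near ecc = +0.2 look circular,
    # so responses approach chance (0.5) there.
    def observer(ecc):
        return random.random() < min(0.99, 0.5 + 2.0 * abs(ecc - 0.2))

    inversions = staircase_series(observer, +0.9) + staircase_series(observer, -0.9)
    pse = sum(inversions) / len(inversions)
    print(f"eccentricity perceived as circular: {pse:+.3f}")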
Fig. 21.3 Experimental set-up used for studying kinesthetic perception. A six-degrees-of-freedom robotic arm drives the subject's right arm through elliptic trajectories with a controllable velocity profile. In one experiment the subject has to indicate the orientation of the major axis of the ellipse. In a second experiment (see Fig. 21.7) the subject has to reproduce with the left arm the movement imposed on the right one (from Viviani et al. 1997).
21.2.4 Only lawful perceptual stimuli guide action effectively

Many actions are accomplished successfully even when the main source of information that is meant to provide guidance is incomplete or corrupted. One reason for such robustness is the ability to exploit secondary sources of information that would otherwise be redundant. The second, even more important reason is the brain's ability to fill in for the missing information by drawing from its stock of preconceptions about the way the world goes. Of course, this very same ability can make things worse if the course of events in the world fails to comply with our preconceptions. Conversely, any evidence that an action performed under the guidance of a sensory input is not performed as accurately as expected points to the possibility that the input deviates from some tacit expectation. The four experiments summarized below were designed for the purpose of identifying one peculiar form of deviance, and, by the same token, one peculiar type of expectation.
Fig. 21.4 Perception of the aspect ratio (horizontal axis/vertical axis) of an elliptic motion is affected by the relation between velocity and curvature. Left: velocity (V) distribution of the right hand driven by the robotic arm. Velocity (heavy lines) is plotted in polar coordinates using the hand trajectory as zero reference. The three distributions are exemplified in the special case of a circular trajectory; however, the same distributions were used for all trajectories within a condition. Condition A: constant velocity, i.e. the biological distribution for circular motions. Condition B: velocity was maximum at the top and bottom portions of the trajectory; this is the biological distribution for a horizontal ellipse with eccentricity 0.7. Condition C: velocity was maximum at the leftmost and rightmost portions of the trajectory; this is the biological distribution for a vertical ellipse with eccentricity 0.7. Right: Results for one typical subject. Staircase traces describe the variations of the eccentricity (ordinate) of the trajectory for all trials (abscissa) of a complete experiment. By convention, positive and negative values of the eccentricity refer to horizontal and vertical ellipses, respectively. Ascending and descending series alternated randomly within the experiment. After each correct answer, the eccentricity was reduced (in absolute value); after each mistake the eccentricity was increased. The experiment ended when there had been at least 15 inversions in both series, indicating that responses were at chance level. The trajectory perceived as circular was identified by the average eccentricity (dotted lines) calculated from the last 10 inversions. There is very little bias in Condition A, where the imposed movement is biological near the point of perceived circularity. The trajectories perceived as circles in Conditions B and C are vertically and horizontally elongated ellipses, respectively (reproduced from Viviani et al. 1997).
All the experiments involved variants of the pursuit-tracking task in which the hand or the eye is asked to follow dynamic visual targets. In the first experiment (Viviani and Mounoud 1990), the task was to follow, with a stylus, a two-dimensional point light target which was tracing an elliptic trajectory. The experimental factors included the rhythm of the target, the orientation of the major axis of the ellipse, and the way the instantaneous velocity varied along the trajectory. For each combination of rhythm and orientation there were two cinematic conditions (Fig. 21.5). In the first one (N), the velocity varied according to the biological model. In the second one (T), the velocity varied in a way that would have been biological for an ellipse oriented at 90 deg with respect to the actual trajectory. Therefore, although both average and maximum velocity were identical in the two cases, the second condition forced the subject to perform a movement that he would never have produced spontaneously.

In the biological condition tracking was quite accurate, even at the highest rhythm. Deviations from the target trajectory were very modest, and the delay of the hand with respect to the target was small and constant. By contrast, in the non-biological condition tracking was poor in both the space and time domains. The pursuit trace deviated systematically from the target; in addition, it was rotated by as much as 30 deg with respect to the orientation of the ellipse. Things were even worse in the time domain, where the interval between target and pursuit oscillated between leads and lags. Aside from the rotation, all the deviations could be accounted for in a simple way: although the task required a strict coupling with the target, the hand went on performing a distorted but highly identifiable biological movement, as if the target were unable to provide appropriate guidance. Similar results were also obtained in another experiment (Viviani, Campadelli, and Mounoud 1987) in which the target followed unpredictable trajectories generated by recording actual scribbling movements. Once again, the key experimental factor was the velocity profile, which could be either the original (biological) one, or a constant velocity profile which violated the Two-thirds Power Law.

One may ask whether the performance errors demonstrated by these two tracking studies have to do with the hand control system per se. A recent experiment involving eye movements (de'Sperati and Viviani 1997) showed that the problem is as general as the perceptual biases described in the previous section. We asked subjects to follow with their gaze a light point tracing ellipses on the computer screen, and recorded the eye tracking movements with high-accuracy scleral-coil lenses. By keeping the cycling rhythm constant, and varying the eccentricity of the ellipses, we explored the upper range of the velocities that, in the case of one-dimensional, predictable targets, are still compatible with accurate smooth pursuit. As in the hand pursuit experiments, the main controlled variable was the velocity profile of the target along the trajectory. Each eccentricity was tested with several profiles computed by inserting a different value of the exponent in the general expression of the power law (eqn 1). Only the profile for β = 2/3 corresponded to that of a biological movement. In ocular tracking tasks, smooth pursuit phases are generally interspersed with catch-up saccades, the number and size of which depend on the difficulty of the task.
For each velocity profile, tracking accuracy was estimated by two indexes based on the retinal position error (RPE) before and after catch-up saccades. The first index measured the increase in the distance between gaze and target during the pursuit phase between successive saccades. The second index measured the difference between the RPE immediately before and after each saccade. Both indexes had a clear minimum around the value β = 2/3, demonstrating that even pursuit eye movements are most effective when the target follows a biological movement. The similarity between hand and eye tracking behavior is further emphasized by comparing the instantaneous values of the RPE around the trajectory (Fig. 21.6) with the analogous plots of the instantaneous hand–target delay (cf. Fig. 21.5). Despite the large difference between hand and eye control systems, both are constrained to generate movements that comply with the power law, and both resist in a similar manner all attempts to make them move in any other way.
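The two accuracy indexes lend themselves to direct computation from sampled data. The sketch below assumes gaze and target positions recorded as (n, 2) arrays and externally detected saccade boundaries; the segmentation and data layout are illustrative assumptions, not the published analysis code.

    import numpy as np

    def rpe_indexes(gaze, target, saccade_onsets, saccade_offsets):
        """Two tracking-accuracy indexes based on the retinal position error (RPE),
        i.e. the gaze-target distance. gaze, target: (n, 2) arrays of positions;
        saccade_onsets/offsets: sample indices delimiting each catch-up saccade."""
        rpe = np.linalg.norm(gaze - target, axis=1)
        # Index 1: growth of the RPE during each smooth-pursuit phase
        # (from the end of one saccade to the start of the next).
        drift = [rpe[on] - rpe[off]
                 for off, on in zip(saccade_offsets[:-1], saccade_onsets[1:])]
        # Index 2: change in RPE achieved by each catch-up saccade
        # (positive when the saccade reduces the error).
        correction = [rpe[on] - rpe[off]
                      for on, off in zip(saccade_onsets, saccade_offsets)]
        return np.mean(drift), np.mean(correction)

Both quantities would be expected to reach their minimum near β = 2/3, as reported above.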
Fig. 21.5 Non-biological movements cannot be tracked by the hand. Results of a visuo-manual pursuit-tracking experiment (one typical subject). Subjects had to track with a stylus a light point target tracing elliptic trajectories (eccentricity: 0.9). N: Biological condition; the instantaneous velocity of the target complies with the Two-thirds Power Law (i.e. the target mimics a voluntary hand movement tracing the same horizontal ellipse). T: Non-biological condition; the target velocity is that of a voluntary hand movement following a vertical ellipse with eccentricity 0.9. Average, maximum, and minimum velocities are identical in the two conditions, but the orientation of the extrema is rotated by 90° (inset polar diagrams). Lower panels: Average pursuit trace (heavy lines) superimposed on target trajectory (thin lines). Upper panels: polar plots of the instantaneous delay between target and pursuit using the target trajectory as zero reference. By convention, values inside (outside) the trajectory indicate that the pursuit leads (lags) the target. In the non-biological condition, tracking is inaccurate in the space domain, and totally deviant in the time domain. (Reproduced from Viviani and Mounoud 1990.)
Fig. 21.6 Non-biological movements cannot be tracked by the eye. Results of an experiment in which subjects had to pursue-track a light point which was tracing elliptic trajectories. The main experimental variable was the velocity profile of the target, which was controlled by setting the value of the exponent (β) in the power law (eqn (1) in the text). Only one profile corresponded to a biological movement (β = 2/3), all others representing graded departures from this model. The angular distance between target and gaze (Retinal Position Error, RPE) affords a measure of the effectiveness of the tracking. The four polar plots describe the RPE (average over all trials and all subjects; data points pooled over the indicated 16 sectors) for four selected values of β and one trajectory (eccentricity: 0.968). The RPE is small and approximately constant in the biological case. It becomes larger and more variable as the stimulus departs more markedly from the biological model.

To conclude this section on tracking, let us consider again the case of kinesthetic stimuli. The computer-controlled robotic arm used to test the perception of the aspect ratio was also used to test the ability to reproduce with the left hand the passive displacements imposed on the right hand. In this experiment the trajectories imposed by the robot were always ellipses elongated either horizontally or vertically. The task was to rotate the left hand voluntarily, in synchrony with the right one, following the same perceived trajectory (cf. Fig. 21.3). The rotation had to be performed both symmetrically (i.e. by engaging homologous muscles of the two arms) and anti-symmetrically (i.e. by pairing the agonist muscles of one arm with the antagonist muscles of the other). Because neither arm was visible during the task, the imitation was based only on the kinesthetic inputs coming from the displaced limb. There were two conditions that corresponded to those in the Viviani and Mounoud (1990) visuo-manual tracking experiment described earlier. In the first condition the velocity distribution was congruous with the eccentricity and the direction of elongation of the trajectory according to eqn (1). In the second condition, in contrast, there was maximum discrepancy, because the movement decelerated at points of low curvature and accelerated at points of high curvature. Finally, there was a third condition in which velocity was constant.

Only in the congruous case did the left hand faithfully reproduce the movement of the right hand. In the second condition the trajectory was distorted and much more variable from cycle to cycle
(Fig. 21.7). The results for the constant velocity condition were somewhat intermediate between the other two. The inability of the left hand to reproduce the motion of the right one was particularly obvious in the time domain. Instead of being small and constant, the delay between hands varied systematically within a cycle in a manner that strikingly resembled both the results of the Viviani and Mounoud study (cf. Fig. 21.5) and the RPE plots from the ocular pursuit experiments (cf. Fig. 21.6). In all three cases, the variations in the delay simply reflected the fact that the tracking movement remained close to the biological model instead of reproducing the imposed template.
Fig. 21.7 Non-biological movements cannot be mimicked. The left hand is trying to reproduce the vertical elliptic movement imposed by the robotic arm (cf. Fig. 21.3) on the right arm (eccentricity: 0.9). Column A: Biological condition; the velocity of the imposed movement reproduces that of a voluntary active movement following the same vertical elliptic trajectory. Column B: First non-biological condition; the imposed movement has constant velocity. Column C: Second non-biological condition; the velocity of the imposed movement is that of a voluntary movement tracing a horizontal ellipse. Row D: Polar plots of the velocity (heavy lines) using the trajectory (G) as zero reference. Also shown is the trajectory (K) that would make the velocity distribution compatible with the Two-thirds Power Law. Row E: Continuous polar plots of the instantaneous delay between left and right hands (heavy lines) for one representative trial. The right-hand trajectory is used as a reference for measuring delays: data points outside (inside) this reference indicate that the left hand lags (leads) the right one. Inside the reference line are also shown the ten movement cycles of the left hand. Row F: Average and standard deviation of the delay computed over all trials in all participants, pooling the data points for the indicated eight sectors. Note the similarity with the analogous data from the visuo-manual pursuit-tracking experiment (Fig. 21.5) and with the distribution of the RPE in the eye tracking experiment (Fig. 21.6). (Reproduced from Viviani et al. 1997.)
In summary, all four studies demonstrated that neither visual nor kinesthetic inputs provide appropriate guidance for the movement unless they comply with the prescriptions of the power law. Although the required responses were well within the acceptable dynamic range of the hand and of the eye, the pursuit of non-biological targets was somewhat disorganized. Moreover, whatever regularity remained in the motor responses was the expression of a tendency to remain close to the biological model normally followed by unguided, spontaneous movements. It must be stressed that, in all but one study, stimuli were periodic and highly predictable. Yet there was no evidence, even after many cycles, that the sensorimotor loop would learn how to take advantage of such regularity.
21.2.5 A helping hand

So far, we have seen that all sorts of strange things happen when innate motor schemata are applied to dynamic stimuli that violate certain biological constraints. To conclude this section, I will emphasize instead the good things that may happen when the biological constraints are satisfied. The popularity of mental imagery, already a subject of interest by the end of the nineteenth century, has been increasing ever since the techniques of brain imaging opened a window into the underlying mechanisms. Motor images, both when they refer to the imager being engaged in an action (grasping, pointing, running, etc.), and when they refer to the representation of an external agent, presuppose a voluntary (albeit imaginary) effort. It is debated whether this fact alone sets motor images apart from other types of imagery (e.g. mental rotations of geometrical figures), or whether, as argued by Annett (1995), the distinction has no real ground. Be that as it may, it is now firmly established that motor images elicit activity in many cortical areas normally engaged in the planning of actual movements (cf. Crammond 1997; Jeannerod and Decety 1995). Although the activation of the primary motor area is still controversial (Kawamichi, Kikuki, Endo, Takeda, and Yoshizawa 1998), motor images follow the general rule that applies also to visual, auditory, and tactile images, namely that many cortical areas that are responsive to actual sensorimotor events are equally responsive to events evoked from within. Moreover, as is the case for visual images (Shepard and Cooper 1982), the temporal and cinematic properties of mental images mimic those of the real, represented events (Parsons 1994; Sirigu et al. 1995, 1996).

In monkeys, neurons in the superior temporal polysensory area respond selectively to visual stimuli representing biological motion (Oram and Perrett 1994). More importantly, certain cells (the so-called 'mirror' neurons) of section F5 of area 6 that are normally activated during the performance of grasping movements are also activated when the animal observes someone else performing the same gesture (di Pellegrino, Fadiga, Fogassi, Gallese, and Rizzolatti 1992; Gallese, Fadiga, Fogassi, and Rizzolatti 1996). This important discovery has been generalized to humans by showing that the inferior frontal gyrus is active when one simply observes a grasping gesture. Moreover, a PET study (Decety et al. 1994) has shown that the same cortical area is also active when grasping is only imagined. Interestingly, the meaningfulness of the observed gesture appears to be crucial for the involvement of the frontal area (Decety et al. 1997). Also, the fact that section F5 in monkeys partially overlaps with Broca's area in humans suggests the fascinating hypothesis that mirror neurons may have played a role in the genesis of language (Rizzolatti, Fadiga, Gallese, and Fogassi 1996).

The observations summarized above point to a close connection among the mechanisms responsible for interpreting a perceived action, imagining performing that action, and preparing for its performance. They also invite inferences about the functional significance of such a connection. As for the relation
between perception and action, Rizzolatti et al. (1996)—echoing Liberman—have suggested that 'when an external stimulus evokes a neural activity similar to that which, when internally generated, represents a certain action, the meaning of the observed action is recognized because of the similarity between the two representations, the one internally generated and that evoked by the stimulus' (p. 137). In the same vein, Parsons and Fox (1998) noted that 'The implicit knowledge the brain possesses about movements it can actually generate may also influence our interpretation of observed actions' (p. 599). As regards motor imagery, instead, the hypothesis that motor templates (covertly) activated by the imager (presumably similar to those that would be engaged in a real action) are also instrumental for perceiving the action that they are supposed to realize was formulated by Annett (1969), even before supporting physiological evidence became available. The fact that the configural aspects of handwritten letters can be described more accurately while imagining the letter being drawn than while looking at the actual patterns (Zimmer 1982) supports the even stronger hypothesis that motor imagery represents the medium in which recognition or identification of dynamic events takes place. In a more subdued vein, Decety et al. (1994) suggested that mentally evoked acts 'involve rehearsal of neural pathways related to cognitive stages of motor control' (p. 600).

Recent experiments have lent support to these general views. People are generally good at identifying objects seen from different viewpoints, the subjective report being that identification often occurs after some mental manipulation of the image. Discriminating images of the left and right parts of the body (hands, feet) is a somewhat special case because, rather than manipulating the image, observers report that they compare the stimulus to a mental representation of their own body, moving the hand in imagination until it matches the stimulus (Parsons 1987). These subjective reports were confirmed by an experiment (Parsons 1987) in which subjects were shown pictures of either the left or the right hand in different orientations. In one condition, the task was simply to reproduce with a real movement of the appropriate hand the posture being shown. In a second condition no overt movement was required, the task being simply to tell whether the stimulus was a right or a left hand. In both conditions response times varied as a function of the orientation of the hand. More importantly, the two response times were highly correlated, suggesting that left/right judgements were indeed reached by comparing an initial perceptual cue with a simulated movement of the corresponding hand. More recently (Parsons and Fox 1998), a PET study in which hand images were presented in either hemifield has confirmed that left/right judgements elicit limb-specific activation of the hand contralateral to the side of the presentation.

The perceptual influence of covert action and the role of handedness were investigated by de'Sperati and Stucchi (1997). Right- and left-handed subjects were shown computer animations of a screwdriver rotating along its main axis, and had to decide whether the tool was screwing or unscrewing (Fig. 21.8). The orientation of the screwdriver and the sense of rotation varied from trial to trial. For some orientations the posture for gripping the tool was quite natural for the right hand, and awkward for the left.
The converse was true for other orientations. In one condition subjects were told to rely only on visual cues. In a second condition they were encouraged to imagine their dominant or non-dominant hand grasping the screwdriver. As expected on the basis of previous studies on mental rotation (cf. Shepard and Cooper 1982), response times depended on stimulus orientation (Fig. 21.9). The new finding was that response times were longer when subjects were instructed to imagine using the non-dominant hand for a grip that would have been more natural for the dominant one. Again, the results suggest that eliciting a mental image of the dominant hand was the natural strategy for responding, whether or not this was suggested. When instructions explicitly conflicted with this tendency, the response took as much as 0.7 s longer than when no instruction was given.
Fig. 21.8 Static representation of the stimuli used for demonstrating the role of motor imagery. Actual stimuli were presented in a random order at the center of the screen and were rotating (180 deg/sec) around their main axis. Subjects had to indicate as soon as possible whether the rotation was clockwise (screwing) or counterclockwise (unscrewing). In the observation condition, subjects were not encouraged to use motor imagery. In the imagery conditions, subjects were encouraged to imagine grasping the screwdriver with either their dominant or non-dominant hand before answering. Both right- and left-handers were tested (reproduced from de’Sperati and Stucchi 1997).
21.2.6 Perceptual anticipation

Reading a handwritten text is a most impressive perceptual feat. On the one hand, letter templates are recovered quite reliably in spite of the extravagant idiosyncratic variations to which they are subjected. On the other hand, the recovery process does not discount the variations, for otherwise we would not be able, for instance, to recognize a familiar handwriting. Because theories of handwriting recognition working their way up from abstracted geometrical features to templates (cf. Gibson and Levin 1975) proved inadequate, attempts have been made to incorporate the contribution of motor knowledge. Taking inspiration from what graphologists have always claimed, Zimmer (1982) conjectured that recognition uses information about the way the letter was written, and that such information is derived from tacit knowledge of the writing method. A recognition experiment (Freyd 1983) involving an artificial character set supported the conjecture. Subjects who had learned one of two methods of tracing the characters in the template form performed well when asked to identify characters that had been distorted in a way that was congruous with the method they had learned. Performance dropped when the distortion was instead consonant with the other method. A subsequent experiment (Babcock and Freyd 1988) tested the sensitivity to variations in the handwritten trace. It was found that, when asked to reproduce artificial letters from memory, subjects unconsciously adopted the stroke direction used to trace the template set that they had memorized. Therefore, motor competence makes it possible to extract from a memory trace information relating to production.
Fig. 21.9 Response times (averaged over all trials and all subjects) as a function of the experimental factors (OBS: Observation condition; RHI: Right-hand imagery condition; LHI: Left-hand imagery condition). Screwdriver orientation has a major effect on all response times. Response times in the observation condition were not statistically different from those measured when the subjects imagined using the dominant hand. Instead, latencies were significantly longer when subjects had to imagine using the non-dominant hand (reproduced from de'Sperati and Stucchi 1997).
A recent study, to be described below, demonstrated that motor competence is also instrumental in exploiting anticipatory effects. Planning and execution of complex sequences of movements involve a significant amount of look-ahead, revealed by the fact that units of motor action being executed often carry the imprint of yet-to-be-executed units. Anticipatory adjustments are present in many language-related movements such as speech (Benguérel and Cowan 1974), Morse code (Bryan and Harter 1897), typing (Viviani and Laissard 1996), and handwriting (Thomassen and Schomaker 1986). In handwriting, anticipatory adjustments can be used to predict the letter that is about to be traced (Orliaguet, Kandel, and Boë 1997). Kandel, Orliaguet, and Viviani (2000) investigated the basis of this predictive ability. Specifically, we tested the hypothesis that reliable predictions can be made only when the stimuli comply with the Two-thirds Power Law.

We recorded a set of 100 instances each of the trigrams LLL and LLN handwritten by one individual (Fig. 21.10(a), (b)). Two template traces of the middle L were generated from each set by selecting and averaging the 10 traces with the least within-set temporal variance, and the least between-set geometric variance (Fig. 21.10(c), (d)). The shapes of the templates were almost indistinguishable, and their total duration was normalized to 1 sec. However, the velocity profiles of the middle L embedded in the two trigrams were different (Fig. 21.10(e), (f)). In one variant of the experiment, subjects were shown the templates traced on the computer screen by a light point. They were informed that the trace was excerpted from a continuous writing movement, and that the following letter (not presented) could be either another L or an N. The task was to guess this letter, but subjects had the option of not answering if they had no clue. Following the technique introduced by Viviani and Stucchi (1992), the invariable trajectory of each template was traced on the screen with seven different velocities computed from eqn (1) by setting the exponent β to the values 1/6, 2/6, 3/6, 4/6, 5/6, 6/6, and 7/6. At the middle value (β = 4/6) the velocity covaried with the curvature of the trace as it does in biological movements. For all other values of β, stimuli departed in a controlled fashion from the biological model.
Fig. 21.10 Stimuli used to demonstrate that only in biological movements do coarticulatory cues provide the basis for perceptual anticipation. (a), (b): Typical instances of the complete traces of the two trigrams used in the experiment. The traces are recordings of actual writing movements. Only the portion between the two dots was actually shown. The stimulus was a light point tracing the middle letter on the computer screen. Immediately after the disappearance of the trace, subjects had to guess the third letter. The experimental variable was the velocity profile of the light point. Using the exponent of the power law (eqn (1) in the text) as a parameter, we tested seven velocity profiles, only one of which mimicked the biological writing movement. (c), (d): Average trajectories (templates) of the middle letter computed over 10 recordings of the trigrams in (a) and (b), respectively. Bands around the averages indicate the geometrical variability. (e), (f): Average and standard deviation of the tangential velocity for the two templates in (c) and (d), respectively. Durations have been normalized to 1 s and velocities have been scaled accordingly. Also shown (lower traces) is the instantaneous value of the radius of curvature of the templates (reproduced from Kandel et al. 2000).

The results were unequivocal (Fig. 21.11). When the velocity mimicked that of the original handwriting movement, the following letter was guessed with a much higher probability than chance. Moreover, the rate of 'No answer' responses remained very low even though subjects were a priori quite skeptical about their guessing ability. The performance degraded with increasing distance from the biological model.
Fig. 21.11 Response probabilities (average over all subjects) as a function of the exponent of the power law. The middle value (4/6) corresponds to the Two-thirds Power Law approximation to a biological movement. Note that the rate of wrong answers increases well beyond chance level when the stimulus departs drastically from the biological model (reproduced from Kandel et al. 2000).
At the four extreme values of β there was even a paradoxical inversion of the predictions. These findings are in keeping with the view that discriminal information is evoked by the stimuli through the interaction with implicit motor knowledge. The only objective basis for predicting the following letter was the internal timing of the traces, which remained invariant across β values. One can then make the hypothesis that discriminal information gained from this one cue was used to trigger an internal simulation of the complete gesture (i.e. including the next letter), and that the response was selected on the basis of the outcome of the simulation. If so, the high error rate at extreme β values may not be such a paradox. Perhaps the invariant timing of the stimulus suggests an initial (correct) guess. However, when the velocity–curvature covariation is grossly at variance with the simulation (which, by definition, follows the Two-thirds Power Law), the observer is induced to reject the initial guess and opt for the (wrong) alternative.
21.3 Speculations

I have attempted to marshal empirical evidence for the claim that the perception of dynamic events involves, inter alia, framing the sensory data within a set of relational constraints derived from our motor competence. I do not expect the case to be watertight, if only because the sample of studies reviewed here is neither exhaustive nor random. Moreover, certain issues still need to be addressed in order for the claim to make contact with current theorizing on sensorimotor interactions. In this closing section, I will try to state these issues as concisely as possible by asking two questions. First, where do constraints come from? Second (and, relatedly), what is their format?
21.3.1 Where do constraints come from?

In phrasing the point of view defended here in terms of constraints acting on internal representations, I have adopted a clear cognitive stance, departing both from the Helmholtzian doctrine that percepts are constructed by simply combining sensory cues, and from the Gibsonian doctrine that the only constraints that we need to take into account are the invariances that nature imposes from without on the sensory flow. Actually, I have used the term 'model' somewhat interchangeably with the term 'constraints', to emphasize the basic tenet of cognitive psychology that the transition between sense data and percepts is realized when the former are framed within an internally available script.

Internal constraints (or models, or scripts) may be innate or emerge from the internalization of significant, recurrent regularities present in the world. Linguistic constraints (Universal Grammar) that map multidimensional conceptual representations onto one-dimensional sequences of expressive gestures are a prime example of the former possibility; whatever their remote origin, they are now well entrenched in the genetic code. Although the nature of linguistic constraints is still controversial, it remains that if a child does not talk by age 4, something is awry in his brain—whereas executing a novel musical score in real time is a skill based on the harmonic and melodic rules that some individuals have interiorized through practice: we are all born to talk, but no one is a born pianist. Where does the perception of dynamic events stand between these two extremes?

One influential opinion was expressed by Shepard (1984). Shepard argued that the process of internalization is not an all-or-none affair, and that the difference between temporary attitudes and dispositions, at one end of the continuum, and genetic determinants, at the opposite end, is a matter of degree, not of substance (actually, a similar view has been defended even in the case of language; see Lieberman 2000). As regards the perception of dynamic events, Shepard acknowledges the reality of short-term phenomena, such as priming and cueing (e.g. the path-guided apparent motion; Shepard and Zare 1983). The emphasis, however, is on entrenched constraints. In his words: 'There are good reasons why the automatic operations of the perceptual system should be guided more by general principles [...] than by specific principles governing the different probable behaviors of particular objects' (p. 426), and 'What is perceived is determined [...] by much more general abstract invariants that have instead been picked up genetically over an enormous history of evolutionary internalization' (p. 431).

Favoring innate factors over learning sounds very reasonable and uncontroversial. Because it is so important for survival, coding in the genes the principles that make dynamic perception effective is a far better solution than acquiring these principles from scratch during each individual life. Two other components of Shepard's credo are more substantive and more open to debate. First, he states that, no matter how far back in the past, the ultimate source of inspiration for the principles he is advocating has to be traced to the regularities present in the external world. Even '[syntactic rules] may have been built upon already highly evolved rules of spatial representation and transformation, [and may be] to some extent traceable to abstract properties of the external world' (note 6, p. 431).
Then, he further qualifies his view by stating that, as far as perception of motion is concerned, the single most important source of inspiration is cinematic geometry, which ‘governs motions of rigid objects, or of local parts of rigid objects, during brief moments of time’ (p. 422). Both statements deserve comment. As for the first one, it is indeed likely that some of the most general principles (e.g. the impossibility of two solid objects occupying the same position at the same time, or the fact that the duration of a displacement cannot be zero) reflect mandatory world-constraints. However, because at any stage of its development the brain has also been part of the world, there seems to be no reason
to rule out the hypothesis that some specific principles reflect constraints pertaining to the functioning of the brain itself. Actually, some amount of constraining from within may turn out to be particularly useful whenever perception requires a preliminary parsing of the sensory messages. Language perception provides a prime example of such a situation, and it is no accident that one of the most articulated motor theories of perception is the one proposed by Liberman to account for the effectiveness with which phonemes are extracted from the acoustic flow (Liberman, Cooper, Shankweiler, and Studdert-Kennedy 1967; Liberman and Mattingly 1985). Liberman argued that the objects of speech perception are the intended phonetic gestures of the speaker, represented in the listener’s brain as the invariant motor commands that he himself would have issued to produce the same sounds. If this premise is valid (recall the problematic presence of categorical sound perception in animals), the principles at work must have become ingrained in the perceptual system pari passu with, and in strict association with, the development of the specific mechanisms for speech production, quite independently of any regularity present in nature. By analogy, the same internal coupling between production and perception might have evolved in other perceptual domains, such as the facial expression of emotions (cf. Ekman and Davidson 1994), expressive hand gestures, and, more generally, in all those cases where the best possible normative model for interpreting incoming sensory information is provided by the very same production rules that we would apply to broadcast that information. After all, non-living objects do not laugh, conduct an orchestra, or make threatening gestures. How could they compete with us as a model for all these activities? Moreover, given that our motor competence provides a direct, reliable basis for such a model, is it really necessary to invoke the genetic interiorization of unspecific principles, as Shepard did? The hypothesis that cinematic geometry is a source of inspiration for our perceptual models is clearly supported by robust phenomena (cf. Shepard and Cooper 1982). However, inasmuch as it countenances only cinematic quantities, this hypothesis makes no provision for distinguishing among types of movement (see above). Specifically, it does not acknowledge the distinction between biological and non-biological movements that I have stressed throughout this chapter. In fact, such a distinction is mentioned only cursorily in Shepard’s 1984 paper. To be sure, some simple biological movements (e.g. the swing of the legs during walking) can be described within the framework of kinematics. This is not the case, however, for the vast majority of expressive limb movements (the very same movements for which I have advocated a direct production–perception coupling), inasmuch as they carry the imprint of the logic of the underlying control system. As a simple illustration of this point, consider the well-known positive correlation between the distance to be covered and the average velocity of pointing movements (Fitts’ law). This correlation is the overt manifestation of an intrinsic rule of the motor system and is certainly a perceptually salient cue in the identification of biological gestures. Yet moving objects do not display such a correlation unless the driving force were planned accordingly—an implausible circumstance in the early days of evolution.
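To make the point concrete, the regularity can be sketched numerically. The following minimal Python sketch assumes the standard Fitts’-law form MT = a + b log2(2D/W); the constants a and b and the target width W are hypothetical values chosen purely for illustration. Because movement time grows only logarithmically with distance, average velocity rises steeply with distance, which is the correlation at issue.

```python
import math

# Fitts' law sketch: MT = a + b * log2(2D / W). Constants are assumed,
# illustrative values, not empirical estimates.
a, b = 0.05, 0.15        # intercept (s) and slope (s/bit), hypothetical
W = 0.02                 # target width in meters, hypothetical

for D in (0.05, 0.10, 0.20, 0.40):
    mt = a + b * math.log2(2 * D / W)
    print(f"D = {D:.2f} m -> MT = {mt:.3f} s, mean velocity = {D / mt:.3f} m/s")
```

Running the sketch shows mean velocity rising several-fold as distance increases: the signature of a motor rule that, as argued above, inanimate motion has no reason to exhibit.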
In short, while I agree that cinematic geometry might have contributed to shaping perceptual models, I would rather assign a preeminent role to biological dynamics. Reversing the balance suggested by Shepard, I am suggesting that humans tend to adopt biological motion as the default model for dynamic perception, even when this may not be appropriate.
21.3.2 What is the format in which constraints are represented?

A discussion of the neuronal bases of the production–perception coupling is beyond the scope of this chapter. Instead, I will adopt a functionalist point of view by asking just what we might mean by
saying that motor competence provides a framework for dynamic perception. Before attempting an answer, however, I have to hark back to a more general question. Traditionally, the fact that perception and action are subserved by distinct anatomical structures has motivated the inference that perceptual contents and action goals are represented separately, in different formats (indeed, the very fact of countenancing a production–perception coupling presupposes a degree of functional autonomy between the underlying representations). This seemingly uncontroversial assumption has been challenged by the Theory of Event Coding (TEC), which holds that perceived and to-be-executed events are coded within a common representational medium (cf. Hommel, Müsseler, Aschersleben, and Prinz 2001). The theory’s key concept is the event code, which is construed – both for percepts and motor gestures – as a collection of features describing identifiable properties of the event. By necessity, the reference frame adopted by the theory is uncompromisingly allocentric: the features coded internally refer to the distal events, not to the proximal stimulation. Moreover, features are amodal: in their perceptual role, they integrate the contribution of all available sensory channels; in their motor role, they modulate the activities of various components of the motor system. The theory provides a natural framework for interpreting both priming and interference effects. Suppose that two events share a feature. Perceiving or planning one event will first activate all its features, including the feature shared with the other event. Thus, in this preliminary phase, priming prevails. In contrast, interference sets in at a subsequent stage, when the features of the selected event are bound together into a coherent unit, and the shared feature is no longer available to the non-selected event. Inasmuch as TEC posits that the distinction between percepts and actions is not the reflection of a type distinction between their internal representations, but rather the reflection of the different roles that event codes are called to play in any given circumstance, it would appear ideally suited to explaining why motor competence biases perception. Upon reflection, however, things may turn out to be less straightforward. To exemplify, let us consider again Shiffrar and Freyd’s experiment (cf. Fig. 21.1) from the point of view of TEC. Presumably, the sequential presentation of the body images activates two distinct events, corresponding to a small and a large rotation of the forearm, respectively. Which one wins over the other and reaches consciousness is determined by the relative strength of competing biasing factors, namely the timing of the presentation and the degree of biomechanical compatibility. The fact that both have a chance, however, poses a problem for TEC, because no event corresponding to a small rotation should exist: after all, why should one want to generate a code that, qua movement, would break the elbow? The more general point that I want to emphasize with this example is that positing a type identity between percepts and gestures does not do justice to the fact that perception has more degrees of freedom than movement. It is indeed possible that the internal representation of perceived gestures is intimately related to, or even coincides with, the representation of the gestures themselves. It is also possible, as we have seen, that gestures provide a privileged model for interpreting sensory data.
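The priming-then-interference logic just described can be rendered as a toy simulation. The sketch below is my own illustration, not part of TEC’s formal machinery: events are bare feature sets, and ‘activation’ is simply a count of usable features.

```python
# Toy sketch of TEC's two-phase account. Feature names are hypothetical.
EVENT_A = {"left", "red"}     # the perceived/planned event
EVENT_B = {"left", "green"}   # a second event sharing the feature "left"

def usable_activation(event, active, bound):
    """Features of `event` that are active but not yet bound elsewhere."""
    return len((event & active) - bound)

# Phase 1: activating EVENT_A activates all of its features, including
# the shared one, so EVENT_B receives some activation (priming).
active = set(EVENT_A)
print(usable_activation(EVENT_B, active, bound=set()))         # -> 1

# Phase 2: EVENT_A's features are bound into a coherent unit; the shared
# feature is no longer available to EVENT_B (interference).
print(usable_activation(EVENT_B, active, bound=set(EVENT_A)))  # -> 0
```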
It is questionable, however, whether TEC’s basic assumption holds true for those percepts (and there are many) for which the motor model is hopeless, because there is no action that can be sensibly related to the percept. Within the spirit of TEC, moreover, the notion of coupling adopted here becomes almost redundant. If instead my criticism is correct, it still makes sense to ask what form the influence of motor competence on perceptual representations takes. Ironically, a source of inspiration comes from the notion of resonance introduced by the very same Gibson who did not think much of internal representations anyway. The first attempt to spell out a theory of perception based on resonance is again due to Shepard (1984), who used actual resonating systems as a metaphor for identifying the features that
recommend this concept as a basis for perception. The major features of linear resonators (the only type considered by Shepard) are:

(1) Productivity. Resonators are designed to respond maximally to just one category of stimuli; however, they also exhibit the spontaneous tendency to respond in a sustained, characteristic manner to unstructured impulses from within.

(2) Tunability. Many resonating systems can be tuned to different frequencies by acting on just one parameter (e.g. the length of the diapason’s rod, or the value of the capacitance in an oscillating electrical circuit). Tunability is important insofar as it entails controllability.

(3) Robustness. A resonator tuned to one frequency also responds (although to a lesser degree) to stimuli of different frequencies, as well as to incomplete or corrupted stimuli. This is a desirable feature for a system that has to deal reliably with the vagaries of the environment.

(4) Sympathy. In a collection of resonators, only one of which is activated (from without or from within; see point 1 above), the others tend to join in, responding indirectly to the activity of the precursor. This spontaneous spreading of activation provides a hypothetical but congenial basis for such well-known phenomena as perceptual completion.

Ever since Shepard’s initial proposal, the scope of the resonator metaphor has broadened considerably, leading to the so-called dynamical systems theory. Mostly under the impulsion of S. Kelso, M. Turvey, E. Thelen, and their collaborators, a full-fledged theory of motor coordination has been developed (cf. Kelso 1995; Kugler and Turvey 1987; Thelen and Smith 1994) by cross-breeding and generalizing certain seminal ideas that had been around for quite a while. One idea, first advertised by Bernstein (1967), is that the variables involved in controlling movements are organized in groups (synergies) by a network of mutual constraints. Synergies within a dynamic system can be characterized by collective variables in a control space with much lower dimensionality than that of the system itself. Another idea is that, to a first approximation, certain movements resemble the oscillations of resonators. In order for these intuitions to become fruitful, the original resonator concept had to be generalized along two directions. Firstly, it was expedient to consider non-linear resonators (or oscillators, as they are now more commonly called), that is, systems in which displacement- and velocity-dependent energy terms are no longer related by linear equations. Secondly, the notion of sympathetic resonance (point 4 above) had to be extended to include explicit coupling among a set of oscillators, thus providing a basis for the establishment of synergies. A characteristic property of this generalized class of dynamical systems is that they possess limit cycles, that is, trajectories in the phase space towards which the behavior of the system converges from any initial state within a certain set (the basin of attraction). A system may possess more than one limit cycle, each with its own basin of attraction, allowing the (asymptotic) stable state to depend on the initial conditions. Moreover, dynamical systems have the property that a small change in their parameters may result in a sudden (catastrophic) change in the qualitative type of their attractors (bifurcations), thus providing a way to make continuity and discreteness coexist within a unified framework.
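A minimal numerical illustration of a limit cycle may help here; the example (a van der Pol oscillator, a textbook non-linear oscillator) and its parameters are my own choices, not drawn from the authors cited above. Trajectories launched from very different initial states settle onto the same closed orbit, whose amplitude is therefore a property of the system rather than of the input.

```python
# Van der Pol oscillator: x'' - mu * (1 - x^2) * x' + x = 0, integrated
# with simple Euler steps. Two very different starting points converge
# to (approximately) the same limit-cycle amplitude.

def post_transient_amplitude(x, v, mu=1.0, dt=0.001, steps=60_000):
    peak = 0.0
    for step in range(steps):
        acc = mu * (1.0 - x * x) * v - x   # non-linear damping + restoring force
        x, v = x + v * dt, v + acc * dt
        if step > steps // 2:              # measure after transients die out
            peak = max(peak, abs(x))
    return peak

for x0 in (0.1, 4.0):
    print(f"x0 = {x0}: amplitude ~ {post_transient_amplitude(x0, 0.0):.2f}")
```

Both runs settle near the same amplitude (about 2 for this parameter choice), which is the behavioral signature of a single basin of attraction.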
Finally, the single most significant feature of the dynamical approach is that the way the process being described unfolds in real time is taken into account at the same time as the sequence of states through which it achieves the desired goal. Unlike most computational models, which specify only the nature and sequence of the intervening processing stages, dynamical models also emphasize the time course of the process, which becomes one of their testable predictions. Although the dynamical systems approach was developed initially to account for motor behavior, a growing number of cognitive scientists are now advertising the possibility—indeed the desirability—of applying the approach to many other domains of cognitive science, including perception and development, for which, until recently, the computational paradigm has been the reference guide (cf. Port and van Gelder 1995). In particular, multiple limit cycles seem ideally suited to modeling such
well-known phenomena as bistable and categorical perception. To conclude, I wish to suggest a further extension, by arguing that the dynamical systems approach may also be conducive to addressing the problem of representing the production–perception coupling.

[Fig. 21.12 shows sensory inputs feeding a set of perceptual resonators and motor commands feeding a set of motor resonators, with a coupling between the two sets (mirror neurons??).]

Fig. 21.12 A functional scheme for describing motor–perceptual interactions; full description in the text.

The scheme I am entertaining here (Fig. 21.12) features two sets of hierarchically organized oscillators, subserving the generation of percepts and the organization of actions, respectively. Each set comes with a full complement of within-set couplings (some genetically specified, some acquired through learning) responsible for integrating and orchestrating the activation of the individual components in the respective domains. The scheme also contemplates between-set couplings which, again, may be inborn or acquired. Consonant with what I said before, the perceptual set is granted a richer repertoire of resonant modes (or limit cycles) than the motor set. Which perceptual mode prevails at any one time depends largely on the sensory inflow. However, the selection of the winner is also influenced by the couplings with the motor set. Through these couplings, resonant modes are induced in the motor oscillators even in the absence of any direct activation originating from the will to move. In fact (and here I am taking direct inspiration from TEC), there is no reason to posit a type difference between motor and perceptual oscillators. Oscillations (as well as between- and within-set couplings) may well originate from the same physiological mechanisms, as long as the respective roles are assigned by their specific source of activation. The important point is that, whether activation spreads from the perceptual to the motor set or the other way around, the result is an integrated, superordinate limit cycle. Thus, some familiar perceptual experiences would be unavailable to an individual in whom the motor set of oscillators was not functional. Conversely (but, perhaps, to a lesser extent), suppressing perceptual resonances should alter the normal motor repertoire, not only because reafferences would be missing, but also because some global resonant modes would no longer be available. To give a feeling of how this scheme might work, let us see how it accounts for some of the observations summarized in the preceding section. Consider first the peculiar motor–perceptual interaction described by Stricker (see Section 21.1). Imagining the moving clouds involves triggering from within (i.e. from visual memory) a complex pattern of resonant modes in the perceptual set. By itself, the trigger would not be sufficient to sustain the pattern. A stable percept emerges because the initial activation spreads to the motor set through a coupling that, presumably, has been established by the fact that pursuit eye movements are normally associated with that perceptual pattern.
If the motor set is silenced (damped) by an act of will, its necessary contribution to the establishment of the global pattern is no longer there, and the clouds stop. As a second example, let us again take up Shiffrar and Freyd’s 1990 experiment. Here the trigger comes from without, in the form of a sequence of images, and, as argued before, excites two distinct perceptual modes. Again, activation spreads to the motor set. In this case, however, the contribution of the motor set is highly selective, because no resonance exists there which would correspond to an elbow-breaking torsion. Reinforcement is given to only one of the two competing perceptual modes. The experiment has shown that this contribution is decisive, provided that it overcomes the biasing action of the timing of the sequence. The final example is the misperception of the aspect ratio demonstrated by Viviani and Stucchi (1989). In the control condition of that experiment (condition A), when the light point is tracing a circle at a constant velocity, the perceptual resonant mode is reinforced by the coupling with the motor set, which is itself capable of sustaining a similar mode (recall that we spontaneously draw circles at constant velocity). In condition B, the same circular trajectory is traced with a velocity profile that would be natural for a horizontally elongated ellipse. The two determinants of the motion (shape and velocity) no longer correspond to a limit cycle in the phase space of the motor set. Therefore, the sympathetic resonance of the motor set is not stable. Striving for equilibrium, the complete dynamic system ends up distorting the perceived shape (possibly, the velocity as well) in a direction that reduces the discrepancy. Surely, the conceptual scheme presented above is highly speculative, as the title of the section promised. However, it is not beyond the reach of empirical tests. In particular, I believe that experiments in which subjects are asked to make dynamic perceptual judgments while performing independent motor tasks should be able to address the basic hypothesis behind the scheme. I also believe that the scheme provides one of the simplest solutions to the problem of representing time, a most important ingredient for perceiving dynamic events. It has been argued (Freyd 1987) that represented and external time must be related isomorphically, so that the first shares with the second the properties of continuity and directionality (cf. Palmer 1978). In other words, the representation itself should be construed as a process unfolding with its own time-scale. If one accepts these premises, oscillators stand out as ideal candidates for the role of timekeepers. Moreover, the implicit knowledge that is supposed to affect perception via the coupling between motor and perceptual sets may well concern the internal timing of the perceptual process.
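To suggest how such experiments might connect to the scheme of Fig. 21.12, here is a deliberately minimal sketch, entirely my own construction with arbitrary parameters, of the Stricker example: a ‘perceptual’ unit driven by a weak internal trigger and a ‘motor’ unit coupled to it. With the motor unit free, feedback through the coupling sustains a stronger perceptual activation; damping the motor unit (the ‘act of will’) lets the percept decay toward what the trigger alone can support.

```python
# Two mutually coupled leaky units; all parameters are illustrative.
def perceptual_steady_state(trigger=1.0, leak=1.0, coupling=0.8,
                            motor_damped=False, dt=0.01, steps=5_000):
    p = m = 0.0
    for _ in range(steps):
        dp = -leak * p + coupling * m + trigger
        dm = -leak * m + coupling * p
        if motor_damped:            # an 'act of will' silences the motor set
            dm = -10.0 * m
        p, m = p + dp * dt, m + dm * dt
    return p

print(round(perceptual_steady_state(), 2))                   # motor set free
print(round(perceptual_steady_state(motor_damped=True), 2))  # motor set damped
```

With these values the coupled system settles at a markedly higher perceptual activation than the damped one, loosely mirroring the report that the imagined clouds stop when the motor contribution is withheld.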
Acknowledgement

The preparation of this chapter was partly supported by FNRS Research Grant 31–55620.98.
References

Annett, J. (1969). Feedback and human behaviour. Harmondsworth: Penguin Books.
Annett, J. (1995). Motor imagery: Perception or action? Neuropsychologia, 33, 1395–1417.
Anstis, S. and Ramachandran, V. (1985). Kinetic occlusion by apparent motion. Perception, 14, 145–149.
Babcock, M.K. and Freyd, J.J. (1988). The perception of dynamic information in static handwritten forms. American Journal of Psychology, 101, 111–130.
Bassili, J.N. (1978). Facial motion in the perception of faces and of emotional expressions. Journal of Experimental Psychology: Human Perception and Performance, 4, 373–379.
Beardsworth, T. and Buckner, T. (1981). The ability to recognize oneself from a video recording of one’s movement without one’s body. Bulletin of the Psychonomic Society, 18, 19–22.
Bell, C. (1811). On the motions of the eye, and illustration of the uses of the muscles and nerves of the orbit. Philosophical Transactions of the Royal Society, London, 113, 166–168.
Benguérel, A.P. and Cowan, H.A. (1974). Coarticulation of upper lip protrusion in French. Phonetica, 30, 41–55.
Berkeley, G. (1709). An essay towards a new theory of vision. Dublin. [Reprinted: London: J.M. Dent (1969).]
Bernstein, N. (1967). The coordination and regulation of movements. Oxford, UK: Pergamon Press.
Bertenthal, B.I. and Pinto, J. (1994). Global processing of biological motions. Psychological Science, 5, 221–225.
Bertenthal, B.I., Proffitt, D.R., and Cutting, J. (1984). Infant sensitivity to figural coherence in biomechanical motion. Journal of Experimental Child Psychology, 37, 213–230.
Bertenthal, B.I., Proffitt, D.R., Spetner, N.B., and Thomas, A. (1985). Development of the perception of biomechanical motions. Child Development, 56, 531–543.
Bertenthal, B.I., Proffitt, D.R., and Kramer, S.J. (1987). Perception of biomechanical motion in infants: Implementation of various processing constraints. Journal of Experimental Psychology: Human Perception and Performance, 13, 577–585.
Börjesson, E. and von Hofsten, C. (1972). Spatial determinants of depth perception in two-dot motion patterns. Perception and Psychophysics, 11, 263–268.
Börjesson, E. and von Hofsten, C. (1973). Visual perception of motion in depth: Application of a vector model to three-dot motion patterns. Perception and Psychophysics, 13, 169–179.
Bornstein, M.H. (1987). Perceptual categories in vision and audition. In S. Harnad (Ed.), Categorical perception, pp. 287–300. Cambridge: Cambridge University Press.
Braunstein, M.L. and Andersen, G.J. (1984). Shape and depth perception from parallel projections of three-dimensional motion. Journal of Experimental Psychology: Human Perception and Performance, 10, 749–760.
Bryan, W.L. and Harter, N. (1897). Studies in the physiology and psychology of the telegraphic language. Psychological Review, 4, 27–53.
Caramazza, A. and Shelton, J.R. (1998). Domain-specific knowledge systems in the brain: The animate–inanimate distinction. Journal of Cognitive Neuroscience, 10, 1–34.
Claparède, E. (1902). Expériences sur la vitesse du soulèvement des poids de volumes différents [Experiments on the speed of lifting of weights with different volumes]. Archives de Psychologie, 1, 69–94.
Coren, S. and Girgus, J.S. (1978a). Seeing is deceiving: The psychology of visual illusion. Hillsdale, NJ: Erlbaum.
Coren, S. and Girgus, J.S. (1978b). Visual illusions. In R. Held, H.W. Leibowitz, and H.L. Teuber (Eds.), Handbook of sensory physiology, Vol. 8: Perception, pp. 549–568. New York: Springer-Verlag.
Crammond, D.J. (1997). Motor imagery: Never in your wildest dream. Trends in Neuroscience, 20, 54–57.
Cutting, J.E. (1981). Coding theory adapted to gait perception. Journal of Experimental Psychology: Human Perception and Performance, 7, 71–87.
Cutting, J.E. and Kozlowski, L.T. (1977). Recognizing friends by their walk: Gait perception without familiarity cues. Bulletin of the Psychonomic Society, 9, 353–356.
Cutting, J.E., Proffitt, D.R., and Kozlowski, L.T. (1978). A biomechanical invariant for gait perception. Journal of Experimental Psychology: Human Perception and Performance, 4, 357–372.
Cutting, J.E., Moore, C., and Morrison, R. (1988). Masking the motion of human gait. Perception and Psychophysics, 44, 339–347.
Decety, J., Perani, D., Jeannerod, M., Bettinardi, V., Tadary, B., Woods, R., Mazziotta, J.C., and Fazio, F. (1994). Mapping motor representations with positron emission tomography. Nature, 371, 600–602.
Decety, J., Grèzes, J., Costes, N., Perani, D., Jeannerod, M., Procyk, E., Grassi, F., and Fazio, F. (1997). Brain activity during observation of actions: Influence of action content and subject’s strategy. Brain, 120, 1763–1777.
de’Sperati, C. and Stucchi, N. (1997). Recognizing the motion of a graspable object is guided by handedness. NeuroReport, 8, 2761–2765.
de’Sperati, C. and Viviani, P. (1997). The relationship between curvature and velocity in two-dimensional smooth pursuit eye movements. Journal of Neuroscience, 15, 3932–3945.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., and Rizzolatti, G. (1992). Understanding motor events: A neurophysiological study. Experimental Brain Research, 91, 176–180.
Dittrich, W.H. (1993). Action categories and the perception of biological motion. Perception, 22, 15–22.
Duhamel, J.-R., Colby, C.L., and Goldberg, M.E. (1992). The updating of the representation of visual space in parietal cortex by intended eye movements. Science, 255, 90–92.
Eimas, P.D., Siqueland, E.R., Jusczyk, P., and Vigorito, J. (1971). Speech perception in infants. Science, 171, 303–306.
Ekman, P. and Davidson, R.J. (Eds.) (1994). The nature of emotion: Fundamental questions. Oxford: Oxford University Press.
Farah, M.J., McMullen, P.A., and Meyer, M.M. (1991). Can recognition of living things be selectively impaired? Neuropsychologia, 29, 185–193.
Finke, R.A. and Freyd, J.J. (1985). Transformation of visual memory induced by implied motions of pattern elements. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 780–794.
Flash, T. (1990). The organization of human arm trajectory control. In J. Winters and S. Woo (Eds.), Multiple muscle systems: Biomechanics and movement organization, pp. 282–301. Berlin: Springer-Verlag.
Fodor, J.A. (1980). Fixation of belief and concept acquisition. In M. Piattelli Palmarini (Ed.), Language and learning: The debate between Jean Piaget and Noam Chomsky. London: Routledge and Kegan Paul.
Freyd, J.J. (1983). Representing the dynamics of a static form. Memory and Cognition, 11, 342–346.
Freyd, J.J. (1987). Dynamic mental representations. Psychological Review, 94, 427–438.
Freyd, J.J. and Finke, R.A. (1984). Representational momentum. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 126–132.
Freyd, J.J. and Finke, R.A. (1985). A velocity effect for representational momentum. Bulletin of the Psychonomic Society, 23, 443–446.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593–609.
Gibson, J.J. (1966). The senses considered as perceptual systems. London: George Allen and Unwin.
Gibson, J.J. (1979). The ecological approach to visual perception. Hillsdale, NJ: Erlbaum.
Gibson, E.J. and Levin, H. (1975). The psychology of reading. Cambridge, MA: MIT Press.
Gillam, B. (1971). A depth processing theory of the Poggendorff illusion. Perception and Psychophysics, 10, 211–216.
Grieser, D. and Kuhl, P. (1989). Categorization of speech by infants: Support for speech-sound prototypes. Developmental Psychology, 25, 577–588.
Grüsser, O.J. (1986). Interaction of efferent and afferent signals in visual perception: A history of ideas and experimental paradigms. Acta Psychologica, 63, 3–21.
Helmholtz, H. von (1867). Handbuch der Physiologischen Optik. Leipzig: Voss. [English translation: J.P.C. Southall (Ed. and transl.), A treatise on physiological optics. New York: Dover, 1962.]
Heptulla Chatterjee, S., Freyd, J., and Shiffrar, M. (1996). Configural processing in the perception of apparent biological motion. Journal of Experimental Psychology: Human Perception and Performance, 22, 916–929.
Hertz, H. (1894/1956). The principles of mechanics. (D.E. Jones and J.T. Walley, transl.) New York: Dover.
Hoffman, D.D. and Flinchbaugh, B.E. (1982). The interpretation of biological motion. Biological Cybernetics, 42, 195–204.
Holst, E. von and Mittelstaedt, H. (1950). Das Reafferenzprinzip. Naturwissenschaften, 37, 464–476. [English translation in: R. Martin (Ed. and transl.), Selected papers of Erich von Holst: The behavioral physiology of animals and man, Vol. 1. London: Methuen, 1973.]
Hommel, B., Müsseler, J., Aschersleben, G., and Prinz, W. (2001). The Theory of Event Coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences (in press).
James, W. (1906). Psychology. London: Macmillan.
Jeannerod, M. and Decety, J. (1995). Mental motor imagery: A window into the representational stages of action. Current Opinion in Neurobiology, 5, 727–732.
Johansson, G. (1950). Configurations in event perception. Uppsala: Almqvist and Wiksell.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception and Psychophysics, 14, 201–211.
Johansson, G. (1975). Visual motion perception. Scientific American, 232, 76–88.
Johansson, G. (1976). Spatio-temporal differentiation and integration in visual motion perception. Psychological Research, 38, 379–393.
Johansson, G., von Hofsten, C., and Jansson, G. (1980). Event perception. Annual Review of Psychology, 31, 27–63.
Jusczyk, P. (1986). Speech perception. In K.R. Boff, L. Kaufmann, and J.P. Thomas (Eds.), Handbook of perception and human performance, Vol. II: Cognitive processes and performance, pp. 1–57. New York: Wiley.
Kandel, S., Orliaguet, J.-P., and Viviani, P. (2000). Perceptual anticipation in handwriting: The role of implicit motor competence. Perception and Psychophysics, 62, 706–716.
Kawamichi, H., Kikuki, Y., Endo, H., Takeda, T., and Yoshizawa, S. (1998). Temporal structure of implicit motor imagery in visual hand-shape discrimination as revealed by MEG. Cognitive Neuroscience, 9, 1127–1132.
Kelso, J.A.S. (1995). Dynamic patterns: The self-organization of brain and behavior. Cambridge, MA: MIT Press.
Kolers, P. and Pomerantz, P. (1971). Figural change in apparent motion. Journal of Experimental Psychology, 87, 99–108.
Korte, A. (1915). Kinematoskopische Untersuchungen [Cinematoscopic investigations]. Zeitschrift für Psychologie, 72, 194–296.
Kozlowski, L.T. and Cutting, J.E. (1977). Recognizing the sex of a walker from a dynamic point-light display. Perception and Psychophysics, 21, 575–580.
Kugler, P.N. and Turvey, M.T. (1987). Information, natural law, and the self-assembly of rhythmic movement. Hillsdale, NJ: Erlbaum.
Kuhl, P.K. (1987). Perception of speech and sound in early infancy. In P. Salapatek and L. Cohen (Eds.), Handbook of infant perception, Vol. II: From perception to cognition, pp. 274–382. New York: Academic Press.
Kuhl, P.K. (1991). Human adults and human infants show a ‘perceptual magnet effect’ for the prototypes of speech categories, monkeys do not. Perception and Psychophysics, 50, 93–107.
Kuhl, P.K. and Meltzoff, A. (1982). The bimodal perception of speech in infancy. Science, 218, 1138–1141.
Kuhl, P.K. and Miller, J.D. (1978). Speech perception by the chinchilla: Identification functions for synthetic VOT stimuli. Journal of the Acoustical Society of America, 63, 905–917.
Kuhl, P.K. and Padden, D.M. (1983). Enhanced discriminability at the phonetic boundaries for the place feature in macaques. Journal of the Acoustical Society of America, 73, 1003–1010.
Lacquaniti, F., Terzuolo, C.A., and Viviani, P. (1983). The law relating kinematic and figural aspects of drawing movements. Acta Psychologica, 54, 115–130.
Liberman, A.M. and Mattingly, I.G. (1985). The motor theory of speech perception revisited. Cognition, 21, 1–36.
Liberman, A.M., Cooper, F.S., Shankweiler, D.P., and Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461.
Lieberman, P. (2000). Human language and our reptilian brain: The subcortical bases of speech, syntax, and thought. Cambridge, MA: MIT Press.
Lotze, R.H. (1852). Medizinische Psychologie oder Physiologie der Seele [Medical psychology or the physiology of the soul]. Leipzig: Weidmann.
Mach, E. (1885). Beiträge zur Analyse der Empfindungen [English translation: Contributions to the analysis of sensations. La Salle, IL: Open Court (1897)].
Marean, G.C., Werner, L.A., and Kuhl, P. (1992). Vowel categorization by very young infants. Developmental Psychology, 28, 396–405.
Massey, J.T., Lurito, J.T., Pellizzer, G., and Georgopoulos, A.P. (1992). Three-dimensional drawing in isometric conditions: Relation between geometry and kinematics. Experimental Brain Research, 88, 685–690.
Mateef, S. (1978). Saccadic eye movements and localization of visual stimuli. Perception and Psychophysics, 24, 215–224.
Mather, G., Radford, K., and West, S. (1992). Low-level visual processing of biological motion. Proceedings of the Royal Society of London, 249, 149–155.
Matin, L. (1986). Visual localization and eye movements. In K.S. Boff, L. Kaufmann, and J.P. Thomas (Eds.), Handbook of perception and human performance, Vol. 1: Sensory processes and perception, pp. 1–45. New York: Wiley.
Mays, L.E. and Sparks, D.L. (1981). Saccades are spatially, not retinocentrically, coded. Science, 208, 1163–1165.
Michotte, A. (1946). La perception de la causalité [The perception of causality]. Louvain: Publications Universitaires de Louvain.
Oram, M. and Perrett, D. (1994). Responses of anterior superior temporal polysensory (STPa) neurons to ‘biological motion’ stimuli. Journal of Cognitive Neuroscience, 6, 99–116.
Orliaguet, J.-P., Kandel, S., and Boë, L.J. (1997). Visual perception of cursive handwriting: Influence of spatial and kinematic information on the anticipation of forthcoming letters. Perception, 26, 905–912.
Palmer, S.E. (1978). Fundamental aspects of cognitive representation. In E. Rosch and B.B. Lloyd (Eds.), Cognition and categorization, pp. 259–303. Hillsdale, NJ: Erlbaum.
Parsons, L.M. (1987). Imagined spatial transformations of one’s hands and feet. Cognitive Psychology, 19, 178–241.
Parsons, L.M. (1994). Temporal and kinematic properties of motor behavior reflected in mentally simulated action. Journal of Experimental Psychology: Human Perception and Performance, 20, 709–730.
Parsons, L.M. and Fox, P.T. (1998). The neural basis of implicit movement used in recognising hand shape. Cognitive Neuropsychology, 15, 583–615.
Pastore, R.E., Ahroon, W.A., Buffuto, K.A., Friedman, C.J., Puleo, J.S., and Fink, E.A. (1977). Common factor model of categorical perception. Journal of Experimental Psychology: Human Perception and Performance, 4, 686–696.
Pellizzer, G. (1997). Transformation of the intended direction of movement during motor trajectories. Cognitive Neuroscience and Neuropsychology, 8, 3447–3452.
Perrott, D.R. (1974). Auditory apparent motion. Journal of Auditory Research, 14, 163–169.
Poincaré, H. (1905). La Science et l’Hypothèse. Paris: Flammarion. [English translation: Science and hypothesis. New York: The Science Press.]
Port, R.P. and van Gelder, T. (1995). Mind as motion: Explorations in the dynamics of cognition. Cambridge, MA: Bradford Book/MIT Press.
Restle, F. (1979). Coding theory of the perception of motion configurations. Psychological Review, 86, 1–24.
Rizzolatti, G., Fadiga, L., Gallese, V., and Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131–141.
Runeson, S. (1974). Constant velocity—not perceived as such. Psychological Research, 37, 3–23.
Runeson, S. and Frykholm, G. (1981). Visual perception of lifted weights. Journal of Experimental Psychology: Human Perception and Performance, 7, 733–740.
Scheerer, E. (1984). Motor theories of cognitive structure: A historical review. In W. Prinz and A.F. Sanders (Eds.), Cognition and motor processes, pp. 77–97. Berlin: Springer-Verlag.
Scheerer, E. (1987). Muscle sense and innervation feelings: A chapter in the history of perception and action. In H. Heuer and A.F. Sanders (Eds.), Perspectives on perception and action, pp. 171–194. Hillsdale, NJ: Erlbaum.
Schwartz, A.B. (1994). Direct cortical representation of drawing. Science, 265, 540–542.
Schwartz, S.P. (1977). Naming, necessity and natural kinds. Ithaca, NY: Cornell University Press.
Sechenov, I. (1878). The elements of thought. [English translation in: Selected physiological and psychological works, pp. 265–401. Moscow: Foreign Languages Publishing House, 1956.]
Shepard, R.N. (1984). Ecological constraints on internal representation: Resonant kinematics of perceiving, imagining, thinking, and dreaming. Psychological Review, 91, 417–447.
Shepard, R.N. and Cooper, L.A. (1982). Mental images and their transformations. Cambridge, MA: MIT Press/Bradford Books.
Shepard, R.N. and Zare, S. (1983). Path-guided apparent motion. Science, 220, 632–634.
Sherrick, C.E. and Rogers, R. (1966). Apparent haptic movement. Perception and Psychophysics, 1, 175–180.
Shiffrar, M. and Freyd, J.J. (1990). Apparent motion of the human body. Psychological Science, 1, 257–264.
Shiffrar, M. and Freyd, J.J. (1993). Timing and apparent motion path choice with human body photographs. Psychological Science, 4, 379–384.
Shiffrar, M., Lichtey, L., and Heptulla Chatterjee, S. (1997). The perception of biological motion across apertures. Perception and Psychophysics, 59, 51–59.
Sirigu, A., Cohen, L., Duhamel, J.-R., Pillon, B., Dubois, B., and Agid, Y. (1995). Congruent unilateral impairments for real and imagined movements. NeuroReport, 6, 997–1001.
Sirigu, A., Duhamel, J.-R., Cohen, L., Pillon, B., Dubois, B., and Agid, Y. (1996). The mental representation of hand movements after parietal cortex damage. Science, 273, 1564–1568.
Soury, J. (1892). Les fonctions du cerveau [The functions of the brain]. Paris: V.ve Babé, Libraire Editeur.
Stricker, W. (1882). Studien über die Bewegungsvorstellungen [Studies on the representation of movement]. Vienna: Hankel.
Sumi, S. (1984). Upside-down presentation of the Johansson moving light-spot pattern. Perception, 13, 283–286.
Thelen, E. and Smith, L.B. (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: Bradford Books/MIT Press.
Thomassen, A.J.W.M. and Schomaker, L.R. (1986). Between-letter context effects in handwriting trajectories. In H.S. Kao, G.P. Van Galen, and R. Hoosain (Eds.), Graphonomics: Contemporary research in handwriting, pp. 253–272. Amsterdam: Elsevier.
Thornton, I., Pinto, J., and Shiffrar, M. (1998). The visual perception of human locomotion. Cognitive Neuropsychology, 15, 535–552.
Viviani, P. and Flash, T. (1995). Minimum Jerk, Two-thirds Power Law, and isochrony: Converging approaches to movement planning. Journal of Experimental Psychology: Human Perception and Performance, 21, 32–53.
Viviani, P. and Laissard, G. (1996). Motor templates in typing. Journal of Experimental Psychology: Human Perception and Performance, 22, 417–445.
Viviani, P. and Mounoud, P. (1990). Perceptuo-motor compatibility in pursuit tracking of two-dimensional movements. Journal of Motor Behavior, 22, 407–443.
Viviani, P. and Schneider, R. (1991). A developmental study of the relationship between geometry and kinematics in drawing movements. Journal of Experimental Psychology: Human Perception and Performance, 17, 198–218.
Viviani, P. and Stucchi, N. (1989). The effect of movement velocity on form perception: Geometric illusions in dynamic displays. Perception and Psychophysics, 46, 266–274.
Viviani, P. and Stucchi, N. (1992). Biological movements look constant: Evidence of motor-perceptual interactions. Journal of Experimental Psychology: Human Perception and Performance, 18, 603–623.
Viviani, P. and Terzuolo, C.A. (1982). Trajectory determines movement dynamics. Neuroscience, 7, 431–437.
Viviani, P., Campadelli, P., and Mounoud, P. (1987). Visuo-manual pursuit tracking of human two-dimensional movements. Journal of Experimental Psychology: Human Perception and Performance, 13, 62–78.
Viviani, P., Baud-Bovy, G., and Redolfi, M. (1997). Perceiving and tracking kinesthetic stimuli: Further evidence of motor–perceptual interactions. Journal of Experimental Psychology: Human Perception and Performance, 23, 1232–1252.
Warrington, E.K. and Shallice, T. (1984). Category specific semantic impairments. Brain, 107, 829–854.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung [Experimental studies on the perception of movement]. Zeitschrift für Psychologie, 61, 161–265.
Wundt, W. (1893). Grundzüge der physiologischen Psychologie [Foundations of physiological psychology]. Leipzig: Engelmann.
Zimmer, A. (1982). Do we see what makes our script so characteristic or do we only feel it? Modes of sensory control in handwriting. Psychological Research, 44, 165–174.
Zinchenko, V.P. and Vergiles, N.Y. (1972). Formation of visual images: Studies of stabilized images. New York: Plenum Press/Consultants Bureau.
22 Eliminating, magnifying, and reversing spatial compatibility effects with mixed location-relevant and irrelevant trials

Robert W. Proctor and Kim-Phuong L. Vu
Abstract. Regardless of whether stimulus location is relevant or irrelevant to a task, responses are faster and more accurate when stimulus location and response location correspond than when they do not. Stimulus–response compatibility (SRC) effects of this nature are robust and typically considered to be automatic consequences of stimulus–response associations that are either hard-wired or acquired through years of experience. An exception to the robustness of SRC effects has been shown to occur when compatible and incompatible mappings are mixed. In the present paper, we review the literature on mixing compatible and incompatible mappings and show that mixing does not always reduce the SRC effect. We then present results from studies in which location-irrelevant (LI) trials are mixed with location-relevant (LR) trials. The SRC effect for LR trials mapping physical locations to keypresses is eliminated when stimulus color, rather than location, is relevant on half of the trials. However, the SRC effect for LR trials is unaffected by mixing when the location information is conveyed by arrows, and amplified when it is conveyed by words. With vocal location responses, the SRC effects for all three stimulus types are enhanced by mixing. In addition, regardless of stimulus type and response modality, mixing intensifies the correspondence effect for trials on which location is irrelevant when the mapping for LR trials is compatible, and eliminates or reverses the effect when the mapping is incompatible. These results show that SRC effects are not as hard-wired as previously depicted and are affected by the demands imposed on subjects by the task environment.
22.1 Introduction

Most researchers in cognitive psychology are aware that Attention and Performance II, in 1968, was the Donders Centenary Symposium on Reaction Time, held in honor of F.C. Donders’ pioneering contributions to the use of reaction time (RT) to measure mental processes. The proceedings of this symposium included an English translation of Donders’ (1868/1969) seminal paper, ‘On the speed of mental processes’, in which he outlined his view that the processes intervening between stimulus onset and response can be decomposed into discrete stages, whose durations can be measured. What is not widely known is that Donders reported results demonstrating that some stimulus–response (S–R) relations yield faster selection of responses than do others, a phenomenon that is known today as S–R compatibility (SRC). Donders estimated that the time to choose between responses with the left and right hands was 66 ms when the stimulus was an electrical impulse presented to the left or right foot, and 122 ms when the stimulus was a light of red or white color. He also noted that the estimated choice time for repeating one of two vowel sounds was 56 ms. From these effects, Donders concluded that choice among two response alternatives is faster when the stimuli are paired with their natural responses.
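For readers who wonder where such stage estimates come from, the arithmetic of Donders’ subtraction method is easily sketched. The raw RTs below are hypothetical placeholders; only the 66 ms difference quoted above comes from the text.

```python
# Donders' subtraction logic: the duration of the choice stage is the
# choice RT minus the simple RT, on the assumption that the two tasks
# differ only by the presence of that stage.
simple_rt = 130  # hypothetical simple RT to the foot stimulus (ms)
choice_rt = 196  # hypothetical two-choice RT to the same stimulus (ms)

print(choice_rt - simple_rt)  # 66 ms, the choice-time estimate cited above
```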
The only author of whom we are aware who acknowledges Donders’ (1868/1969) demonstrations of mode SRC effects is Prinz (1997), who described them as ‘a beautiful set of compatibility experiments—certainly the very first ones in the history of psychology’ (p. 250). Ironically, this eloquent depiction of Donders’ prescience, though included in an edited book devoted to SRC (Hommel and Prinz 1997), is presented in a chapter titled, ‘Why Donders has led us astray.’ The sentiment expressed in Prinz’s title reflects his belief that too little emphasis has been placed on the role of intention, or task set, in the choice RT literature, a deficiency he attributes in some degree to Donders not explicitly addressing this topic. Due in part to Prinz’s efforts, this trend has been reversed in recent years, and the experiments we describe in this paper are among an increasing number that emphasize the role of task set. Although Prinz (1997) pointed out Donders’ (1868/1969) work on mode SRC effects, he did not mention that Donders also briefly described the first instance of spatial mapping effects. Donders noted that, in comparison to the condition in which the assigned response was on the same side as the stimulus, ‘when movement of the right hand was required with stimulation of the left side or the other way round, then the time lapse was longer and errors common’ (p. 421). It was not until 85 years later that the concept of SRC was formalized by Paul Fitts, to whom the proceedings of the first Attention and Performance meeting were dedicated. Thus, at least with regard to SRC, it seems that rather than Donders leading us astray, we should have followed his lead more closely. Fitts and Seeger (1953) used an eight-choice task with three different S–R arrangements to demonstrate that responses were faster when the configuration of stimulus locations matched that of the response locations. The study of Fitts and Deininger (1954), which also used eight-choice tasks, showed that the mapping of stimuli to responses affected performance: responses were fastest when they corresponded to the stimuli, next fastest when they were systematically related, and slowest when they were unrelated. According to Fitts and Deininger, ‘Compatibility effects are conceived as resulting from hypothetical information transformation processes (encoding and/or decoding) that intervene between receptor and effector activity. The rate of processing information is assumed to be maximum when these recoding processes are at a minimum’ (p. 483). Although Fitts and colleagues used eight-choice tasks, many subsequent studies have used two-choice tasks (e.g. Broadbent and Gregory 1962), as Donders (1868/1969) did in his early work, typically with visual stimuli. A standard finding from two-choice tasks is a spatial SRC effect such that, when left–right stimulus locations are mapped to left–right keypresses, the mapping of left to left and right to right yields faster responding than the opposite mapping. Consistent with Donders’ findings of mode SRC effects, for two-choice tasks in which the stimuli and responses are left and right, the more natural pairings of spatial–manual and verbal–vocal S–R sets are more compatible than the pairings of spatial–vocal and verbal–manual S–R sets (Proctor and Wang 1997; Wang and Proctor 1996). A third SRC effect, called the Simon effect, is found in two-choice tasks for which stimulus location is irrelevant and the relevant stimulus dimension is non-spatial (e.g., color).
Performance is better when the irrelevant stimulus location corresponds with the response location than when it does not (e.g. Lu and Proctor 1995; Simon 1990; Umiltà and Nicoletti 1990). Beginning with Donders (1868/1969), most accounts of SRC effects have emphasized learned associations between stimuli and responses. For example, in answering the question why a choice required less time when repeating vowel sounds versus making manual responses to color, Donders said, ‘The answer is the response given to the sound is the simple imitation which has become natural by training, more so than the conventional response with the right or the left hand in the case of differences in colour’ (p. 421). Donders accounted for performance with the compatible mapping
for S–R locations in a similar manner, stating, ‘The tendency to respond in this way is already present as a consequence of habit or training’ (p. 421). Fitts (1964) also attributed SRC, at least in part, to habit strength, stating, ‘As we progress from tasks low in compatibility to ones of relatively high compatibility, Ss [subjects] are presumably making more and more use of very well-established habits (i.e. using responses which show strong population stereotypes)’ (p. 271). The penchant for attributing SRC effects in large part to learned associations has continued to the present. The most popular accounts of SRC effects currently are dual-route models, in which response activation can occur either automatically or through intentional translation. The best-known exemplar is that of Kornblum, Hasbroucq, and Osman (1990). In their model (see Fig. 22.1), activation of the corresponding response occurs via a direct route whenever a stimulus dimension overlaps with (i.e. is similar to) a response dimension, and this activation is independent of the S–R mapping defined for the task. Automatic activation is therefore presumed to facilitate responding when the mapping is compatible and to interfere when it is incompatible. Response identification occurs by way of an indirect route, through retrieval or generation by rule of the assigned response.

Fig. 22.1 Illustration of the dimensional overlap model by Kornblum et al. (1990). The top route depicts automatic activation of the corresponding response, and the bottom route depicts identification of the assigned response by intentional S–R translation. From ‘Dimensional overlap: Cognitive basis for stimulus–response compatibility—A model and taxonomy’, by S. Kornblum, T. Hasbroucq, and A. Osman, 1990, Psychological Review, 97, p. 257. Copyright 1990 by the American Psychological Association.

A related way of characterizing automatic activation is to distinguish activation produced by long-term S–R associations, typically described as learned or hard-wired, from that produced by short-term S–R associations defined for the specific task (Barber and O’Leary 1997; Umiltà and Zorzi 1997). The long-term associations correspond to the direct response-selection route and the short-term associations to the indirect translation route. Zorzi and Umiltà (1995) implemented the distinction between short- and long-term associations in a model for the Simon effect. Their model is a connectionist network consisting of three groups of interconnected processing nodes (see Fig. 22.2). The position of the imperative stimulus is encoded by position nodes, the value of the relevant stimulus
attribute is encoded by feature nodes, and the two response alternatives are represented by response nodes. The position nodes and response nodes are connected by long-term links, and the feature nodes are connected to short-term memory nodes, which in turn are connected to the response nodes by short-term links. In Zorzi and Umiltà’s model, the Simon effect is attributed to the long-term links, consistent with the prevailing view that SRC effects are to a large extent due to pre-experimental associations and, consequently, are relatively immutable.

Fig. 22.2 Illustration of a connectionist model for the Simon effect by Zorzi and Umiltà (1995). Position nodes are connected to responses by the long-term memory (LTM) links and feature nodes are connected by short-term memory (STM) links. From ‘The role of LTM links and STM links in the Simon effect,’ by M. Tagliabue, M. Zorzi, C. Umiltà, and F. Bassignani, 2000, Journal of Experimental Psychology: Human Perception and Performance, 26, p. 660. Copyright 2000 by the American Psychological Association.

Despite the robustness of SRC effects, occasional exceptions have been reported. Shaffer (1965) mixed compatible and incompatible mappings of location-relevant trials. On each trial, a mapping signal was presented simultaneously with the imperative stimulus (a light in a left or right position), to indicate whether the S–R mapping on the trial was compatible or incompatible. With pure blocks of compatible or incompatible mappings, the SRC effect was 54 ms. However, when compatible and incompatible trials were mixed, the SRC effect was a nonsignificant −8 ms. With regard to the Simon effect, Hedge and Marsh (1975) found a performance advantage for trials on which stimulus location did not correspond with response location. Specifically, subjects responded to the color (red or green) of a stimulus, which could occur in a left or right location, by moving a hand from a home key to a red or green key located to the left or right. For some conditions, the mapping of stimulus color to key color was incompatible, that is, the green key was mapped to the red stimulus, and vice versa. With this mapping, a reverse Simon effect was evident, for which responses were faster when the stimulus location was on the side opposite the response key rather than on the same side. Thus, this ‘Hedge and Marsh’ reversal is a violation of the principle that responses are faster when S–R locations correspond than when they do not. Although these violations of SRC have been known for some time, and both the effects of mixing and the Hedge and Marsh reversal have attracted considerable interest, accounts of SRC effects have continued to emphasize pre-existing associations. The primary concern of the present paper is to examine the role of task-defined S–R associations in performance of, for the most part, two-choice reaction tasks by mixing task sets. We restrict consideration mainly to spatial compatibility effects,
broadly defined to include those obtained when the location information is conveyed by location words, arrow directions, or physical locations. We begin by summarizing prior research for which compatible and incompatible mappings of spatial locations were mixed, and the models that have been proposed to explain the results. Then we describe recent research we have conducted using mixed and blocked presentations of location-relevant (LR) and location-irrelevant (LI) trials. We next examine effects obtained when the location information is conveyed by arrow direction or location word, rather than by spatial location. The final set of experiments examines conditions in which the LR stimuli are presented in one mode and the LI stimuli in another mode. Overall, the results of the mixing studies imply that SRC effects are not an automatic consequence of long-term associations, as often assumed, and that the short-term S–R associations defined by the task set contribute substantially to the pattern of results obtained.
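Before turning to those studies, the LTM/STM-link idea reviewed above can be made concrete with a toy accumulator. The weights and threshold below are arbitrary illustrative values, not the parameters of Zorzi and Umiltà’s published model; the point is only that a fixed long-term link from stimulus position to the spatially corresponding response speeds corresponding trials and slows non-corresponding ones, yielding a Simon effect.

```python
# Toy accumulator in the spirit of Zorzi and Umiltà (1995); all numbers
# are hypothetical. STM links carry the task-defined (color -> response)
# association; the LTM link adds position-driven activation.
W_LTM, W_STM, THRESHOLD = 0.3, 1.0, 5.0

def steps_to_respond(stimulus_side, assigned_response_side):
    rate = W_STM + (W_LTM if stimulus_side == assigned_response_side
                    else -W_LTM)
    activation, steps = 0.0, 0
    while activation < THRESHOLD:
        activation += rate
        steps += 1
    return steps

print(steps_to_respond("left", "left"))    # corresponding: fewer steps
print(steps_to_respond("left", "right"))   # non-corresponding: more steps
```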
22.2 Mixing compatible and incompatible mappings

When subjects perform a two-choice SRC task, the instructions define the S–R associations for the task. The instructions also specify an identity rule for the compatible mapping and an opposite rule for the incompatible mapping, which allows the possibility of generating the correct response to a stimulus by applying the appropriate rule, rather than retrieving the associated response (Duncan 1977b). According to most dual-route models, the long-term associations between each stimulus location and its corresponding response also affect performance. These associations may be in accord with the task-defined associations (the compatible mapping) or counter to them (the incompatible mapping). The advantage for compatible mappings is attributed at least in part to the direct activation of the corresponding response produced by the long-term associations. Since Shaffer’s (1965) study, researchers have been concerned with how to characterize performance in situations in which compatible and incompatible mappings are mixed. Duncan (1977b) introduced a method in which four stimulus locations arranged in a row are mapped to four keypress responses (made with the index and middle fingers of each hand). In his original study, the two inner stimulus locations had one mapping and the two outer locations the alternative mapping, and mixing slowed RT equally for both mappings. Subsequent studies have also used this condition in which the mappings differ between inner and outer locations, as well as ones in which each mapping is assigned to the two left or two right locations, or to alternate stimulus locations. The typical finding, however, has been that the mapping effect is reduced but not eliminated when the mappings are mixed (Duncan 1977a, 1978; Ehrenstein and Proctor 1998; Stoffels 1996b). At least three types of explanations have been proposed for the mixing effects: pure translation (translation efficiency), two-step response selection, and alternative-routes accounts.
22.2.1 Pure translation (translation efficiency)

Shaffer (1965) explained the greater effect of mixing on compatible than on incompatible spatial mappings as follows: 'It is as though when I [the mapping] was variable S [the subject] selected at each trial from a set of equally difficult transformations: when I was fixed or known in advance the null transformation could be considered as a special class and was easier to compute' (p. 287). This pure translation account implies that intentional S–R translation occurs more efficiently when all trials are compatible (possibly because an identity rule can be applied) than when the trials are of mixed mappings.
22.2.2 Two-step response selection

The second explanation, proposed by Duncan (1977a,b, 1978), is explicitly a rule-based translation account. Response selection with mixed mappings occurs in two steps: a decision is made as to whether the mapping is compatible or incompatible, and then the appropriate mapping rule is applied. Duncan's (1977b) model was based initially on his atypical finding that, in the four-choice version of the task, the effects of mixed versus blocked presentation and of SRC were additive. However, the model can explain the more common finding of a reduced SRC effect when mappings are mixed by assuming that a reduction from four to two choices is relatively more beneficial for the incompatible mapping than for the compatible mapping. Evidence consistent with the model is that errors tend to be the response that would be correct if the alternative mapping rule were applicable on the trial (Duncan 1977a, 1978; Ehrenstein and Proctor 1998; Stoffels 1996b).
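The two-step account can be sketched in a few lines. This is our own illustration; the coding of the mapping signal is invented, and no claims about timing are implied.

```python
# Duncan's two-step response selection, sketched: (1) classify the trial's
# mapping, (2) apply the corresponding translation rule to the stimulus.
OPPOSITE = {"left": "right", "right": "left"}

def select_response(stimulus_location, mapping_signal):
    # Step 1: decide which mapping is in force on this trial (in Duncan's
    # four-choice task this is cued by inner vs. outer stimulus locations).
    if mapping_signal == "compatible":
        rule = lambda loc: loc            # identity rule
    else:
        rule = lambda loc: OPPOSITE[loc]  # opposite rule
    # Step 2: apply the selected rule.
    return rule(stimulus_location)

assert select_response("left", "compatible") == "left"
assert select_response("left", "incompatible") == "right"
# On this account, errors are rule confusions: applying the alternative
# mapping's rule produces the response that would have been correct under
# that mapping, as the error analyses cited above show.
```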
22.2.3 Alternative-routes model

The third account is a version of the dual-route model according to which the direct route contributes to response selection only when the mapping is compatible on all trials (De Jong 1995; Van Duren and Sanders 1988). Response selection occurs by way of the indirect translation route when all trials are incompatible or when compatible and incompatible trials are mixed. An intuitive way to characterize this alternative-routes model is that a response can be selected on the basis of initial response tendencies if these are going to be correct, but not if they lead to the wrong response on a significant proportion of trials. The fact that the dominant finding has been a significant reduction of the mapping effect under mixed conditions, and that this reduction is evident in Shaffer's (1965) task version in which the number of stimuli and responses is always two, has led to the alternative-routes view being favored. However, Stoffels (1996b) and Ehrenstein and Proctor (1998) have suggested that it is necessary to include rule-based translation along with the alternative-routes model to explain the full range of results.

Precuing studies provide evidence that the compatible mapping benefits from direct activation when it is known in advance that the trial will be compatible. In Shaffer's (1965) study, one group of subjects received the mapping stimulus 333 ms before the imperative stimulus. The precued group showed an SRC effect similar to that obtained in the pure mapping conditions, rather than the absence of an effect found for the group that was not precued. Stoffels (1996b) and Ehrenstein and Proctor (1998) obtained similar results using variations of Duncan's (1977b) four-choice procedure. A precue designated two of the four stimuli as possible, with the two stimuli being the compatible subset, the incompatible subset, or a mixed subset (one from the compatible and one from the incompatible subset). In most cases, the benefit at the longest precuing interval was larger for the compatible subset than for the incompatible subset, indicating that the SRC effect increased in magnitude. If the precue allowed only advance selection of the appropriate mapping rule, as Duncan's two-step model suggests, then the precuing benefit should have been additive with the effect of the compatibility manipulation. The larger precuing benefit for compatible trials than for incompatible trials is consistent with the alternative-routes model, assuming that subjects adjust their strategies to use the direct processing route when informed that the mapping will be compatible.
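The gating assumption at the heart of this account can also be sketched. The latencies below are invented and serve only to order the routes (direct faster than intentional translation); the gating rule is the substance of the account.

```python
# Alternative-routes sketch: the fast direct route drives responding only
# when its output is guaranteed correct (pure compatible blocks, or a
# precue announcing a compatible trial); otherwise it is suppressed and
# selection runs through slower intentional translation.
OPPOSITE = {"left": "right", "right": "left"}
DIRECT_MS, TRANSLATE_MS = 350, 450  # hypothetical latencies

def respond(stimulus, mapping, direct_route_allowed):
    if direct_route_allowed and mapping == "compatible":
        return stimulus, DIRECT_MS
    translated = stimulus if mapping == "compatible" else OPPOSITE[stimulus]
    return translated, TRANSLATE_MS

# Pure compatible block: direct route allowed -> fast responses.
print(respond("left", "compatible", direct_route_allowed=True))    # ('left', 350)
# Mixed block: direct route suppressed -> both mappings rely on translation,
# so the compatible mapping loses most of its advantage.
print(respond("left", "compatible", direct_route_allowed=False))   # ('left', 450)
print(respond("left", "incompatible", direct_route_allowed=False)) # ('right', 450)
```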
Most studies on mixed compatible and incompatible mappings have reported repetition analyses, with the typical finding being that repetition of the task category from the preceding trial is beneficial. Shaffer (1965, 1966) obtained additive benefits of repetition for both the mapping signal and the stimulus location in the two-choice task version, with the benefit being larger for repeating the mapping signal than for repeating the stimulus location. He noted, 'The relative magnitude of these transition effects corresponded to the relative reductions in RT obtained by presenting I [the mapping signal] or M [the stimulus position] in advance' (p. 287). Stoffels (1996b, Exp. 3) also noted that the pattern of repetition effects in the four-choice task version was similar to the pattern of precuing benefits. For pure blocks of one mapping, repetition of the stimulus location was more beneficial for the incompatible than for the compatible mapping, as is typically found. However, for mixed blocks, repetition of the task, but without repetition of the specific stimulus, was more beneficial for the compatible mapping than for the incompatible mapping, resulting in an increased SRC effect. An additional benefit of repeating the same stimulus was also obtained that was of similar magnitude for the compatible and incompatible mappings. Stoffels suggested that the benefit of task repetition for the compatible mapping is due to the direct, or automatic, response-selection route not being inhibited on repetition trials.

In summary, regardless of which task version is used, the SRC effect is reduced, if not eliminated, when compatible and incompatible mappings are mixed. The SRC effect is larger for trials on which the task is repeated than for trials on which it is not. The effect is also reinstated when the mapping is precued sufficiently far in advance of the imperative stimulus to allow preparation for the cued mapping. On the whole, the findings are most consistent with the view that a direct response-selection route is used (a) for pure blocks of compatible S–R mappings, (b) on task repetition trials in mixed blocks, and (c) when the mapping is precued in advance in mixed blocks.
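Findings (a)–(c) can be restated as a simple predicate. This is a descriptive summary of the results reviewed above, not a process model, and the argument names are our own.

```python
# When do the mixing studies suggest the direct response-selection route
# is engaged? (Descriptive restatement of findings (a)-(c) above.)
def direct_route_engaged(block_type, task_repetition=False, precued=False):
    if block_type == "pure-compatible":
        return True                        # (a) pure compatible blocks
    if block_type == "mixed":
        return task_repetition or precued  # (b) task repetitions,
                                           # (c) precued mappings
    return False                           # pure incompatible: translation only
```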
22.3 Mixing location-relevant and location-irrelevant trials

We have conducted experiments that focus on mixing compatibly or incompatibly mapped LR trials with LI trials. Stimulus color is used to designate whether location is relevant or irrelevant on each trial. Mixing LR and LI trials allows issues to be examined that cannot be addressed by mixing compatible and incompatible LR trials. One is whether the SRC effect is eliminated by the inclusion of LI trials, or whether elimination occurs only when both tasks require responding to spatial location but with different mappings. If the SRC effect is eliminated, other manipulations can be used to determine why. Another issue is the extent to which the LR mapping intrudes on performance when location is irrelevant. The contributions to performance made by the task-defined and long-term associations of S–R locations on the LI trials can be evaluated. Results from our experiments, described below, show that by mixing LR and LI trials, it is possible to eliminate and magnify the SRC effects for both trial types, as well as to reverse the effect for the LI trials.

A characterization of the task set in terms of associations for the conditions in which LR and LI trials are mixed is as follows. When stimulus location is relevant, the instructions specify short-term associations between stimulus locations and responses that are either consistent (compatible mapping) or inconsistent (incompatible mapping) with the long-term associations. Hence, according to most dual-route models, the advantage for compatible mappings is due at least in part to direct activation of the corresponding response produced by the long-term associations. For a Simon task, the task-defined associations relate the non-spatial stimulus dimension to the response locations. Thus, when stimulus location is relevant on some trials and irrelevant on others, task-defined associations of both stimulus dimensions to responses must be maintained in an active state. The questions of interest are whether the associations defined for one task intrude on performance of the other task and whether the contribution of the long-term associations is altered.
22.3.1 Mapping effects for location-relevant trials

We have conducted experiments that used mixed presentation of physical location stimuli in which the LR mapping was compatible or incompatible (Marble and Proctor 2000, Exps. 1 and 4; Proctor and Vu 2001, Exp. 1; Proctor, Vu, and Marble in press, Exp. 1). Stimuli were filled circles presented in left or right locations of the display screen, and responses were left and right keypresses (see Fig. 22.3). Half of the trials were LR (white circles) and half LI (red or green circles). Proctor et al.'s and Proctor and Vu's studies also included pure LR conditions, with the mapping being compatible or incompatible. For the pure conditions, performance was better with the compatible than with the incompatible mapping; however, for the mixed conditions, there was no SRC effect for the LR trials (see Table 22.1). Thus, the two-choice SRC effect is eliminated when LI and LR trials are mixed, as it is for mixed compatible and incompatible mappings (Shaffer 1965).

Proctor and Vu's (2001, Exp. 1: physical location stimuli) data were submitted to a task repetition analysis on LR trials that partitioned the trials into three types: task repetition with the same or a different S–R pair, and task nonrepetition. Responses were slowest for nonrepetition trials, in which the previous task was LI. For task repetition trials, responses were faster when the same S–R pair was repeated than when it was not. When the mixed conditions were compared to conditions of pure LR trials, there was a significant interaction of repetition type (same or different), condition, and compatibility. For the pure condition, the incompatible mapping showed a 25 ms repetition benefit for the same S–R pair, but the compatible mapping showed a repetition cost of 25 ms. Although this cost of repetition for the compatible mapping seems surprising, Shaffer (1965) reported a similar repetition cost when the compatible and incompatible mappings were mixed. For the mixed condition (Proctor and Vu 2001), the task repetition benefit was similar for the compatible and incompatible mappings. However, the compatible mapping showed less benefit of a task repetition when the alternative stimulus was presented than did the incompatible mapping (Mean Differences (MDs) = 85 and 124 ms, respectively).
Table 22.1 Mean SRC effect for reaction time (in milliseconds) and percentage of error (in parentheses) for LR trials as a function of experiment and condition

Experiment                                            Mixed          Pure

Left–right codes on both location-relevant and location-irrelevant trials
  Marble and Proctor (2000, Exp. 1)                   −8 (−1.95%)    –
  Marble and Proctor (2000, Exp. 4)                   −6 (−0.55%)    –
  Proctor, Vu, and Marble (in press, Exp. 1)          −16 (−1.17%)   68 (1.41%)
  Proctor and Vu (2001, Exp. 1)                       −16 (−1.67%)   77 (2.11%)
  Proctor, Vu, and Marble (in press, Exp. 3, mixed)   −1 (0.10%)     117 (0.20%)

Left–right codes on only location-relevant trials
  Proctor, Vu, and Marble (in press, Exp. 2)          54 (0.73%)     56 (1.61%)
  Proctor, Vu, and Marble (in press, Exp. 4)          71 (1.20%)     –
Marble & Proctor (2001; Exps. 1&4) Proctor et al. (in press; Exp.1) Proctor & Vu (2001; Exp.1)
Proctor et al. (in press; Exp.2)
Proctor et al. (in press; Exp.3)
Proctor et al. (in press; Exp.4)
Fig. 22.3 Components of the overall task partitioned by location-relevant (white stimuli) and location-irrelevant (colored stimuli) trials as a function of experiment. * Participants respond to the color of the stimuli, with spatial location irrelevant. As shown in the figure, participants press the left key if the circle is red and the right key if it is green. Although not illustrated, half the subjects responded to the red stimuli by pressing the right key and to green stimuli by pressing the left key. ** For half the participants, white circles appeared in the top row and colored ones in the bottom row, as illustrated. The other half received the colored circles in the top row and white ones in the bottom row.
This difference was offset by the compatible mapping showing a larger additional benefit than the incompatible mapping for repeating the stimulus as well as the task (MDs = 124 and 68 ms, respectively). The greater task repetition benefit for the incompatible mapping when the stimulus location changed could be due to subjects being primed for 'opposite' with that mapping but for 'same' with the compatible mapping.

The elimination of the SRC effect when LI trials are mixed with the LR trials is consistent with the view of the alternative-routes hypothesis that compatible mappings benefit from activation by way of the direct route only when all trials are compatible. From this view, there are several possible reasons why mixing LI trials with LR trials could eliminate the contribution of the direct route. First, the requirement of having to respond to color on some trials, and thus to maintain a task set of color-to-response associations as well as location-to-response associations, may be the crucial factor. Second, the contribution of the direct route may be reduced because the two types of stimuli occur in the same physical locations, thus precluding a location distinction being used as a basis for immediate responding. Third, left and right codes are generated for both trial types, meaning that the presence of a left or right location code is not sufficient for signaling the response. Experiments 2–4 of Proctor et al. (in press) evaluated these possibilities (see Fig. 22.3).

In Proctor et al.'s (in press) Experiment 2, the LI stimuli were displayed in the center of the screen instead of in left–right locations. In this case, the mixed condition showed an SRC effect similar to that obtained in the pure condition. Thus, the requirement to respond to color is not the critical factor. In Experiment 3, stimuli were presented in left–right locations above or below fixation. For the pure LR conditions, subjects responded to the left or right stimulus location with a compatible or incompatible mapping, ignoring the top–bottom distinction. This condition yielded an SRC effect of 117 ms. The mixed–random condition was similar to the mixed conditions of the previous experiments in that the LR and LI stimuli could occur in any location. This condition yielded a nonsignificant 32 ms SRC effect. Of most importance was another mixed condition, in which the LR stimuli occurred in one row and the LI stimuli in the other, that showed no SRC effect (MD = −1 ms). Thus, even when the stimuli for the two tasks occur in distinct locations, the SRC effect is eliminated if both trial types produce left–right location codes.

Experiment 4 of Proctor et al.'s (in press) study presented LR stimuli in left–right positions and LI stimuli in top–bottom positions. With this method, a location code is generated for both trial types, but the LR code is left–right and the LI code is top–bottom. In this case, the presence of a left–right code provides a sufficient basis for responding because it is present only on LR trials. Consequently, we predicted that a typical SRC effect would be evident. The results supported this prediction, showing a 71-ms advantage for the compatible mapping over the incompatible mapping. Together, Proctor et al.'s Experiments 3 and 4 indicate that a distinct location code (top or bottom) for the LR and LI tasks is insufficient to allow relatively rapid responding with the compatible mapping if left–right location codes are present for both tasks. Only when the LI task does not activate left–right codes is the SRC effect evident for the LR task.
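The joint pattern of Experiments 1–4 reduces to a single rule, which can be stated compactly. The condition descriptions below paraphrase the experiments; the rule itself is our summary of the conclusion just drawn.

```python
# Rule suggested by Proctor et al.'s (in press) Experiments 1-4: the SRC
# effect survives mixing only if LI trials generate no left-right codes,
# so that a left-right code uniquely signals a location-relevant trial.
def src_effect_expected(li_trials_generate_left_right_codes):
    return not li_trials_generate_left_right_codes

conditions = {
    "Exp. 1: LI stimuli in the same left-right locations":    True,
    "Exp. 2: LI stimuli centered":                            False,
    "Exp. 3: LI stimuli in a separate row, still left-right": True,
    "Exp. 4: LI stimuli varying top-bottom only":             False,
}
for label, li_lr_codes in conditions.items():
    print(f"{label} -> SRC effect expected: {src_effect_expected(li_lr_codes)}")
# Predicted: SRC effects in Experiments 2 and 4 only, matching the data.
```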
Overall, the results obtained with physical location stimuli indicate that the SRC effect is eliminated when LR and LI trials are mixed, as when compatible and incompatible mappings are mixed (Shaffer 1965). Consequently, they can be interpreted in terms of the alternative-routes model, according to which the direct route is not a factor in situations in which it would lead to an incorrect response. The interpretation is that the direct route is used for compatible mappings when a location code provides a sufficient basis for responding, but not when the same location codes occur for both of the mixed tasks. However, this picture is complicated by experiments, described later, that used left and right pointing arrows and left and right location words.
22.3.2 The Simon effect for location-irrelevant trials

When LR and LI trials are mixed, the primary question regarding the LI trials is how the magnitude and direction of the Simon effect are influenced by the LR mapping. Our experiments (Marble and Proctor 2000; Proctor and Vu 2001; Proctor et al. in press) also allowed this question to be answered (see Table 22.2). With the compatible LR mapping, the Simon effect averaged 39 ms across the experiments. With the incompatible LR mapping, a reverse Simon effect averaging −53 ms was obtained. Thus, relative to the pure blocks of LI trials, the Simon effect is enhanced when the LR trials are compatibly mapped and reversed when they are incompatibly mapped. Since the reverse Simon effect was as large as the positive Simon effect, there apparently was no direct activation of the corresponding response, independent of the task mapping, from long-term associations. This is because any activation of the corresponding response would add to the positive effect in the mixed compatible condition and subtract from the reversed effect in the mixed incompatible condition.

Marble and Proctor (2000) reported a task repetition analysis on the RT data for LI trials that showed responses to be 104 ms faster when the trial was a task repetition than when it was not. The task repetition effect had negligible impact on the positive and reverse Simon effects. When the LR mapping was compatible, the positive Simon effect was 46 ms on repetition trials and 47 ms on nonrepetition trials. When the LR mapping was incompatible, the reverse Simon effect was 60 ms on repetition trials and 78 ms on nonrepetition trials.

Mordkoff (1998), Leuthold et al. (1999), and Valle-Inclán, Hackley, and de Labra (2002; this volume, Chapter 23) reported that the Simon effect occurs when the previous trial is corresponding, but not when it is noncorresponding. Mordkoff and Leuthold et al. attributed this pattern of results to the direct route being suppressed following a noncorresponding trial. A repetition analysis of the pure LI condition of Marble and Proctor's Experiment 1 showed similar results: the Simon effect was positive when the S–R locations corresponded on the preceding trial (M = 73 ms) and negative when they did not (M = −31 ms). For the mixed condition, the task repetition trials showed a similar, but stronger, pattern. When the LR mapping was compatible, the Simon effect was 103 ms following a corresponding trial and −27 ms following a noncorresponding trial. When the LR mapping was incompatible, the Simon effect was 17 ms following a corresponding trial and −144 ms following a noncorresponding trial. The magnitude of the difference in effects following corresponding and noncorresponding trials did not depend significantly on whether the LR mapping was compatible or incompatible.
Table 22.2 Mean Simon effect for reaction time (in milliseconds) and percentage of error (in parentheses) for LI trials as a function of experiment and condition

Experiment                                    Pure         Mixed          Mixed
                                                           compatible     incompatible
Marble and Proctor (2000, Exp. 1)             21 (2.8%)    44 (8.0%)      −64 (−7.5%)
Marble and Proctor (2000, Exp. 4)             15 (2.5%)    26 (2.0%)      −42 (−7.0%)
Proctor, Vu, and Marble (in press, Exp. 1)    –            42 (4.4%)      −61 (−5.5%)
Proctor and Vu (2001, Exp. 1)                 –            42 (5.3%)      −44 (−5.4%)
Average                                       18 (2.7%)    39 (4.9%)      −53 (−6.4%)
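The additive logic behind the conclusion that the long-term contribution is negligible can be made explicit. The decomposition below is our illustration applied to the column averages of Table 22.2; the original reports did not present the argument in this algebraic form.

```latex
% Let S be the contribution of the task-defined (LR) associations, which
% reverses sign with the LR mapping, and L the fixed contribution of the
% long-term associations. From the averages in Table 22.2:
\begin{align*}
  S + L &= 39\ \text{ms} \quad \text{(mixed compatible)}\\
  -S + L &= -53\ \text{ms} \quad \text{(mixed incompatible)}\\
  \Rightarrow\ L &= \tfrac{39 + (-53)}{2} = -7\ \text{ms} \approx 0,
  \qquad S = \tfrac{39 - (-53)}{2} = 46\ \text{ms}
\end{align*}
```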
The finding of a reverse Simon effect in the pure and mixed compatible conditions does not conform with the suppression account, because that account predicts that any effect in these conditions should be positive.

One interpretation of these experiments is that the S–R associations defined by the LR mapping are applied on the LI trials. This process could be under the participant's control or occur relatively involuntarily as a function of the task requirements. This issue can be evaluated by precuing the task prior to presentation of the imperative stimulus (e.g. De Jong 1995; Shaffer 1965, 1966). In Marble and Proctor's (2000) Experiment 2, an incompatible mapping was used for the LR task. The task was precued with 100% validity by the word 'COLOR' or 'SPACE' presented at the center of the screen at a stimulus onset asynchrony (SOA) of 150, 300, 600, 1200, or 2400 ms. The precue was effective, with RT being faster at the longest SOA than at the shortest for both cued LI and LR trials. However, the reverse Simon effect for LI trials was evident at all SOAs. Thus, even when cued as much as 2400 ms before the presentation of the imperative stimulus, subjects were not able to prevent application of the LR mapping.

When the LR and LI tasks are mixed in equal number, the corresponding response is correct on 75% of the trials and the noncorresponding response on 25% of the trials if the LR mapping is compatible, and vice versa if it is incompatible (with a compatible mapping, the corresponding response is correct on all LR trials and, by chance, on half of the LI trials: 0.5 × 1 + 0.5 × 0.5 = 0.75). This relation suggests that the influences of the LR mapping on the Simon effect could be due to a bias to respond in accord with the mapping. This possibility seems plausible because the Simon effect and related effects of irrelevant information reverse when incongruent trials are more frequent than congruent trials (Greenwald and Rosenberg 1978; Hommel 1994; Logan 1980; Logan and Zbrodoff 1979; Toth et al. 1995). Marble and Proctor's (2000) Experiment 4 evaluated the biasing possibility by measuring performance in pure blocks of LI trials for which the response was corresponding on 75% of the trials and noncorresponding on 25%, or vice versa. Performance on the pure blocks was compared with the mixed conditions in which the LR mapping was compatible or incompatible. Relative to the Simon effect for a baseline LI condition in which the S–R locations corresponded on 50% of the trials (MD = 15 ms), the Simon effect was enhanced when the corresponding relation predominated and reversed when the noncorresponding relation did. But there were differences in the result patterns for the pure LI trials and the mixed trials. Specifically, for the frequency manipulation in the pure blocks, the reverse Simon effect when noncorresponding trials predominated (MD = −36 ms) was smaller than the Simon effect when corresponding trials predominated (MD = 58 ms), as in Hommel's (1994) and Toth et al.'s (1995) studies. However, in the mixed conditions, the reverse Simon effect obtained with an incompatible LR mapping (MD = −42 ms) was at least as large as the positive Simon effect obtained with a compatible mapping (MD = 26 ms). Thus, although biasing may contribute to the results obtained with mixed presentation, it is not the sole factor.

Another difference between the pure and mixed blocks in Marble and Proctor's (2000) Experiment 4 was apparent when the LI trials were divided into two groups based on whether the S–R locations corresponded on the previous trial (for the mixed condition, previous trial type was not a factor).
For the pure Simon trials, responses were 40 ms faster when the correspondence relation on the present trial (corresponding or noncorresponding) was a repetition of that for the previous trial than when it was not. This difference was independent of the relative frequency manipulation. In contrast, for the mixed conditions, the benefit of correspondence repetition depended on the LR mapping. With the compatible mapping, a large benefit of repeating the correspondence relation was evident for the noncorresponding trials, but not for the corresponding trials. However, when the LR mapping was incompatible, a large benefit of repeating the correspondence relation was evident for the corresponding trials, but not for the noncorresponding trials.
Thus, in contrast to the pure Simon blocks, the benefit for repeating the correspondence relation in mixed blocks occurred only on those trials for which the spatial S–R relation was opposite to that in effect on LR trials. More detailed analyses that partitioned the repetition trials for the mixed condition according to whether the previous trial was LR or LI showed that the asymmetric pattern described above was due primarily to a substantial cost of task switching. This cost is reflected in the RTs for a change from corresponding to noncorresponding trials with the compatible mapping and from noncorresponding to corresponding trials with the incompatible mapping. With the incompatible mapping, the reverse Simon effect was approximately twice as large on the task switch trials (−68 ms) as on the task repetition trials (−36 ms), but it was evident even when the task was repeated. Thus, the reversal is not simply due to re-applying the opposite rule from the preceding trial.

Additional evidence that bias is not the sole factor, and that subjects do not have complete control over application of the task-defined associations, comes from studies in which subjects practiced with an incompatible location mapping and then were transferred to a Simon task. Proctor and Lu (1999, Exp. 2) had subjects practice for three sessions, responding with an incompatible mapping to the location in which the letter H or S appeared. In a fourth session, subjects were transferred to a Simon task in which one of the letters was assigned to the left response and one to the right response. A significant reverse Simon effect was evident in both the RT (−14 ms) and error data (−2.0%). Because all of the transfer trials were LI, the reversal could not be due to a response bias induced by differential frequencies of the compatible and incompatible location relations. The reverse Simon effect in the transfer session implies that the task-defined associations between incompatible S–R locations continue to be activated even though they are no longer relevant to the task.

In Proctor and Lu's (1999) Experiment 2, the same stimuli were used for both the LR practice trials and the LI transfer trials. Therefore, it is impossible to tell whether the location associations are linked to the specific stimulus set that was used. In their Experiment 3, one group of subjects performed in practice and transfer sessions with the letter stimuli, as in Experiment 2, and another group with color stimuli during the practice session and letters during the transfer session. The reverse Simon effect was of similar magnitude for the two conditions (−25 ms in the former and −30 ms in the latter). Thus, the location associations are independent of the symbolic characteristics of the stimuli with which the incompatible mapping is practiced.

Tagliabue et al. (2000) conducted similar experiments, but with subjects receiving the initial LR task for only a single short session. When the transfer session was conducted without delay after practice with the incompatible mapping, no Simon effect was apparent. In other experiments, a 24-hour or 7-day delay was introduced, and the Simon effect still was absent, regardless of whether the stimulus properties were the same as those in the practice condition. The fact that the Simon effect did not reverse after practice with the incompatible mapping in Tagliabue et al.'s study is likely due to the use of fewer practice trials than in Proctor and Lu's (1999) study.
Tagliabue et al. fit a modified version of Zorzi and Umiltà's (1995) connectionist model to their data, allowing practice to influence either the short-term links or the long-term links. Based on goodness of fit, they concluded that the short-term links, rather than the long-term links, were affected by practice with the incompatible mapping and continued to exert their influence in the transfer session.
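The logic of that model comparison can be sketched in a few lines. The activation rule and weights below are our own stand-ins, not Tagliabue et al.'s fitted parameters; the point is only that the two variants make different commitments about what practice changes.

```python
# Which links does practice with an incompatible mapping change? Sketch of
# the contrast fit by Tagliabue et al. (2000). The net advantage of the
# corresponding response on LI transfer trials is positive for a normal
# Simon effect, near zero for no effect, and negative for a reversal.
def transfer_simon_index(ltm_weight, residual_incompatible_stm):
    return ltm_weight - residual_incompatible_stm

no_practice = transfer_simon_index(0.5, 0.0)   # > 0: normal Simon effect
stm_changed = transfer_simon_index(0.5, 0.5)   # ~ 0: effect absent, as observed
ltm_changed = transfer_simon_index(0.0, 0.0)   # ~ 0 here too; the full data
                                               # set, not this one contrast,
                                               # discriminated the variants
print(no_practice, stm_changed, ltm_changed)
# With more extensive practice (Proctor and Lu 1999), a larger residual STM
# term would drive the index negative, i.e. a reversed Simon effect.
```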
In summary, the effect of correspondence between S–R locations on LI trials varies as a function of the LR mapping. When that mapping is compatible, the Simon effect is enhanced; when it is incompatible, the Simon effect is reversed. Repetition analyses show that the influence of the LR mapping is not restricted solely to trials on which the preceding trial was LR. A bias toward the LR S–R relations may play a role in the mixing effects, but it is not the major factor. The influence of the LR mapping on performance occurs when the LI task is precued well in advance, as well as when the LI task is performed alone after practicing the LR task. Thus, the task-defined associations between stimulus and response locations exert a substantial effect on performance even when the subject is aware that they are irrelevant to the current task.
22.4 Spatial information in symbolic and verbal modes

Spatial information can be signaled not only by physical locations, but also by left–right pointing arrows and 'left'–'right' words. An SRC effect is obtained for both: the mapping of left stimulus to left keypress and right stimulus to right keypress yields better performance than the alternative mapping (Wang and Proctor 1996). When vocal left–right responses are compared with keypresses, the pairings of physical locations or arrow directions with keypress responses and of words with vocal responses are more compatible than the opposite pairings (Wang and Proctor 1996). When irrelevant location information is conveyed by arrow directions or location words, Simon effects are obtained, although the effect is small for words (Baldo, Shimamura, and Prinzmetal 1998; Barber and O'Leary 1997; Lu and Proctor 2001). Other evidence also suggests that arrows tend to activate their corresponding responses automatically (Eimer 1995). Thus, with left–right keypresses, spatial locations are the most compatible stimulus mode and location words the least compatible mode (Wang and Proctor 1996).
22.4.1 Mixing compatible and incompatible mappings

We recently conducted an experiment similar to Shaffer's (1965) study, in which all trials were LR and the responses were left–right keypresses (Vu and Proctor 2001, Exp. 1). As in his study, compatible and incompatible mappings were mixed, but conditions were examined in which the location information was conveyed by arrow direction and location word, as well as by physical location. Trial type was signaled by stimulus color, with red signaling the compatible mapping and white the incompatible mapping, or vice versa. For physical locations, the SRC effect was 71 ms for the pure condition, but a nonsignificant 5 ms when the two mappings were mixed, replicating Shaffer's findings. Arrow directions showed similar results, with the SRC effect being 80 ms in the pure condition and 3 ms in the mixed condition. However, for location words the SRC effect increased from 35 ms in the pure condition to 117 ms in the mixed condition.

Vu and Proctor (2001) performed repetition analyses for the pure and mixed conditions (see Table 22.3). For the pure condition, physical locations showed a slight cost for repeating the same S–R pair for both compatible and incompatible mappings. In contrast, arrows and words showed benefits for repeating the same S–R pair for both mappings. For the mixed conditions, all stimulus types showed similar patterns of results: RT was faster for task-repetition than for task-nonrepetition trials. Physical locations and arrows showed little SRC effect when the stimulus position changed, regardless of whether the trial was a task repetition or a nonrepetition. However, when the stimulus position repeated, the compatible mapping benefited relative to the incompatible mapping when the task repeated but showed a cost when the task changed. Words showed a similar pattern, but superimposed on the large overall benefit for the compatible mapping.

De Jong (1995, Exp. 1) also used a method similar to Shaffer's (1965), with the location information conveyed by an upright arrow tilted to the left or right. At the shortest SOA of 100 ms between the mapping and imperative stimuli, the SRC effect was reduced to 18 ms from a value of 45 ms obtained in pure blocks of only one mapping.
Table 22.3 Mean reaction time (in milliseconds) for pure and mixed compatible and incompatible mappings of Vu and Proctor's (2001) study as a function of experiment, stimulus mode, and repetition type

                       Nonrepetition,    Nonrepetition,    Task repetition,   Task repetition,
                       same              different         different          same
Stimulus mode          Comp    Incomp    Comp    Incomp    Comp    Incomp     Comp    Incomp

Experiment 1: Keypress responses
Pure
  Physical locations   –       –         –       –         312     388        325     394
  Arrow directions     –       –         –       –         374     462        367     440
  Location words       –       –         –       –         465     504        445     476
Mixed
  Physical locations   647     606       665     664       572     578        474     520
  Arrow directions     733     664       773     776       633     646        498     549
  Location words       728     796       781     909       685     829        568     708

Experiment 2: Vocal responses
Pure
  Physical locations   –       –         –       –         434     464        436     455
  Arrow directions     –       –         –       –         486     535        483     518
  Location words       –       –         –       –         505     627        499     635
Mixed
  Physical locations   793     753       841     901       716     736        593     632
  Arrow directions     806     768       820     874       711     740        613     670
  Location words       719     754       738     811       690     773        619     721

Note: Comp = Compatible; Incomp = Incompatible.
As Shaffer found for physical locations, presenting the mapping stimulus 600 ms prior to the imperative stimulus restored the SRC effect. Repetition analyses showed that the reduction of the SRC effect at the 100-ms SOA was much stronger when the trial mapping changed than when it did not.

De Jong (1995) included a manipulation of the relative proportion of compatible and incompatible trials. At the 10-ms SOA, the SRC effect was numerically larger when 67% of the trials were compatible (29 ms) than when 50% of the trials were compatible (23 ms). When only 33% of the trials were compatible, the SRC effect was reduced to 6 ms. For none of these conditions was the effect at the 10-ms SOA as large as the 41 ms effect obtained for all frequency conditions at the 800-ms SOA. Thus, mixing reduced the SRC effect, and the bias induced by the relative frequency manipulation acted to increase or decrease the base effect. This is similar to the results obtained when the relative frequency of corresponding and noncorresponding trials is manipulated for the Simon task (Hommel 1994; Toth et al. 1995). De Jong interpreted this pattern of results in terms of the alternative-routes model, proposing that the degree of suppression of the automatic route is an increasing function of the percentage of incompatible trials in the sequence.
Vu and Proctor's (2001) Experiment 2 was similar to their Experiment 1, but with vocal 'left'–'right' responses instead of keypresses. The SRC effect for location words was reduced from 131 ms in the pure condition to 75 ms in the mixed condition, instead of being enhanced. Physical locations and arrows showed similar trends: for pure conditions, the SRC effect was 24 ms for physical locations and 42 ms for arrows, and for mixed conditions, the effect was 18 and 26 ms, respectively. Repetition analyses for the pure condition showed that, for all stimulus types (physical locations, arrows, and words), there was little difference between repeating the same or a different S–R pair for both mappings (see Table 22.3). For the mixed condition, for both task repetitions and nonrepetitions, all stimulus types showed a benefit for the compatible mapping when the stimulus position changed. With physical locations and arrows, the SRC effect was larger when the task changed than when it repeated; also, when the stimulus position repeated, the compatible mapping benefited when the task repeated but showed a cost when the task changed. Words showed a similar pattern, but superimposed on a large overall benefit for the compatible mapping.

The reduction of SRC effects for mixed mappings of words to vocal responses is not restricted to location words. Van Duren and Sanders (1988) conducted a study similar to our word condition, but using vocal numeral responses to digits: subjects responded to the digits 2 and 3 by naming them and to the digits 4 and 5 by naming the opposite member of the pair. When presented in pure blocks of compatible or incompatible mappings, a 90 ms SRC effect was obtained. Mixing the two mappings slowed only the responses for the compatible mapping, reducing the SRC effect to 25 ms. Morin and Forrin (1962) conducted a similar study in which digit names were spoken in response to digit stimuli or shapes. The correct response to a digit stimulus was always its name, making this mapping compatible. Because digit names were arbitrarily assigned to the shapes, this task can be considered an 'incompatible' mapping. The difference between the incompatible conditions used by Morin and Forrin and by Van Duren and Sanders (1988) is that the shape stimuli are related to the digit names only by task-defined associations, whereas the digit stimuli have long-term associations to their corresponding responses. Mixing the tasks had virtually no effect on the incompatible shape-naming task but slowed the compatible digit-naming task considerably.

Forrin and Morin (1967) proposed a model to explain their mixing effects that combines the two-step model and the alternative-routes model (see Ehrenstein and Proctor 1998 for a detailed discussion of the model). To test the model, they conducted an experiment in which they varied the number of numerals and shapes. Their primary prediction from the model was that the set size for the alternative trial type would not have an influence on RT for a particular trial, because after the appropriate category was selected, the response-selection route appropriate to the task would be used. However, there was a significant 10-ms effect of set size for the shapes on numeral-naming RT, which Forrin and Morin interpreted as counter to the model. Forrin and Morin also included blocks of trials in which the stimulus subset was precued 1 s prior to the imperative stimulus.
Consistent with the studies of SRC, the precue reduced the effect of mixing by 14 ms, with this effect being independent of set size for digit stimuli but not for the shape stimuli.

Forrin (1975) reported an experiment in which the stimuli were letters and digits, all of which were to be named. For pure lists, RT was 17 ms shorter for digit-naming than for letter-naming. However, for mixed presentation, RT to digits increased 29 ms, whereas that to letters increased only 6 ms, making the RTs for the two categories similar in magnitude. This finding indicates that the condition that yields the shortest RT in pure blocks can be slowed by mixing, even when all responses are highly compatible and consistent with the long-term associations.
This seems to be problematic for the alternative-routes interpretation of the mixing effects, because there is no obvious reason why the direct route could not continue to be used under mixed conditions when the response triggered by each stimulus should be correct. Repetition analyses showed a 7-ms benefit for category repetition and no additional benefit for repetition of the specific stimulus. However, Marcel and Forrin (1974) obtained both category and item repetition benefits in the digit–letter naming task at response–stimulus intervals of 300 and 1600 ms, but only a category repetition effect at the 2900 ms interval. More importantly, they showed that this category repetition effect was eliminated by precuing the appropriate category 1000 ms prior to the imperative stimulus.

In summary, when the location information was conveyed by arrow direction, results similar to those for physical locations were obtained: mixing increased RT more for compatible than for incompatible mappings, and this effect was reduced by precuing the mapping. In addition, the SRC effect was reduced less when the compatible mapping was repeated than when the preceding trial was incompatible. For location words, mixing reduced the SRC effect when the responses were vocal, but enhanced it considerably when the responses were keypresses. The studies that used digit/letter or digit/shape stimuli and vocal responses showed mixing, repetition, and precuing effects similar to those obtained in the studies of spatial SRC. These studies provide the additional information that mixed presentation reduces the benefit of a compatible mapping even when the mixed stimuli have no long-term associations to responses from the same category and when they are also compatible but produce slower responses under blocked presentation.
22.4.2 Mapping effects for location-relevant trials mixed with location-irrelevant trials

We conducted experiments with mixed LI and LR trials in which the location information for both trial types was arrow direction, location word, or physical location (Proctor and Vu 2001; Proctor et al. 2000). Proctor and Vu's Experiment 1 used conditions in which the relevant location mapping was either compatible or incompatible. The physical location stimuli showed the pattern of results described previously: a 77 ms SRC effect with pure presentation of LR trials and a nonsignificant −16 ms effect with mixed presentation. When the location information was conveyed by arrows, an SRC effect of 42 ms was obtained with mixed presentation, similar to the 32 ms effect with pure presentation. De Jong (1995) similarly obtained an SRC effect of 33 ms when the location information was conveyed by a left- or right-tilting, upward-pointing arrow and color was the relevant dimension on LI trials. Our study showed that for location words, the SRC effect was larger with mixed presentation (M = 172 ms) than with pure presentation (M = 21 ms), as was found when compatible and incompatible LR mappings were mixed. Thus, mixing location mappings with color mappings did not reduce the SRC effect for arrow directions or words, as it does for physical locations, and even sharply increased the effect for location words.

Task repetition analyses for LR trials, similar to those described previously for physical location stimuli, were performed for the arrows and words (see Table 22.4). The ordering of repetition types for these stimuli was similar to that for physical locations: responses were slowest for nonrepetition trials, intermediate for task repetition–different, and fastest for task repetition–same. For words, but not arrows, this effect was qualified by an interaction with compatibility. This interaction for words was mainly due to the difference between the repetition and nonrepetition trials being smaller for the compatible (MD = 121 ms) than for the incompatible mapping (MD = 265 ms). When the task was repeated, the pure blocks for arrow stimuli showed little advantage of repeating the identical S–R pair from the previous trial (MD = 10 ms), in contrast to the benefit shown in the mixed blocks (MD = 77 ms). Words showed a similar pattern, with the advantage for repeating the same S–R pair being less in the pure blocks (MD = 54 ms) than in the mixed blocks (MD = 126 ms).
Proctor and Vu's (2001) Experiment 2 was similar to their Experiment 1, except that vocal 'left'–'right' responses were used. For all stimulus modes, the SRC effect was larger with mixed presentation of LR and LI trials than with pure presentation of LR trials, being larger for words (182 ms) than for arrows (67 ms) or physical locations (65 ms). A repetition analysis was also conducted (see Table 22.4). All stimulus types showed the same pattern of results. Responses were slowest on nonrepetition trials, intermediate on task repetition–different, and fastest on task repetition–same. In addition, there was little repetition benefit for the specific item in pure conditions (MDs = 6 ms for physical locations, 4 ms for arrows, and 13 ms for words). Thus, all three stimulus types showed similar patterns of repetition effects, including a large task-repetition benefit.

Across Proctor and Vu's (2001) two experiments, the results indicate that whenever there is a verbal component to the task, either for the stimuli or for the responses, mixing magnifies the SRC effect. A possible explanation for why mixing increases the SRC effect when the task mode is verbal is as follows. For pure compatible mappings, the name of the stimulus is the correct response, and it is activated easily and executed rapidly. For pure incompatible mappings, the stimulus name is strongly associated with the assigned location response (e.g. 'right' is the highest associate of 'left'; Proctor et al. 2000). Thus, subjects can generate the assigned response relatively quickly by being prepared to respond with the highly associated alternative location. For the mixed conditions, response selection for both mappings is mediated, at least in part, by first naming the stimulus. With the compatible mapping, the response consistent with this name can be emitted quickly once the subject decides that location is relevant. With the incompatible mapping, the subject is not prepared to emit the assigned response but must engage in the additional time-consuming process of generating the correct response by applying an 'opposite' rule to the stimulus-location name.
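This verbal-mediation account can be sketched with illustrative stage durations. Only the ordering of stages is taken from the explanation above; all numbers are invented.

```python
# Verbal mediation under mixed presentation: name the stimulus first, then
# either emit the name (compatible) or transform it (incompatible).
NAME_MS, EMIT_MS, OPPOSITE_RULE_MS = 200, 100, 150  # hypothetical durations
OPPOSITE = {"left": "right", "right": "left"}

def vocal_response_mixed(stimulus_name, mapping):
    time = NAME_MS                      # the stimulus is named first
    if mapping == "compatible":
        response = stimulus_name        # the name itself is the response
    else:
        response = OPPOSITE[stimulus_name]
        time += OPPOSITE_RULE_MS        # extra 'opposite' transformation
    return response, time + EMIT_MS

print(vocal_response_mixed("left", "compatible"))    # ('left', 300)
print(vocal_response_mixed("left", "incompatible"))  # ('right', 450)
# The extra transformation step applies only to the incompatible mapping,
# so mixing enlarges the SRC effect whenever the task has a verbal component.
```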
22.4.3 Simon effects for location-irrelevant trials mixed with location-relevant trials

With keypress responses, the arrow stimuli showed a response pattern for LI trials similar to that obtained with physical locations. The Simon effect was 75 ms when the LR mapping was compatible and −37 ms when it was incompatible.
Table 22.4 Mean reaction time (in milliseconds) for location-relevant trials for mixed conditions of Proctor and Vu's (2001) study as a function of experiment, stimulus mode, and repetition type

Stimulus mode          Nonrepetition   Task-repetition alternative   Complete repetition

Experiment 1: Keypress responses
  Physical locations   738             633                           538
  Arrow directions     714             585                           508
  Location words       935             806                           680

Experiment 2: Vocal responses
  Physical locations   770             686                           602
  Arrow directions     688             609                           553
  Location words       726             688                           628
The major difference is that the reverse Simon effect obtained with arrows was not as large as the positive Simon effect, whereas for physical locations the negative effect is at least as large as the positive effect. De Jong (1995, Exp. 3), using left- or right-tilting upward-pointing arrow stimuli for mixed LR and LI trials, obtained an even stronger asymmetry. The Simon effect was 100 ms when the LR mapping was compatible and was reduced to a positive effect of about 30 ms when the LR mapping was incompatible.

Location words showed positive Simon effects of 49 ms when the LR mapping was compatible and 29 ms when it was incompatible. In other experiments, the incompatible mapping condition for words has shown a small reverse Simon effect (Proctor et al. 2000, Experiments 1, 2, and 4). Proctor et al. pointed out that the Simon effect obtained with incompatibly mapped location words has a bimodal distribution, with subjects showing either a large negative or a large positive effect. The variability across experiments appears to reflect probabilistic sampling from the bimodal distribution. This suggests that under mixed presentations with incompatible LR mappings, some subjects use verbal mediation in which they name the word before applying an opposite rule. Consistent with this interpretation, Proctor et al. demonstrated that mean RT was slower for the subjects showing a positive Simon effect than for those showing a negative Simon effect.

Task repetition analyses of Proctor and Vu's (2001) study indicated that, for all stimulus types, responses were faster when the LI task was repeated than when the previous trial was LR (see Table 22.5). None of the stimulus types showed two-way interactions of correspondence and task repetition or a three-way interaction of those variables with LR mapping. This indicates that the overall pattern of Simon effects (positive Simon effects with a compatible LR mapping for all stimulus types, negative Simon effects with an incompatible mapping for physical locations and arrows, and a positive Simon effect with an incompatible mapping for location words) was evident both when the preceding trial was LR and when it was LI. Another analysis partitioned the LI repetition trials according to whether the preceding trial was corresponding or not (see Table 22.6). All three stimulus types showed similar results. For both compatible and incompatible mappings, the Simon effect was positive following a corresponding trial and negative following a noncorresponding trial.
Table 22.5 Mean reaction time (in milliseconds) for location-irrelevant trials for mixed conditions of Proctor and Vu's (2001) study as a function of experiment, stimulus mode, and repetition type

Stimulus mode          Task repetition   Task nonrepetition

Experiment 1: Keypress responses
  Physical locations   665               774
  Arrow directions     580               707
  Location words       687               929

Experiment 2: Vocal responses
  Physical locations   643               766
  Arrow directions     614               715
  Location words       675               817
Table 22.6 Mean Simon effects for reaction time (in milliseconds) for location-irrelevant trials for mixed conditions of Proctor and Vu's (2001) study as a function of experiment, stimulus mode, compatibility, and correspondence on previous trial

                            Correspondence on previous trial
Stimulus mode               Corresponding    Non-corresponding

Experiment 1: Keypress responses
Compatible mapping
  Physical locations        124              −65
  Arrow directions          58               −31
  Location words            78               −21
Incompatible mapping
  Physical locations        32               −133
  Arrow directions          10               −102
  Location words            66               −8

Experiment 2: Vocal responses
Compatible mapping
  Physical locations        108              −13
  Arrow directions          103              −2
  Location words            118              20
Incompatible mapping
  Physical locations        24               −65
  Arrow directions          60               −60
  Location words            74               −32
With vocal responses, all stimulus modes showed similar results for the Simon effect. When the LR mapping was compatible, positive Simon effects of similar magnitude were obtained for physical locations (56 ms), arrow directions (61 ms), and location words (61 ms). In contrast, when the LR mapping was incompatible, the Simon effect showed a small reversal or was not significant. For physical locations and arrows, the reverse effect (−22 and −17 ms, respectively) was smaller than the positive one obtained with the compatible LR mapping. For location words, the mean Simon effect was positive (8 ms) even when the LR mapping was incompatible, but it was not significant. These results suggest that when the response mode is verbal, a significant portion of subjects name the stimulus before selecting the response. Thus, when either the stimuli or the responses are verbal in nature, there is a tendency for a stimulus to directly activate its corresponding name.

A repetition analysis of the vocal-RT data for the LI trials showed that, for all stimulus types, responses were faster when the task was repeated than when it was not (see Table 22.5). For physical locations and arrow directions, there was a four-way interaction between repetition, stimulus position, response position, and compatibility. Both the positive Simon effect for compatible mappings and the negative Simon effect for incompatible mappings were smaller when the trial was a repetition than when it was a nonrepetition (compatible mappings: 45 vs. 94 ms for physical locations and 56 vs. 78 ms for arrows; incompatible mappings: −15 vs. −36 ms for physical locations and −2 vs. −30 ms for arrows).
The repetition trials were partitioned, as for the keypress responses. All three stimulus types showed similar results (see Table 22.6). For compatible mappings, the Simon effect was positive when following a corresponding trial and negative, or reduced considerably, when following a noncorresponding trial. For incompatible mappings, the Simon effect was positive when following a corresponding trial and negative when following a noncorresponding trial. For each stimulus type, the effect of the preceding trial being corresponding or noncorresponding was of similar magnitude for the compatible and incompatible LR mappings.

In summary, with keypress responses, when location information is conveyed by arrow direction, the Simon effect is enhanced if the LR mapping is compatible and reversed if it is incompatible, as for physical locations. However, the reversed effect is not as large as the positive effect. The influence of an incompatible LR mapping is even less when the location information is conveyed by word, with there being little or no Simon effect. With vocal responses, the effects of mixing are similar for all three stimulus types: a large Simon effect is obtained when the LR mapping is compatible, and the effect is eliminated or reversed slightly when the mapping is incompatible. As with physical locations, for trials on which the LI task is a repetition, the Simon effect is positive if the previous trial was corresponding and typically negative if the previous trial was noncorresponding. Thus, the occurrence of the Simon effect only on trials that follow a corresponding trial is a general phenomenon and not one that is restricted to physical locations.
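For readers who wish to apply the same sequential analysis to their own data, the computation reduces to conditioning the Simon effect on the previous trial's correspondence. The sketch below assumes a simple list of trial records; the field names are our own convention.

```python
# Simon effect conditioned on the correspondence of the preceding trial.
# Each trial is a dict with keys 'rt' (ms) and 'corresponding' (bool).
from statistics import mean

def conditional_simon_effects(trials):
    buckets = {}  # (previous corresponding?, current corresponding?) -> RTs
    for prev, cur in zip(trials, trials[1:]):
        key = (prev["corresponding"], cur["corresponding"])
        buckets.setdefault(key, []).append(cur["rt"])
    effects = {}
    for prev_corresponding in (True, False):
        noncorr = buckets.get((prev_corresponding, False), [])
        corr = buckets.get((prev_corresponding, True), [])
        if noncorr and corr:
            # Simon effect = noncorresponding RT minus corresponding RT.
            effects[prev_corresponding] = mean(noncorr) - mean(corr)
    return effects

# Expected pattern from the studies reviewed above: a positive value after
# corresponding trials and a negative value after noncorresponding trials.
```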
22.5 Mixing modes of location information

The reverse Simon effect for LI trials when the LR mapping is incompatible indicates that stimuli produce activation via the task-defined location–response associations even on the LI trials. These associations could be independent of, or dependent on, stimulus mode. In terms of connectionist models, the associations could involve links from concept nodes to output nodes (e.g. Zhang and Kornblum 1998) or links from mode-specific nodes to output nodes (e.g. Cohen, Dunbar, and McClelland 1990). We conducted experiments to evaluate these alternatives by using different modes for the location information on LR and LI trials (Proctor et al. 2000).

In Proctor et al.'s (2000) study, all combinations of physical locations, arrows, and words were evaluated in which the location information was conveyed in different stimulus modes on the LR and LI trials (see Table 22.7). The LR mapping was incompatible for all conditions, and Simon effects of −5 to 33 ms were obtained. Clearly, none of the stimulus types showed a strong reversal of the Simon effect similar to that obtained when the LR mapping is incompatible and the two trial types are presented in the same mode. The possibility exists that the form distinction between the two stimulus modes is the factor that eliminated the reversal of the Simon effect. To explore this possibility, Proctor et al.'s Experiment 4 used form distinctions of stimuli within the same mode. That is, for physical positions, the stimuli were circles for one trial type and squares for the other (large and small arrows were used for arrow stimuli, and upper-case and lower-case words were used for location words). Reverse Simon effects of −65 ms, −44 ms, and −17 ms were obtained for physical locations, arrow directions, and location words, respectively. These effects are of similar magnitude to those obtained when the two trial types were conveyed by stimuli of the same shape and size. Thus, a form distinction is not sufficient to eliminate the reverse Simon effect.

Proctor and Vu (2001, Exp. 3) showed that mode differences reduce the effect of mixing on the SRC effect for LR trials. Four groups were tested with compatible or incompatible LR mappings, with physical position relevant and location word irrelevant, or vice versa.
463
aapc22.fm Page 464 Wednesday, December 5, 2001 10:09 AM
464
Common mechanisms in perception and action
Table 22.7 Mean Simon effect for reaction time (in milliseconds) for location-irrelevant trials when location-relevant trials are incompatibly mapped in Proctor, Marble, and Vu’s (2000) study as a function of stimulus mode for each trial type Location–relevant mode
Location–irrelevant mode
Spatial compatibility effect
Physical locations Location words Location words Arrow directions Arrow directions Physical locations
Location words Physical locations Arrow directions Location words Physical locations Arrow directions
32 4 33 13 3 −5
locations was 100 ms, in contrast to the non-signi1cant −16 ms effect when LI stimuli were presented in the same mode. The SRC effect for location words was 110 ms, which is smaller than the 172 ms effect obtained when location-irrelevant stimuli were presented in the same mode, but considerably larger than the 21 ms effect obtained in pure blocks of compatibly or incompatibly mapped location words. Thus, whereas presenting the LI information in words eliminates the effect of mixing on the SRC effect for physical locations, presenting the LI information in physical locations only reduces the effect of mixing for words.
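The contrast between the two connectionist alternatives raised at the start of this section can be illustrated with a toy sketch; the node names and weights are invented for illustration and do not implement either published model. The point is only that amodal concept-to-output links let a mapping learned in one mode drive responses in another, whereas mode-specific links confine it to its own mode, which is the pattern the data favor.

    def response_activation(mode, location, associations, amodal=False):
        """Activation sent to the output nodes by a location stimulus.

        With amodal links the lookup key is the location concept alone; with
        mode-specific links the key includes the stimulus mode, so a mapping
        learned in one mode contributes nothing to stimuli in another mode.
        """
        key = location if amodal else (mode, location)
        return associations.get(key, {'left': 0.0, 'right': 0.0})

    # Toy task-defined associations for an incompatible LR mapping learned
    # with physical-location stimuli (weights illustrative only).
    incompatible_physical = {
        ('physical', 'left'):  {'left': 0.0, 'right': 1.0},
        ('physical', 'right'): {'left': 1.0, 'right': 0.0},
    }

    # An arrow on an LI trial receives no activation from the mode-specific
    # mapping, consistent with the absence of a reverse Simon effect when LR
    # and LI trials use different modes.
    print(response_activation('arrow', 'left', incompatible_physical))

    # With amodal links, the same mapping would transfer to any mode and a
    # reverse Simon effect would be expected, contrary to the results.
    incompatible_amodal = {'left':  {'left': 0.0, 'right': 1.0},
                           'right': {'left': 1.0, 'right': 0.0}}
    print(response_activation('arrow', 'left', incompatible_amodal, amodal=True))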
22.6 General discussion The studies investigating mixed presentation of compatible and incompatible LR mappings and of mixed LR and LI trials show that task set is a major determinant of the SRC effects that are obtained. The SRC effect for the LR trials varies systematically as a function of the nature of the intermixed tasks, as does the Simon effect for LI trials.
22.6.1 Summary of major findings

22.6.1.1 Reduction, elimination, and enhancement of the SRC effect
Shaffer (1965) showed that, for two-choice SRC tasks in which stimuli were left–right locations and responses left–right keypresses, mixing trials with compatible and incompatible mappings eliminated the SRC effect. We replicated Shaffer's results using stimulus color, rather than a separate mapping signal, to designate the mapping (Vu and Proctor 2001). We also included conditions in which left–right keypresses were made to left–right arrow directions or location words. For arrows, mixing compatible and incompatible mappings eliminated the SRC effect, as for physical locations. However, for words, mixing enhanced the SRC effect. In contrast, when the responses were the spoken words 'left' and 'right', mixing tended to reduce, but not eliminate, the SRC effect for all three stimulus modes. Thus, mixing compatible and incompatible mappings in two-choice tasks does not always eliminate, or even reduce, the SRC effect.

Our experiments in which compatibly or incompatibly mapped LR trials were mixed with LI trials showed that the SRC effect was eliminated for physical locations mapped to keypresses (Marble and Proctor 2000; Proctor and Vu 2001; Proctor et al. 2000; Proctor et al. in press). Proctor et al. (in press) showed that the elimination of the SRC effect with physical locations and keypresses occurs primarily in situations in which the LR and LI trials share left–right spatial codes. When the LI stimuli are presented in a centered location, a normal SRC effect is obtained. However, when the LI and LR stimuli are presented in two rows, above or below fixation, no SRC effect is evident. In both cases, the LI stimuli appear in locations that are distinct from those in which the LR stimuli occur. The major difference is that left and right codes are present for both trial types in the latter experiment, but not in the former. Finally, an SRC effect is obtained when the LI stimuli vary along the vertical dimension and the LR stimuli along the horizontal dimension. Thus, the elimination of the SRC effect under mixed presentation is apparently a consequence of uncertainty about whether the compatible response can be made to the location code that is formed upon stimulus presentation.

With keypress responses, mixing LI and LR trials had different effects on SRC for arrow directions and location words than for physical locations (Proctor and Vu 2001). For arrows, the SRC effect was of similar magnitude to that obtained with pure presentation of LR trials, whereas for words, mixing increased the magnitude of the SRC effect. With vocal 'left'–'right' responses, mixing LR and LI trials increased the SRC effect for all stimulus types. Thus, mixing LR and LI trials eliminated the SRC effect only for physical locations mapped to keypress responses. When either stimuli or responses were of a verbal nature, mixing the trial types increased the SRC effect. This outcome implies that response selection in these cases is mediated to a considerable extent by activation of the stimulus name.
22.6.1.2 Enhancement and reversal of the Simon effect
With physical location stimuli and keypresses, the reverse Simon effect obtained when the LR mapping was incompatible was at least as large as the enhanced positive Simon effect obtained when the mapping was compatible (Marble and Proctor 2000; Proctor and Vu 2001; Proctor et al. 2000; Proctor et al. in press). When the location information was conveyed by arrow direction or location words and the responses were keypresses, mixing had the following effects. For arrows, mixed compatible LR trials enhanced the Simon effect and mixed incompatible LR trials reversed it. However, unlike physical location stimuli, the reversed effect was not as large as the positive Simon effect. For location words, a positive Simon effect was obtained when the LR mapping was compatible, but only a small negative or positive Simon effect when the LR mapping was incompatible. With vocal responses, physical location and arrow stimuli showed a small reverse Simon effect when LI trials were mixed with incompatibly mapped LR trials. For location words, a small positive Simon effect was obtained when the LR mapping was incompatible. When the LR mapping was compatible, all stimulus types showed a large positive Simon effect. Thus, the task-defined associations of location information to responses influence performance for symbolic and verbal modes, as well as for the physical mode.

22.6.1.3 Mixed location modes
Presenting LR and LI trials in distinct modes reduces the influence of mixing on LR trials. When physical locations convey the LR information and location words the LI information, the SRC effect is reinstated, and possibly even enhanced (Proctor and Vu 2001). When words convey the LR information and physical locations convey the LI information, the SRC effect is obtained: it is reduced in magnitude compared with mixed presentation of location words for both trial types, but the effect is larger than that obtained with blocked presentation of LR trials. Presenting location information in distinct modes on LR and LI trials also reduces the impact of the LR mapping on the Simon effect (Proctor et al. 2000). With the incompatible mapping, the reverse Simon effect is eliminated regardless of which stimulus mode is used for LR trials and which for LI trials, with a positive Simon effect being reinstated fully for some mode combinations. Thus, the results for both the Simon effect and the SRC effect imply that the task-defined associations of location information to responses are relatively mode specific.
22.6.1.4 Precuing and repetition effects
When compatible and incompatible mappings of physical locations or arrows are mixed, precuing the mapping reinstates the SRC effect (De Jong 1995; Shaffer 1965): subjects can prepare for the cued mapping in a way that allows the normal benefit for the compatible mapping to occur. When incompatibly mapped LR trials are mixed with LI trials, precuing the trial type improves performance but does not eliminate the reverse Simon effect for the LI trials. Because only one location mapping (incompatible) is in effect for the block of trials, these S–R associations continue to be applied on the LI trials. The reverse Simon effect is also obtained when subjects practice with an incompatible spatial mapping prior to performing a Simon task (Proctor and Lu 1999). That the reverse Simon effect is obtained when location is known to be irrelevant indicates that the effect is not simply a consequence of uncertainty about which stimulus dimension is relevant.

Task repetition typically produces a substantial RT benefit compared to a task switch. When compatible and incompatible LR mappings are mixed, the compatible mapping benefits more from repetition than does the incompatible mapping (Vu and Proctor 2001). When LR and LI trials are mixed, benefits occur for repeating the same task and for repeating the same S–R pair. The task repetition benefit on LR trials is of similar magnitude for compatible and incompatible mappings (e.g. Proctor et al. 2001). Thus, although compatible trials receive an extra benefit from repetition when mixed with incompatible trials, they do not receive an extra benefit relative to the incompatible trials when either type is mixed with LI trials in separate blocks. Another way of describing this relation for the mixed LR and LI conditions is that the SRC effect is present whether an LR trial follows an LR or an LI trial. The positive and reverse Simon effects for compatible and incompatible LR mappings, respectively, also occurred regardless of whether the LI task followed an LR or LI trial. However, for task repetition trials, the Simon effect was dependent on the correspondence relation of the preceding trial. Regardless of the condition (i.e. pure Simon, mixed-compatible, or mixed-incompatible), the Simon effect was always positive when following a corresponding trial and negative following a noncorresponding trial, although the relative magnitudes of the positive and negative effects did differ across conditions. Thus, the repetition effects of the type reported by Mordkoff (1998), Leuthold et al. (1999), and Valle-Inclán et al. (2002) for pure Simon trials occur equally when LI and LR trials are mixed. However, the method of examining correspondence effects with respect to whether the previous trial was corresponding or not can be problematic because different combinations of trial types are collapsed together (Hommel 2000). We conducted an analysis of Marble and Proctor's (2000) Experiment 1 that examined the same trials partitioned into conditions in which stimulus color (and hence response) and stimulus position were repeated or changed on consecutive trials. With pure LI trials, performance was better when both the stimulus color and position repeated or both changed than when only one of them changed. With mixed LI and LR trials, collapsed across location mapping, repetition of the LI task was beneficial only when both the stimulus color and position were repeated, and not when they both changed.
For the Simon effect, when the mixed LR mapping was compatible, the effect was larger when both stimulus color and position were changed or both repeated than when only one was repeated. When the mixed LR mapping was incompatible, this relation held for the reverse Simon effect: the effect was larger when both stimulus color and position were changed or both repeated than when only one was repeated. This pattern of repetition effects seems more consistent with an account in terms of integration of stimulus and response features from the previous trial, as Hommel (1998b) suggests, than with voluntary gating of the direct route.
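The partitioning scheme used in this reanalysis can be sketched as follows; the record fields are hypothetical, but the classification mirrors the one described above: consecutive trials are binned by whether stimulus color (and hence response) and stimulus position each repeated or changed.

    def partition_by_repetition(trials):
        """Bin RTs of consecutive-trial pairs by color and position transitions."""
        cells = {('color-rep', 'pos-rep'): [], ('color-rep', 'pos-alt'): [],
                 ('color-alt', 'pos-rep'): [], ('color-alt', 'pos-alt'): []}
        for prev, cur in zip(trials, trials[1:]):
            c = 'color-rep' if cur['color'] == prev['color'] else 'color-alt'
            p = 'pos-rep' if cur['position'] == prev['position'] else 'pos-alt'
            cells[(c, p)].append(cur['rt'])
        return cells

Complete repetitions and complete alternations are the ('color-rep', 'pos-rep') and ('color-alt', 'pos-alt') cells; the feature-integration account predicts that the two remaining, partial-repetition cells are the slow ones.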
22.6.2 Evaluation of the pure translation, two-step, and alternative routes accounts
When two tasks or trial types, for which the same set of responses is mapped to stimuli in different manners, are mixed, the subject must maintain multiple intentions, or task sets. To respond with high accuracy, a determination must be made, when the stimulus is presented, as to which task set is appropriate, and this task set must ultimately control response selection. We evaluate the three accounts, described in the Introduction, that have been proposed to explain the effects on performance of mixing different mappings of stimuli and responses.
22.6.2.1 Pure translation (translation efficiency)
Shaffer (1965) proposed that for the conditions of his study in which all S–R pairs were known in advance to be compatible, 'the null transformation could be considered a special class and was easier to compute' (p. 287). In current terms, this amounts to saying that the benefit for the pure compatible conditions arises from use of an identity rule (i.e. respond with the corresponding response) that cannot be used when some mappings are incompatible. This identity rule is more efficient than searching for the assigned response or applying an opposite rule (i.e. respond with the noncorresponding response).

Shaffer's pure translation account has not received much attention, although it seems to account for several results. One finding that creates difficulty for the account is that when incompatible and compatible mappings are mixed, errors tend to be the response that would be correct if the alternative mapping were appropriate for the stimulus that was presented (Duncan 1977a,b, 1978; Ehrenstein and Proctor 1998; Stoffels 1996b). This result implies that response selection occurs at least in part through application of mapping rules even when compatible and incompatible trials are mixed. The pure translation account also does not provide an adequate explanation for the greater repetition benefit shown for the compatible mapping than for the incompatible mapping under mixed conditions. One would have to propose that the identity rule was applied after a compatible trial but not if the preceding trial was incompatible, which does not seem plausible. With mixed presentation of LR and LI trials, a pure translation model cannot account for the finding that repetition effects are equal for compatible and incompatible mappings.

22.6.2.2 Two-step response selection
Duncan (1977a,b, 1978) proposed that with mixed lists of compatible and incompatible mappings, response selection proceeds in two steps: a determination is made as to which of the mappings is applicable for the trial, and then that mapping rule is applied. A distinguishing feature of this model is that only one rule is applied on any given trial, with the rule being the appropriate one in most cases and the inappropriate one when an error is made in the first step. The strongest support for this model is that errors are typically the correct response for the inappropriate rule for that trial (Duncan 1977a,b, 1978; Ehrenstein and Proctor 1998; Stoffels 1996b).

However, the two-step model alone cannot readily account for several findings. The most straightforward prediction of the model is that the effect of mixing should be additive with that of mapping, because the major difference between mixed and pure blocks is the additional mapping-selection step required for all trials. The model can accommodate the more customary finding of larger mixing costs for the compatible than the incompatible mapping by assuming that the incompatible trials have a benefit, due to a reduction in the number of S–R alternatives, that the compatible trials do not. However, even with this assumption, the model cannot explain the fact that a precue is more beneficial for the compatible than the incompatible mapping (Ehrenstein and Proctor 1998; Shaffer 1965; Stoffels 1996b), because the additional factor that produces the benefit for the incompatible mapping should still be contributing to performance when the mappings are precued.

With respect to mixing LR and LI trials, the two-step model loses more ground. It does not directly predict the elimination of the SRC effect when location is irrelevant on half of the trials. Moreover, the model does not differentiate between the experiments for which mixing eliminates the SRC effect (presenting the LR and LI stimuli in the same locations; presenting both trial types in left and right locations on different rows) and those for which the SRC effect is evident (presenting the LI stimuli in the center of the screen; presenting the LI stimuli along the vertical dimension and LR stimuli along the horizontal dimension). Also, because mapping selection is presumed to precede response selection within the mapping, the extent of the crosstalk effects of the LR mapping on the LI trials seems to greatly exceed the magnitude that would be expected. At a minimum, the two-step model must be combined with additional properties to explain a reasonable range of findings. For example, Ehrenstein and Proctor (1998) suggested that, when combined with the alternative routes account, the two-step model can account for most data from tasks in which compatible and incompatible mappings are mixed. Another possibility, suggested by the considerable cross-activation for the LR and LI trial types, is that activation proceeds initially with regard to the associations defined for both tasks, with the decision about which associations are appropriate for the current trial occurring relatively late in processing (e.g. Hommel 1998a).
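The core of the two-step account can be stated in a few lines of code. This is a sketch of the idea, not of any published simulation; the error probability and rule names are illustrative. Its distinguishing property is that the mapping-selection step occasionally fails, and when it does, the wrong rule is applied in full, so errors are exactly the response the alternative mapping would have produced.

    import random

    def two_step_selection(stimulus_side, cued_mapping, p_step1_error=0.05):
        """Step 1: choose a mapping (sometimes wrongly); step 2: apply its rule."""
        opposite = {'left': 'right', 'right': 'left'}
        rules = {'compatible': lambda s: s,
                 'incompatible': lambda s: opposite[s]}
        chosen = cued_mapping
        if random.random() < p_step1_error:  # mapping-selection error in step 1
            chosen = 'incompatible' if cued_mapping == 'compatible' else 'compatible'
        return rules[chosen](stimulus_side)  # step 2: apply the selected rule

Note that only one rule ever runs on a trial; the additivity prediction discussed above follows directly from this property.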
22.6.2.3 Alternative routes model
Van Duren and Sanders (1988) proposed that compatible mappings benefit from long-term S–R associations, and thus from a direct response-selection route, when it is known in advance that all trials will be compatible. However, when all trials are incompatibly mapped or the two trial types are mixed, the direct route cannot be used because the S–R mapping is based on task-defined associations that require a search of memory. Another way to describe the alternative routes model is that the direct route is used only if all mappings are compatible, because it then leads to the correct response on all trials, and it is suppressed when all mappings are incompatible or when mappings are mixed, because it would lead to the wrong response on many trials.

The alternative routes model is the most popular explanation of mixing effects. Unlike the two-step model, it can account for the precuing benefits obtained with mixed compatible and incompatible mappings. In this case, the compatible mapping benefits more than the incompatible mapping because the precue allows the direct route to be used. If it is assumed that the direct route is suppressed following an incompatible trial but not following a compatible trial (e.g. Stoffels 1996a,b), then the fact that mixing primarily reduces the SRC effect for trials on which the mapping is not repeated can be accommodated as well.

With regard to mixing LR and LI trials, the alternative routes model can explain the pattern of SRC effects obtained: the direct route is suppressed in situations in which the location codes cannot serve as unambiguous indicators of a compatible response, but not in situations in which they can. The alternative routes model does not make specific predictions regarding the Simon effects obtained on LI trials because the direct route should not be a contributing factor. However, the results are generally consistent with the model if it is assumed that the task-defined associations of locations in the indirect translation route produce activation on LI trials.

The alternative routes model predicts that any influence of mixing on the SRC effect should be to reduce its magnitude. Therefore, it cannot account for the fact that, for location words mapped to keypress responses, the SRC effect increased with mixing of compatible and incompatible LR trials and with mixing of LR and LI trials. It also cannot account for the increase in the SRC effect that occurred for mixed LI and LR trials with vocal responses, for all stimulus modes. However, these results can be accommodated by the model if it is modified to allow stimuli to activate their corresponding names, regardless of mapping, on a significant portion of trials in conditions for which either the stimulus or the response is verbal.

A more serious problem for the alternative routes model arises from the results of studies that used mixed mappings for digit naming: mixing reduces the advantage for the compatible mapping under conditions where the direct route should provide an appropriate basis for response selection. Van Duren and Sanders's (1988) finding that the advantage for compatibly mapped digit stimuli and naming responses is eliminated when some digits are incompatibly mapped to digit names is consistent with the model, as they concluded. However, when shapes mapped to digit names are mixed with compatibly mapped digit stimuli, as in Morin and Forrin's (1962; Forrin and Morin 1967) studies, the correct response to a digit is always its corresponding name; thus, the argument would have to be made that the direct route is suppressed in order to prevent the shapes from being named. Even more problematic, the advantage for digit naming is eliminated when the compatibly mapped digits are mixed with compatibly mapped letters (Forrin 1975), for which the direct route should yield the correct responses for all stimuli. Although the mixing effects in Forrin's study were small, they suggest that mixing stimulus categories is sufficient to reduce the benefit for the easier task even when both tasks require compatible responses.
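The alternative routes account itself reduces to a simple scheme: a fast direct route that emits the spatially corresponding response, enabled only when correspondence is known to be safe, in parallel with a slower translation route that applies the task-defined mapping. The sketch below uses arbitrary illustrative timings and is not a model from the literature.

    def alternative_routes(stimulus_side, mapping, direct_route_enabled):
        """Return (response, notional RT in ms) under the dual-route scheme."""
        opposite = {'left': 'right', 'right': 'left'}
        translate = {'compatible': lambda s: s,
                     'incompatible': lambda s: opposite[s]}
        if direct_route_enabled:                        # fast, correspondence-based
            return stimulus_side, 350
        return translate[mapping](stimulus_side), 450   # slower, rule-based

    # Pure compatible block: the direct route is safe and therefore enabled.
    print(alternative_routes('left', 'compatible', direct_route_enabled=True))
    # Mixed block: the direct route would err on incompatible trials, so it is
    # suppressed and even compatible trials go through translation.
    print(alternative_routes('left', 'compatible', direct_route_enabled=False))

The design choice that the problematic digit-naming results expose is visible here: suppression is all-or-none at the block level, so the model cannot lose the compatible-mapping advantage in a block where the direct route would have been correct on every trial.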
22.6.3 Alternative theoretical accounts
Overall, the alternative routes model, modified to accommodate the results obtained with verbal stimuli or responses, seems to explain more findings than the two-step and pure translation models. However, all of the models, including the alternative routes model, have certain findings that they cannot explain easily. Los (1996) classifies these models as structural models, for which the mixing costs are primarily strategic in nature. An alternative view is that mixing costs are stimulus driven, being primarily a function of the greater intertrial variability, which in turn necessitates more trial-to-trial adjustments. Los suggests that the effects of precuing and repetition are particularly diagnostic for deciding which class of models, strategic or stimulus driven, can best account for mixing effects. Specifically, he argues that if precuing eliminates mixing costs and sequential effects are absent, a strategic explanation is implicated; whereas if sequential effects are found and precuing is ineffective, a stimulus-driven account is favored.

Several studies have shown that precuing a compatible or incompatible mapping largely eliminates the cost of mixing the two mappings (De Jong 1995; Ehrenstein and Proctor 1998; Forrin and Morin 1967; Shaffer 1965, 1966; Stoffels 1996a,b), consistent with a strategic account. However, precuing a trial as LI does little to reduce the effect of an incompatible LR mapping (Marble and Proctor 2000). Also, the mapping continues to be applied after the task has been changed to one for which location is no longer relevant (Proctor and Lu 1999; Tagliabue et al. 2000). Thus, there clearly is not complete strategic control of response selection.

Perhaps the most striking and consistent aspect of the mixing literature is that large, systematic repetition effects are obtained in all task variations. When repetition analyses have been performed, sequential effects were apparent that, in the case of mixed compatible and incompatible mappings, mimic the precuing results relatively closely. The possibility that subjects engage in trial-to-trial strategic alterations of task set cannot be ruled out, but, as Los (1996) stated, 'Sequential effects should be considered as at least slightly embarrassing for a strategic account of mixing costs' (p. 172). More specifically, Hommel (2000) showed that the pattern of repetition effects that Mordkoff (1998) and Leuthold et al. (1999) interpreted as indicating voluntary gating of the direct route is more consistent with a stimulus-driven account when repetition analyses do not collapse across the different combinations of stimulus color and position. Our analyses described earlier in the 'General discussion' support this point.

Research on task switching, in which two tasks are presented in a fixed order (e.g. on alternate trials), has obtained results suggesting that a large component of changing task set is not under voluntary control. Unlike in the mixing studies described in this paper, because the sequence is orderly, subjects are aware of what the specific task will be on each trial. Even though the task sequence is known, the typical finding is that performance is worse when the task alternates between trials than when only a single task is presented for all trials (Jersild 1927). This switching cost is evident even when the interval between trials is sufficient to allow preparation for the forthcoming task (Allport, Styles, and Hsieh 1994; Rogers and Monsell 1995), suggesting that there is a component that is not under the subject's control. Allport et al. conducted an experiment using a Stroop color-naming task, which is a close relative of the Simon tasks used in our experiments, with either the color word or the physical color named on alternate trials. The easier word-reading task was slowed considerably, compared to a baseline condition, when it followed the color-naming task, whereas the more difficult color-naming task was unaffected. This asymmetric effect was evident at response–stimulus intervals up to 1100 ms. Thus, costs for the easier task occur with the switching method much as with randomized presentation of the two task types, even when there is more than one second to prepare for the appropriate task.

Allport et al. (1994) called the component of task switching that is not under the subject's control task-set inertia, and Rogers and Monsell (1995) called it exogenous task-set reconfiguration. Regardless of the exact nature of this component, the point is that a significant portion of the switching costs is not under strategic control. Rogers and Monsell (Exp. 6) used sequences of four task repetitions followed by a switch to the alternate task for four consecutive trials, and so on, and found that the switch costs were eliminated after the first trial of the new task. Their results imply that the distinction between task-repetition and switch trials should be of importance when trials with different mappings or relevance of the location dimension are randomly intermixed, as is in fact the case.
Thus, a substantial portion of the mixing effects is stimulus driven.
22.7 Conclusion
Donders (1868/1969) was the first to recognize that performance of two-choice reaction tasks with different S–R sets and mappings affords considerable insight into response selection in particular and the interaction between perception and action in general. Although Donders did not specifically examine the role of task set in his experiments, two-choice tasks can also provide much valuable information about this role. The experiments described in this paper demonstrate that SRC effects obtained with location information are quite susceptible to the influence of task set. When compatible and incompatible LR mappings are mixed, the benefit for the compatible mapping typically is reduced or eliminated, but it is enhanced when location words are mapped to keypresses. When LI and LR trials are mixed, the SRC effect for the LR trials is eliminated for physical locations mapped to keypresses, unaffected for arrows mapped to keypresses, and enhanced when the stimulus or response mode is verbal in nature. The mappings defined for the LR task are applied on trials of the LI task and, thus, largely determine the pattern of Simon effects. For physical locations assigned to keypress responses, the presence of codes along the same dimension on the intermixed trials (LI or incompatible spatial mapping) precludes rapid responding on the basis of these codes for trials on which the location mapping is compatible. When either the stimulus or the response is verbal, a tendency to activate the stimulus name exists, regardless of LR mapping, which increases the benefit for the compatible mapping under mixed conditions. The major accounts of the effects of mixing attribute them to controlled processing strategies. However, the systematic repetition effects obtained with mixed presentation of different types of trials imply that a substantial part of the mixing effects may be involuntary and stimulus driven. Regardless of the ultimate explanation, there is little doubt that SRC effects involve much more than learned associations between stimuli and responses and that task set plays a significant role.
References
Allport, A., Styles, E.A., and Hsieh, S. (1994). Shifting intentional set: Exploring the dynamic control of tasks. In C. Umiltà and M. Moscovitch (Eds.), Attention and performance XV, pp. 421–452. Cambridge, MA: MIT Press.
Baldo, J.V., Shimamura, A.P., and Prinzmetal, W. (1998). Mapping symbols to response modalities: Interference effects on Stroop-like tasks. Perception and Psychophysics, 60, 427–437.
Barber, P. and O'Leary, M. (1997). The relevance of salience: Towards an activation account of irrelevant stimulus–response compatibility effects. In B. Hommel and W. Prinz (Eds.), Theoretical issues in stimulus–response compatibility, pp. 135–172. Amsterdam: North-Holland.
Broadbent, D.E. and Gregory, M. (1962). Donders' B- and C-reactions and S–R compatibility. Journal of Experimental Psychology, 63, 575–578.
Cohen, J.D., Dunbar, K., and McClelland, J.L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332–361.
De Jong, R. (1995). Strategical determinants of compatibility effects with task uncertainty. Acta Psychologica, 88, 187–207.
Donders, F.C. (1868/1969). On the speed of mental processes. In W.G. Koster (Ed.), Acta Psychologica, 30, Attention and performance II, pp. 412–431. Amsterdam: North-Holland.
Duncan, J. (1977a). Response selection errors in spatial choice reaction tasks. Quarterly Journal of Experimental Psychology, 29, 415–423.
Duncan, J. (1977b). Response selection rules in spatial choice reaction tasks. In S. Dornic (Ed.), Attention and performance VI, pp. 49–61. Hillsdale, NJ: Erlbaum.
Duncan, J. (1978). Response selection in spatial choice reaction: Further evidence against associative models. Quarterly Journal of Experimental Psychology, 30, 429–440.
Ehrenstein, A. and Proctor, R.W. (1998). Selecting mapping rules and responses in mixed compatibility four-choice tasks. Psychological Research, 61, 231–248.
Eimer, M. (1995). S–R compatibility and automatic response activation: Evidence from psychophysiological studies. Journal of Experimental Psychology: Human Perception and Performance, 21, 837–854.
Fitts, P.M. (1964). Perceptual–motor skill learning. In A.W. Melton (Ed.), Categories of human learning, pp. 243–285. New York: Academic Press.
Fitts, P.M. and Deininger, R.L. (1954). S–R compatibility: Correspondence among paired elements within stimulus and response codes. Journal of Experimental Psychology, 48, 483–492.
Fitts, P.M. and Seeger, C.M. (1953). S–R compatibility: Spatial characteristics of stimulus and response codes. Journal of Experimental Psychology, 46, 199–210.
Forrin, B. (1975). Naming latencies to mixed sequences of letters and digits. In P.M.A. Rabbitt and S. Dornic (Eds.), Attention and performance V, pp. 345–356. New York: Academic Press.
Forrin, B. and Morin, R.E. (1967). Effects of context on reaction time to optimally coded signals. Acta Psychologica, 27, 188–196.
Greenwald, A.G. and Rosenberg, K.E. (1978). Sequential effects of distraction stimuli in a selective attention reaction time task. In J. Requin (Ed.), Attention and performance VII, pp. 487–504. Hillsdale, NJ: Erlbaum.
Hedge, A. and Marsh, N.W.A. (1975). The effect of irrelevant spatial correspondences on two-choice response time. Acta Psychologica, 39, 427–439.
Hommel, B. (1994). Spontaneous decay of response-code activation. Psychological Research, 56, 261–268.
Hommel, B. (1998a). Automatic stimulus–response translation in dual-task performance. Journal of Experimental Psychology: Human Perception and Performance, 24, 1368–1384.
Hommel, B. (1998b). Event files: Evidence for automatic integration of stimulus–response episodes. Visual Cognition, 5, 183–216.
Hommel, B. (2000). Stimulus and response feature integration masquerading as information gating and route suppression. Manuscript submitted for publication.
Hommel, B. and Prinz, W. (Eds.) (1997). Theoretical issues in stimulus–response compatibility. Amsterdam: North-Holland.
Jersild, A.T. (1927). Mental set and shift. Archives of Psychology, Whole No. 89.
Kornblum, S., Hasbroucq, T., and Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus–response compatibility: A model and taxonomy. Psychological Review, 97, 253–270.
Leuthold, H., Stürmer, B., Soetens, E., Schroter, H., and Sommer, W. (1999). Suppression of location-based priming in the Simon task: Behavioral and electrophysiological evidence. Manuscript submitted for publication.
Logan, G.D. (1980). Attention and automaticity in Stroop and priming tasks: Theory and data. Cognitive Psychology, 12, 523–553.
Logan, G.D. and Zbrodoff, N.J. (1979). When it helps to be misled: Facilitative effects of increasing the frequency of conflicting stimuli in a Stroop-like task. Memory and Cognition, 7, 166–174.
Los, S.A. (1996). On the origin of mixing costs: Exploring information processing in pure and mixed blocks of trials. Acta Psychologica, 94, 145–188.
Lu, C.-H. and Proctor, R.W. (1995). The influence of irrelevant location information on performance: A review of the Simon and spatial Stroop effects. Psychonomic Bulletin and Review, 2, 174–207.
Lu, C.-H. and Proctor, R.W. (2001). Influence of irrelevant information on human performance: Effects of S–R association strength and relative timing. Quarterly Journal of Experimental Psychology, 54A, 95–136.
Marble, J.G. and Proctor, R.W. (2000). Mixing location-relevant and location-irrelevant choice-reaction tasks: Influences of location mapping on the Simon effect. Journal of Experimental Psychology: Human Perception and Performance, 26, 1515–1533.
Marcel, T. and Forrin, B. (1974). Naming latency and the repetition of stimulus categories. Journal of Experimental Psychology, 103, 450–460.
Mordkoff, T. (1998). The gating of irrelevant information in selective-attention tasks (Abstract). Abstracts of the Psychonomic Society, 3, 193.
Morin, R.E. and Forrin, B. (1962). Mixing of two types of S–R associations in a choice reaction time task. Journal of Experimental Psychology, 64, 137–141.
Prinz, W. (1997). Why Donders has led us astray. In B. Hommel and W. Prinz (Eds.), Theoretical issues in stimulus–response compatibility, pp. 247–267. Amsterdam: North-Holland.
Proctor, R.W. and Lu, C.-H. (1999). Processing irrelevant location information: Practice and transfer effects in choice-reaction tasks. Memory and Cognition, 27, 63–77.
Proctor, R.W. and Vu, K.-P.L. (2001). Mixing location-irrelevant and relevant trials: Influence of stimulus mode on spatial compatibility effects. Manuscript submitted for publication.
Proctor, R.W. and Wang, H. (1997). Differentiating types of set-level compatibility. In B. Hommel and W. Prinz (Eds.), Theoretical issues in stimulus–response compatibility, pp. 11–37. Amsterdam: North-Holland.
Proctor, R.W., Marble, J.G., and Vu, K.-P.L. (2000). Mixing incompatibly mapped location-relevant trials with location-irrelevant trials: Effects of stimulus mode on the reverse Simon effect. Psychological Research/Psychologische Forschung, 64, 11–24.
Proctor, R.W., Vu, K.-P.L., and Marble, J.G. (in press). Mixing location-relevant and irrelevant tasks: Spatial compatibility effects eliminated by stimuli that share the same spatial codes. Visual Cognition.
Rogers, R.D. and Monsell, S. (1995). Costs of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General, 124, 207–231.
Shaffer, L.H. (1965). Choice reaction with variable S–R mapping. Journal of Experimental Psychology, 70, 284–288.
Shaffer, L.H. (1966). Some effects of partial advance information on choice reaction with fixed or variable S–R mapping. Journal of Experimental Psychology, 72, 541–545.
Simon, J.R. (1990). The effects of an irrelevant directional cue on human information processing. In R.W. Proctor and T.G. Reeve (Eds.), Stimulus–response compatibility: An integrated perspective, pp. 31–86. Amsterdam: North-Holland.
Stoffels, E.J. (1996a). On stage robustness and response selection routes: Further evidence. Acta Psychologica, 91, 67–88.
Stoffels, E.J. (1996b). Uncertainty and processing routes in the selection of a response: An S–R compatibility study. Acta Psychologica, 94, 227–252.
Tagliabue, M., Zorzi, M., Umiltà, C., and Bassignani, F. (2000). The role of LTM links and STM links in the Simon effect. Journal of Experimental Psychology: Human Perception and Performance, 26, 648–670.
Toth, J.P., Levine, B., Stuss, D.T., Oh, A., Winocur, G., and Meiran, N. (1995). Dissociation of processes underlying spatial S–R compatibility: Evidence for the independent influence of what and where. Consciousness and Cognition, 4, 483–501.
Umiltà, C. and Nicoletti, R. (1990). Spatial stimulus–response compatibility. In R.W. Proctor and T.G. Reeve (Eds.), Stimulus–response compatibility: An integrated perspective, pp. 89–116. Amsterdam: North-Holland.
Umiltà, C. and Zorzi, M. (1997). Commentary on Barber and O'Leary: Learning and attention on S–R compatibility. In B. Hommel and W. Prinz (Eds.), Theoretical issues in stimulus–response compatibility, pp. 173–178. Amsterdam: North-Holland.
Valle-Inclán, F., Hackley, S.A., and de Labra, C. (2002). Does stimulus-driven response activation underlie the Simon effect? This volume, Chapter 23.
Van Duren, L. and Sanders, A.F. (1988). On the robustness of the additive factors stage structure in blocked and mixed choice reaction designs. Acta Psychologica, 69, 83–94.
Vu, K.-P.L. and Proctor, R.W. (2001). Mixing compatible and incompatible mappings: Elimination, reduction, and enhancement of spatial compatibility effects. Manuscript submitted for publication.
Wang, H. and Proctor, R.W. (1996). Stimulus–response compatibility as a function of stimulus code and response modality. Journal of Experimental Psychology: Human Perception and Performance, 22, 1201–1217.
Zhang, H. and Kornblum, S. (1998). The effects of stimulus–response mapping and irrelevant stimulus–response and stimulus–stimulus overlap in four-choice Stroop tasks with single-carrier stimuli. Journal of Experimental Psychology: Human Perception and Performance, 24, 3–19.
Zorzi, M. and Umiltà, C. (1995). A computational model of the Simon effect. Psychological Research, 58, 193–205.
23 Does stimulus-driven response activation underlie the Simon effect?
Fernando Valle-Inclán, Steven A. Hackley, and Carmen de Labra

Abstract. The most influential explanations of the Simon effect assume that a spatial stimulus code automatically activates the spatially compatible response. However, the degree to which activation of the compatible response should be considered an automatic, stimulus-driven process is uncertain. To assess this presumed automaticity, we conducted two experiments to analyze the influence on the Simon effect of repetitions versus alternations of stimulus, response, and stimulus–response compatibility (SRC). The results indicate that the Simon effect is produced only when the previous trial is compatible. Furthermore, the lateralized readiness potential showed clear signs of incorrect response activation on incompatible trials when the previous trial was compatible, but not when the previous trial was incompatible. These results indicate that SRC sequences are critical for the appearance of the Simon effect and that the stimulus spatial code alone does not automatically activate the spatially corresponding response.
23.1 Introduction
In their seminal 1967 report, Simon and Rudell showed that spatial stimulus–response compatibility (SRC) effects could be obtained even when stimulus location was task irrelevant. They presented the word 'left' or 'right' randomly at either ear and required subjects to press a key located on the left or right side depending on the meaning of the word. Although the spatial source of the sound was irrelevant, performance deteriorated when the stimulus and response were on opposite sides. Since then, this special case of SRC has been clearly established in different sensory modalities and in a variety of experimental tasks. The general pattern is that reaction time (RT) is shorter and response accuracy is greater when stimulus and response are on the same side (compatible trials) than when they are on opposite sides (incompatible trials). Research on this phenomenon, known as the Simon effect, has established the following points: (1) the Simon effect cannot be explained by commissural crossing delays (Simon, Hinrichs, and Craft 1970); (2) it is not restricted to a particular sensory modality and can be obtained with crossmodal stimulation (Simon 1982; Simon and Acosta 1982); (3) the size of the effect tends to decrease from fast to slow responses (De Jong, Liang, and Lauber 1994; Hommel 1993a); and (4) at least one locus of the effect is the response selection stage, a proposition that receives support from studies of movement-related brain potentials (De Jong et al. 1994; Valle-Inclán 1996a, 1996b). The emergent picture is that of a highly automatic bottom-up process: the brain produces a spatial code for the stimulus location that, in turn, activates the spatially corresponding response (see the reviews by Lu and Proctor 1995, and Simon 1990). It is controversial, however, how the spatial code is formed (Valle-Inclán, Hackley, and de Labra 2001) and to what degree the activation of the compatible response should be considered an obligatory stimulus-driven process. The latter issue is explored in the following experiments.
Central to many accounts of the Simon effect is the notion of a transient automatic activation of the spatially compatible response, triggered by stimulus onset (Hommel 1993a; Kornblum, Hasbroucq, and Osman 1990; Zorzi and Umiltà 1995) or by stimulus identification (Kornblum et al. 1999). The term automatic is intended to mean that this process is stimulus driven, unintentional, and unavoidable, although it might be modulated by other processes (Kornblum et al. 1990). This automatic route works in parallel with a second route that is governed by the S–R mapping defined in the experiment. The interaction of these two routes at the response selection stage causes the Simon effect. These models can account for much of the existing data, but several lines of evidence cast doubt on their generality.

The stimulus-driven hypothesis predicts facilitation on compatible trials and interference on incompatible trials with respect to neutral trials (i.e. nonlateralized stimuli). This pattern of results is usually obtained when compatible, neutral, and incompatible trials are mixed in the same block of stimuli (Hommel 1993b; Simon and Craft 1970; Umiltà, Rubichi, and Nicoletti 1999). However, when neutral trials are presented in a separate block there is no facilitation; RT on neutral trials is the same as on compatible trials (Simon and Acosta 1982) or even faster (Craft and Simon 1970; Simon and Small 1969). The appearance and disappearance of facilitation as a function of the blocking scheme is not what would be expected, a priori, of an automatic bottom-up process.

The strongest doubts about stimulus-driven explanations are fed by reversals of the Simon effect (i.e. compatible trials slower than incompatible trials), something that should not occur if stimulus presentation automatically activates the spatially corresponding response. Hedge and Marsh (1975) were the first to obtain a reverse Simon effect by instructional manipulation. They used colored keys and instructed subjects to press the key of the same color as the stimulus (identity mapping) or to press the key with the alternative color (alternate mapping). The Simon effect appeared with identity mapping and was reversed when the alternate mapping was used. Hedge and Marsh proposed that the Simon effect and its reversal were due to the unintentional application of the mapping rules (identity and alternate) to the irrelevant stimulus dimension (location), a notion known as 'logical recoding'. More complex models can account for this reversal of the Simon effect and still maintain the automatic route linking stimulus and response codes. For example, De Jong et al. (1994) proposed a dual-process model that incorporates the automatic activation of the compatible response and the idea of logical recoding.

Another type of Simon effect reversal is obtained when responses can be coded in different spatial positions. An example of this type of reversal was provided by Hommel (1993b; see Riggio, Gawryszewski, and Umiltà 1986, for related findings). The stimuli were tones presented on the left or right side. The response keys were also located on the left and right side and were connected to lights either in a parallel fashion (i.e. left–left, right–right) or in a crossed way (left–right, right–left). Hommel instructed subjects to react to the tones either by pressing the keys or by turning on the lights. He found a normal Simon effect when subjects were instructed to press the keys. Under instructions to turn on the lights, however, the Simon effect reversed when keys and lights were connected in a crossed way. Hommel interpreted these data to indicate that the Simon effect reflects the spatial correspondence of stimulus and goal location. A slightly different interpretation would be that subjects coded the light locations as response locations, as people usually do when playing computer games, for example. This type of coding will produce a reverse Simon effect in the cross-connected condition. What these experiments indicate is that the spatial relationship between stimulus and response codes depends on factors other than a direct route.
The Simon effect also reverses or disappears when subjects have practiced with spatially incompatible S–R mappings during the preceding days (Proctor and Lu 1999; Tagliabue, Zorzi, Umiltà, and Bassignani 2000). Proctor and colleagues have also shown in a recent series of experiments that mixing location-relevant trials into an otherwise typical Simon task enhances the Simon effect when the location-relevant trials are spatially compatible, and reverses it when the location-relevant trials are incompatible (Proctor and Vu, this volume, Chapter 22). These results, as noted by the authors, are problematic for direct or automatic route accounts of the Simon effect.

From the evidence just reviewed, it is clear that the Simon effect can be reversed by task instructions, the presence of alternative spatial codes, practice with incompatible S–R mappings, or mixing location-relevant trials with location-irrelevant trials. All these manipulations share a strategic nature and suggest that top-down control can reverse the Simon effect. It is an open question whether the effect itself is also a product of strategy-driven processes.

Strategic control has been demonstrated in several conflict tasks by manipulating the proportion of compatible and incompatible trials. The spatial Stroop effect (Logan 1980, 1985; Logan and Zbrodoff 1979), the flanker compatibility effect (Gratton, Coles, and Donchin 1992), and the Simon effect (Hommel 1994; Stürmer and Leuthold 1998; Toth et al. 1995) all decrease in size, or even reverse, when compatible trials become infrequent. The interpretation of these results, however, poses a problem: when the proportions of compatible and incompatible trials are quite different (20/80, for example), the irrelevant stimulus dimension is only formally irrelevant. Consequently, it could be expected that subjects would pay attention to it, in addition to, or instead of, the nominally relevant dimension. Whether or not this is the case, these results indicate that variations in global probability (i.e. subjects' expectancies) do exert a strong influence on the Simon effect and other conflict tasks.

Another way to look for contextual influences in conflict tasks has been to study first-order sequential dependencies (i.e. the influence of trial N−1 on the performance of trial N). First-order compatibility dependencies have been found in the Stroop effect (Verleger 1991) and in the flanker compatibility effect (Botvinick et al. 1999; Gratton et al. 1992). Recently, first-order sequential effects in the Simon effect have begun to receive attention. It has been reported that the Simon effect reverses (Valle-Inclán, Hackley, and McClay 1998), vanishes (Stürmer and Leuthold 1998), or decreases (Praamstra, Kleine, and Schnitzler 1999) when the previous trial is incompatible. Accounting for these sequential effects would seem to require a revision of the automatic link between spatially compatible S–R pairs. Stürmer and Leuthold (1998) proposed that the automatic route is inhibited following an incompatible trial (see also Stoffels 1996). Obviously, this interpretation could explain a reduction of the Simon effect after an incompatible trial, but it is difficult to see how it could account for the reversal found by Valle-Inclán et al. (1998). Hommel (2000) explained these sequential effects as a result of repetitions and alternations of stimulus and response feature conjunctions (see below). Thus, there are data that challenge purely stimulus-driven explanations of the Simon effect.
If these results are not to be dismissed as aberrations, then strong modulations of the automatic route (leading even to abolition and reversal of the Simon effect) would have to be allowed in stimulus-driven models to account for the findings. The consequence is that much of the variance in the Simon effect would be accounted for by the operation of those modulating mechanisms. The next two experiments present behavioral and physiological data on the influence of sequential dependencies on the Simon effect.

First-order dependencies in choice reaction time tasks are ubiquitous and can take the form of repetition effects (faster RT when consecutive trials are identical in some respect) or alternation effects (faster reaction when the preceding trial is different). Repetition effects are regarded as a manifestation of automatic facilitation, or priming, whereas alternations are considered to reflect strategic behavior (an example of the gambler's fallacy; Kirby 1980). These sequential effects depend on the response-to-stimulus interval (RSI), the S–R compatibility, and the categorizability of the stimulus set. In the studies to be described, the RSI was long (> 1 s), the S–R compatibility was low, and the stimuli were not categorizable. Under these conditions, stimulus repetition effects are expected (e.g. Bertelson 1963). Response repetition effects, however, are not expected, since they tend to appear when stimuli and responses are easily categorizable (e.g. letters assigned to one response and digits assigned to the other response; Campbell and Proctor 1993; Pashler and Baylis 1991). By contrast, response alternation effects have been shown to appear with long RSIs and noncategorizable stimuli (as in Experiment 2).

Another sequential effect that is relevant to our research is the SRC repetition effect (faster RT for repetitions than for alternations of stimulus–response compatibility or incompatibility). Duncan (1977) proposed that this repetition effect indicates that S–R mapping rules, not just the physical correspondence of stimulus and response, are selected during response selection. Hommel (2000) has proposed that stimulus and response features that co-occur in time become temporarily associated (bound). As a result, facilitation would appear when the previous binding can be reused, and interference would be evident when previous associations trigger the incorrect response. According to this proposal, the fastest RTs should be found on complete repetitions and complete alternations (because previous associations are not triggered), and the slowest responses should appear on partial repetitions (where previous associations interfere with current processing). This last prediction is contradicted by results from two-to-one mapping assignments in which stimuli are not categorizable. In these cases, the typical finding (e.g. Campbell and Proctor 1993) is that partial repetitions (i.e. response repetition with stimulus alternation) are faster than complete alternations. The generality of Hommel's account is thereby undermined, but it remains an adequate model for most data generated in Simon tasks. In such tasks, complete repetitions and complete alternations (the fastest RTs according to the theory) are to be found in CC (compatible–compatible) and II (incompatible–incompatible) sequences, while partial repetitions (the slowest RTs) occur in the IC and CI sequences.
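The mapping between compatibility sequences and the binding predictions follows from the two binary features, irrelevant location and response. A small classifier makes the correspondence explicit (a sketch of the prediction, not of the binding model itself):

    def transition_type(prev, cur):
        """prev and cur are (location, response) pairs for consecutive trials."""
        same_loc, same_resp = cur[0] == prev[0], cur[1] == prev[1]
        if same_loc and same_resp:
            return 'complete repetition (fast)'
        if not same_loc and not same_resp:
            return 'complete alternation (fast)'
        return 'partial repetition (slow)'

    # Compatibility = (location == response side). If compatibility repeats
    # (CC or II), location and response either both repeat or both change;
    # if it alternates (CI or IC), exactly one of them changes.
    print(transition_type(('left', 'left'), ('right', 'right')))  # within a CC sequence
    print(transition_type(('left', 'left'), ('right', 'left')))   # a C-to-I sequence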
23.2 Experiment 1
The behavioral data to be reported were collected in an experiment with electrophysiological measures, but these measures will not be considered here (see Valle-Inclán et al. 1998). The experiment aimed to explore the influence of sequential dependencies in a Simon task. Given the similarities of the Simon, Stroop, and flanker compatibility effects, it was predicted that the Simon effect should decrease, or even reverse, if the previous trial was incompatible (see Gratton et al. 1992; Verleger 1991). The experiment comprised a cross-modal Simon task with visual imperative stimuli presented at fixation and irrelevant noise bursts (i.e. accessory stimuli). Accessory stimuli and responses were aligned along the vertical meridian, as described below.

Subjects. Sixteen students (19–25 years) volunteered for the experiment and received academic credit for their participation.

Procedure. Visual targets (the letters S and T, subtending 1°) were presented in the center of a VGA monitor simultaneously with a noise burst (65 dB, 100 ms) coming from a speaker located about 1 m above or below the monitor. The noise location and letter were selected randomly on every trial. Subjects placed their hands on an inclined keyboard slanted perpendicularly towards the screen. Thus, the hand closer to the screen operated the upper keys and the other hand operated the lower keys. Subjects reacted to the letter by pressing a sequence of three keys with the index, ring, and middle fingers. The upper keys were '8', 'i', and 'k' and the lower keys were '3', 'w', and 'a'. We used the vertical meridian, instead of the more common horizontal S–R arrangement, to avoid contamination of motor potentials by lateralized visual ERPs (see Valle-Inclán 1996a). The three-key response sequence was used because Hackley and Miller (1995) have shown that lateralized readiness potential (LRP) amplitudes are larger with complex movements. RT was measured from visual stimulus onset to the first keypress. The experiment consisted of six blocks of 112 trials each. The intertrial interval varied randomly between 500 ms and 4000 ms in 250 ms steps. The assignment of letter to hand and the placement of the hands (above/below) were counterbalanced across subjects.

Trials with an incorrect response or with an RT greater than 2000 ms or less than 100 ms were excluded from the analysis. The accepted trials were classified according to the compatibility of the previous and current trial, generating four sequences: compatible–compatible (CC), compatible–incompatible (CI), incompatible–compatible (IC), and incompatible–incompatible (II). Mean RT for each of the first- and higher-order sequences was calculated including only cases with correct responses on all trials. For the accuracy analysis, only cases in which the previous trials were accurate were included. The data were analyzed with 2 × 2 repeated measures MANOVAs with factors of compatibility on the previous trial and compatibility on the current trial.
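A sketch of the trial screening and sequence classification just described, with hypothetical record fields ('rt' in ms, 'correct', 'compatible'), follows; the acceptance criteria mirror the exclusion rules stated above.

    def classify_sequences(trials):
        """Label each trial pair by previous/current compatibility: CC, CI, IC, II."""
        label = {(True, True): 'CC', (True, False): 'CI',
                 (False, True): 'IC', (False, False): 'II'}
        cells = {'CC': [], 'CI': [], 'IC': [], 'II': []}

        def accepted(t):
            # Exclude errors and RTs above 2000 ms or below 100 ms.
            return t['correct'] and 100 <= t['rt'] <= 2000

        for prev, cur in zip(trials, trials[1:]):
            if accepted(prev) and accepted(cur):  # correct responses on all trials
                cells[label[(prev['compatible'], cur['compatible'])]].append(cur['rt'])
        return cells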
23.2.1 Results
The overall results showed a small (7 ms) but not significant Simon effect. Since our interest concerned sequential dependencies in the Simon effect, we selected those subjects (n = 8) who did show a Simon effect and analyzed them separately.

Fig. 23.1 Experiment 1. Mean RT for the four possible compatibility sequences for the whole group (left panel) and for the subset of subjects who showed a Simon effect (right panel). The figures in parentheses are the percentages of errors in each condition.

Figure 23.1 plots the mean RT as a function of previous and current trial compatibility for all subjects (left panel) and for those subjects who showed a Simon effect (right panel). In the analysis of the whole group, there was a 21 ms Simon effect when the previous trial was compatible (CI–CC trials) and a reverse Simon effect (–14 ms) when the previous trial was incompatible (II–IC trials). CC trials were 18 ms faster than IC trials, and II trials were 17 ms faster than CI trials. These results indicate a clear SRC repetition effect, and were confirmed by a strong interaction between previous and current compatibility, F(1, 15) = 52.55, p < 0.0001. Further comparisons showed that all the differences among the four conditions were significant. The right panel of Fig. 23.1 contains the mean RTs (and proportions of errors) for those subjects who did exhibit a Simon effect. A normal Simon effect of 32 ms appeared between CI and CC trials. An 11 ms reversal of the Simon effect was found between II and IC trials. There was also an SRC repetition effect of similar size for compatible trials (IC–CC trials, 19 ms) and for incompatible trials (CI–II trials, 14 ms). These results yielded a significant main effect of current compatibility, F(1, 7) = 10.26, p < 0.01, and an interaction between previous and current compatibility, F(1, 7) = 57.89, p < 0.0001. The percentage of errors (see Fig. 23.1) was low in all conditions, and the results for this measure should be regarded cautiously. The Simon effect on errors (i.e. fewer errors on compatible than on incompatible trials) was not significant for the whole group, nor for the subgroup of subjects who had a Simon effect. As in the RT results, the previous-by-current compatibility interaction was significant for the whole group, F(1, 15) = 11.28, p < 0.004, and also for those subjects with a Simon effect, F(1, 7) = 9.05, p < 0.02. The SRC sequences were categorized according to stimulus (and response) repetition and alternation. For example, half of the CC and II sequences consisted of repetitions of the visually presented letter, the location of the auditory accessory stimulus, and the keypress response (a complete repetition), and the other half consisted of complete alternations. Therefore, distinct measurements of the SRC repetition effect (CI + IC – II – CC) were obtained when stimulus and response were repeated and when they alternated. The results showed no significant differences; thus, it seems that this SRC repetition effect is independent of other sequential effects.
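For illustration, the contrast just described can be written as a one-line function (hypothetical names; RTs in ms):

```python
def src_repetition_effect(mean_rt):
    """SRC repetition effect as the contrast CI + IC - II - CC.

    mean_rt: dict mapping sequence labels to mean RT in ms.
    A positive value means alternations of compatibility are slower
    than repetitions of compatibility.
    """
    return mean_rt['CI'] + mean_rt['IC'] - mean_rt['II'] - mean_rt['CC']
```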
23.2.2 Discussion
The experiment showed that the sign of the Simon effect depends on previous trial compatibility. There was a normal Simon effect when the previous trial was compatible (the difference between CC and CI trials) and a reverse Simon effect when the previous trial was incompatible (the difference between IC and II trials). Performance was better when compatibility stayed the same (CC and II trials) than when compatibility changed from one trial to the next (IC and CI trials), suggesting the presence of an SRC repetition effect. SRC repetition effects had been described by Duncan (1977) in a four-choice RT task, but in that experiment the SRC sequences were confounded with the stimulus and response sequences. In our experiment, the SRC repetition effect had the same magnitude for stimulus repetitions and alternations, which suggests that these sequential effects were independent. First-order compatibility effects in the Simon task have been reported by Stürmer and Leuthold (1998) and Praamstra et al. (1999). These authors' interpretation was that the automatic S–R route was inhibited after an incompatible trial and was active after a compatible trial. This interpretation allows for decrements in the size of the Simon effect, but further assumptions would be needed to
account for an actual reversal, as was found in the present study. One possibility would be to assume that the SRC repetition effect can override the automatic tendency to make a compatible response, thereby yielding a reversal of the Simon effect when the previous trial was incompatible. This assumption, however, is inconsistent with the assumption of some authors (e.g. Kornblum et al. 1990) that the automatic tendency to react compatibly cannot be overridden. Note that while the SRC repetition effect is an empirical fact, the notion of automatic activation of the compatible response is an explanatory concept which implies the existence of long-term associative pathways between spatially compatible stimulus and response codes. Such long-term S–R connections should presumably be difficult to override; however, Proctor and Lu (1999) and Tagliabue and colleagues (2000) have shown that even a relatively short practice session with an incompatible S–R mapping reverses the Simon effect. The implication is that it is not the current spatial S–R relationship that is the primary determinant of behavior, but rather its interaction with previously learned spatial S–R associations. These effects could reflect the influence of subjects' expectancies about the proportion of compatible and incompatible trials. Consider that in everyday life the vast majority of spatial S–R associations are compatible or neutral. Prying with a lever, steering a boat, and avoiding objects on the side of the road while driving are three examples of incompatible mapping in natural environments, but there are not many more. It is therefore tempting to think that subjects tend to expect compatible trials. A variant of this idea would be that the mapping subjects most recently encountered before entering the lab was presumably compatible, and this temporary S–R association would be responsible for the overall advantage of compatible trials (see Proctor and Lu 1999; Tagliabue et al. 2000). In functional terms, the effects of a biased expectancy toward compatible trials are very similar to those of an automatic activation of the compatible response. The critical difference is that the automatic route would be fixed (even hardwired in some accounts), whereas expectancies are flexible. In conclusion, we propose that the present results are best explained by a two-factor model comprising SRC repetition effects and biased expectancies about the proportion of compatible trials. This two-factor account predicts the fastest RT when both factors activate the correct response (CC trials) and the slowest when both factors activate the incorrect response (CI trials). Intermediate RTs would be expected for IC and II trials, since the correct and incorrect responses would both be activated, one by expectancy, the other by SRC repetition.
23.3 Experiment 2
The second experiment was intended to isolate the SRC sequences from the S and R sequences (which were partially confounded in the previous experiment). To this end, we adapted a Simon task to a design originally developed by Bertelson (1965) for localizing first-order sequential dependencies. Bertelson (1965) used a two-choice RT task in which two stimuli were assigned to each response. This so-called many-to-one mapping yields three types of transitions on consecutive trials: the two trials can be Identical (I, same stimulus and response), Equivalent (E, different stimuli but same response), or Different (D, different stimuli and responses). Soetens (1998, Experiment 2) had previously used this approach to study the Simon effect. He concluded that, across trials, subjects build up expectancies concerning the irrelevant stimulus dimension that correspond to the spatial arrangement of the responses. He found that the size of the Simon effect was larger on Equivalent and Different than on Identical sequences. This suggests that the Simon effect is due, at least in part, to the influence of strategic behavior on the response side of the S–R pathway.
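The transition taxonomy can be made concrete with a short sketch (ours; the stimulus-to-key mapping shown is the one used in Experiment 2, up to counterbalancing):

```python
def transition_type(prev_stim, curr_stim, resp_map):
    """Bertelson's (1965) transition taxonomy for many-to-one mappings.

    resp_map: dict assigning each stimulus to a response key.
    """
    if prev_stim == curr_stim:
        return 'Identical'     # same stimulus, same response
    if resp_map[prev_stim] == resp_map[curr_stim]:
        return 'Equivalent'    # different stimuli, same response
    return 'Different'         # different stimuli and responses

resp_map = {'X': 'upper', '3': 'upper', 'H': 'lower', '6': 'lower'}
print(transition_type('X', '3', resp_map))  # -> 'Equivalent'
```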
In Experiment 2, we also studied the higher-order SRC sequences. Typically, long RSIs such as those employed in this study can produce both higher-order repetition and higher-order alternation effects (see Soetens 1998). Higher-order alternation effects are considered to reflect expectancies, while higher-order repetitions represent automatic facilitation and, possibly, expectancies. We analyzed fourth-order SRC sequences following the procedure of Soetens et al. (1985).

In addition to behavioral measures, the electroencephalogram (EEG) was recorded and event-related potentials (ERPs) were extracted from these recordings. ERP research on the Simon effect has consistently shown that P300 latency increases on incompatible trials. This finding was first described by Ragot, Renault, and Rémond (1980) and has been replicated many times since. P300 amplitude is typically smaller on incompatible trials (e.g. Valle-Inclán 1996a). In recent years, the Simon effect has been investigated using the lateralized readiness potential (LRP). This is a physiological index of response preparation that was developed independently by Coles and Gratton (1986) and Smid, Mulder, and Mulder (1987). The LRP is a measure of the differential activation of the two hemispheres as recorded from electrodes placed over the motor cortex on each side. Preceding a hand or finger movement, a bilateral and initially symmetrical scalp negativity is recorded over the motor cortex (Kornhuber and Deecke 1965). Then, several hundred milliseconds before response execution, the voltages at sites contralateral to the side of the intended movement become more negative than those measured at ipsilateral sites (Kutas and Donchin 1974). These potentials are largest at recording sites near C3 and C4 of the 10/20 system. When these recordings are subtracted, the sign of the difference corresponds to the side of the intended movement. Specifically, the subtraction C3–C4 results in a positive deflection for left-hand movements and a negative deflection for right-hand movements. Subtracting trials with right-hand responses from those with left-hand responses yields the LRP. Computed in this manner ([(C3–C4, left response) – (C3–C4, right response)]/2), correct response activation is manifested as a positive deflection and incorrect response activation as a negative deflection (see Osman, Bashore, Coles, Donchin, and Meyer 1992). In functional terms, the LRP offers a millisecond-by-millisecond index of the preferential activation of one response over the other (Coles 1989; Miller and Hackley 1992).

Using the LRP, activation of the incorrect response on incompatible trials has been demonstrated in both the flanker compatibility task (Gratton et al. 1988, 1990, 1992; Smid et al. 1987, 1990) and the Simon task (De Jong et al. 1994; Valle-Inclán 1996a, 1996b). These effects are generally restricted to the stimulus-locked LRP, but have recently been reported in the response-locked LRP (Masaki, Takasawa, and Yamazaki 2000; van der Lubbe and Woestenburg 1999). The LRP signature of incorrect response activation on incompatible trials has been interpreted as strong support for stimulus-driven models (De Jong et al. 1994). However, what these LRP results strictly indicate is that the incorrect response is activated on incompatible trials. They do not necessarily imply that the response activation is automatically triggered by stimulus presentation per se.
Incorrect response activation has also been found on slow-reaction trials in a choice RT task that involved centrally presented stimuli and no response conflict (Smulders, Kenemans, and Kok 1996). In accordance with the results and interpretation of the previous experiment, we predicted that LRP signs of incorrect response activation would be large on incompatible trials preceded by a compatible trial and small on those preceded by an incompatible trial. In addition, the LRP onset latencies can be used to localize the SRC sequential effects. Effects at loci prior to response selection should produce latency differences in the interval extending from stimulus onset to LRP onset and no effects in the interval from LRP onset to keypress. By contrast, late motoric effects would be reflected only within the LRP-to-keypress interval (see Hackley and Valle-Inclán 1998).
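A minimal NumPy sketch of the double-subtraction just described (illustrative; the array shapes and names are our assumptions, and a real ERP pipeline would first epoch, baseline-correct, and artifact-reject the data):

```python
import numpy as np

def lrp(c3, c4, response_side):
    """Double-subtraction LRP: [(C3-C4 | left resp) - (C3-C4 | right resp)] / 2.

    c3, c4: arrays of shape (n_trials, n_samples) recorded over left (C3)
    and right (C4) motor cortex; response_side: array of 'left'/'right'.
    Positive deflections index correct-response activation, negative
    deflections incorrect-response activation.
    """
    response_side = np.asarray(response_side)
    diff = c3 - c4
    left = diff[response_side == 'left'].mean(axis=0)    # average over left-hand trials
    right = diff[response_side == 'right'].mean(axis=0)  # average over right-hand trials
    return (left - right) / 2.0
```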
23.3.1 Subjects
Sixteen women (aged 19–29 years, all right-handed, with normal or corrected-to-normal vision) volunteered for the experiment and were given academic credit. All subjects were naive as to the purpose of the experiment.
23.3.2 Stimuli and EEG recording and processing The task was a many-to-one mapping with four stimuli and two responses. Subjects viewed the display at a distance of 60 cm, with their chins on a chin rest. Their index 1ngers rested on two keys placed perpendicularly to the screen. These keys were labeled during the instructions and at the beginning of each block as the ‘upper key’ (the key closer to the monitor) and the ‘lower key’ (the key closer to the subject). The task was divided into 20 blocks of 50 stimuli each with short breaks between two blocks. On each trial, a white character on dark background, randomly chosen among X, H, 3, and 6, was presented for 112 ms (above or below a 1xation point in the midline of a VGA monitor). One key was assigned to the stimuli X and 3, and the other to H and 6, counterbalanced across subjects. The interstimulus interval was 2225 ms. The EEG was recorded from an array of 29 tin scalp electrodes referred to left ear lobe. Eye movements and blinks were recorded from two pairs of electrodes above and below the eye and near the external canthi of each eye. Signals were ampli1ed with a 0.01–100 Hz bandpass 1lter and digitized at 250 Hz. Electrode impedance was below 5 kΩ. Eye movements and blinks artifacts were corrected with the procedure of Gratton, Coles, and Donchin (1983). Trials with incorrect responses or that were preceded by an incorrect reaction, trials with RT larger than 2000 ms or smaller than 100 ms, and trials with EEG values larger than 75 uV were excluded from the analysis. The EEG epochs were low-pass 1ltered to 8 Hz before averaging. The LRP was calculated as described above. LRP deviations from zero, indicating the preferential activation of one response over the other, were tested for each condition using a t-test on each digitized point. Differences in LRP onset were tested using the jackknife procedure developed by Miller, Patterson, and Ulrich (1998).
Table 23.1 MANOVA results for RT and accuracy from Experiment 2. 'Transition' refers to identical, equivalent, and different sequences. 'SRC repetition' refers to repetition/alternation of compatibility from trial N–1 to trial N. 'Compatibility' refers to compatibility on trial N.

                                              RT                Accuracy
Source                           df     F value    p <      F value    p <
Transition                      2,30     51.55    .0001      26.89    .0001
SRC repetition                  1,15     64.16    .0001      27.78    .0001
Compatibility                   1,15     90.09    .0001      22.25    .0001
Transition × SRC repetition     2,30      8.96    .001        8.60    .001
Transition × Compatibility      2,30       —       ns        18.15    .0001
SRC repetition × Compatibility  1,15       —       ns        16.45    .001
Transition × SRC rep. × Comp.   2,30       —       ns          —       ns
23.3.3 Results
23.3.3.1 First-order sequences
Only sequences with a correct response on both trials were included in the RT analyses. For the accuracy analyses, the first trial in the sequence had to be correct. Trials were sorted according to the transition (Identical, Equivalent, Different; I, E, D), the SRC repetition/alternation from the previous to the current trial, and the compatibility on the current trial. These data were analyzed using a 3 × 2 × 2 repeated measures MANOVA. Table 23.1 contains the summary of the statistical results for RT and the percentage of errors. Figure 23.2 shows the group means for RT (left panel) and the percentage of errors (right panel).

There was a Simon effect for both RT and accuracy (C = 556 ms, 4.5% errors; I = 588 ms, 9% errors), as indicated by the main effect of compatibility. The Simon effect on trial n was 75 ms (9.30% errors) when trial n–1 was compatible, and reversed (not statistically significantly) to –10 ms (–0.30% errors) when trial n–1 was incompatible. The SRC repetition effect (i.e. faster RT for SRC repetitions than for SRC alternations) found in the previous experiment was replicated (SRC repetitions = 551 ms, 4.34% errors; SRC alternations = 593 ms, 9.15% errors), as reflected in the main effect for this factor. The type of transition had a large effect on both RT and accuracy. The fastest and most accurate reactions were those on Identical transitions (538 ms, 3% errors), followed by Different (574 ms, 5.7% errors) and by Equivalent transitions (604 ms, 11.54% errors), as reflected by the main effect of type of transition. The differences between the three conditions were all significant (p < 0.001). This RT pattern (I < D < E) would be expected with long RSIs and noncategorizable S and R sets (e.g. Smith 1968). This pattern is also congruent with the integration of stimulus and response
Fig. 23.2 Experiment 2. Mean RT (left panel) and percentage of errors (right panel) as a function of compatibility repetition or alternation in the three types of transitions (Identical, Equivalent, and Different; C = Compatible, I = Incompatible). The two letters close to each point indicate compatibility on the previous trial (first letter) and on the current trial (second letter).
features proposed by Hommel (2000, reviewed above), which predicts the fastest RTs on Identical and Different transitions and the slowest on Equivalent transitions. The type of transition influenced the SRC sequential effects but not the size of the Simon effect. Although the Simon effect was larger on Different (41 ms) than on Identical and Equivalent transitions (28 ms and 27 ms, respectively), these differences did not attain significance (i.e. there was no interaction between type of transition and compatibility). The SRC repetition effect significantly decreased from Identical (61 ms) to Equivalent (39 ms) and Different transitions (26 ms), as indicated by the interaction between SRC repetition and transition. It should be noted that SRC sequences are confounded with stimulus location sequences on Identical and Equivalent transitions (i.e. SRC repetition/alternation goes with stimulus location repetition/alternation). This is not the case on Different transitions, in which SRC repetition/alternation implies stimulus location alternation/repetition. The finding of an SRC repetition effect on Different transitions, F(1, 15) = 15.03, p < 0.001, indicates that this sequential effect can be obtained in the absence of other repetition effects (as suggested in the previous experiment). Finally, the interaction between SRC repetition and compatibility was not significant when the three types of transitions were considered together. Inspection of Fig. 23.2 suggests that this interaction was absent on Identical and Equivalent transitions, but might be present on Different transitions. Separate analyses confirmed that the SRC repetition × Compatibility interaction was significant on Different transitions, F(1, 15) = 10.26, p < 0.006, reflecting the fact that the SRC repetition effect was larger for compatible than for incompatible trials.
23.3.4 Higher-order sequences
Trials were classified according to SRC repetition and alternation on the previous four trials (16 different sequences). Only sequences in which all trials had correct responses were included in the RT analysis. In the accuracy analysis, all trials but the last one had to be correct. The mean RT on the last trial of each sequence was computed, and the results were plotted as a function of the SRC sequence (following the procedure of Soetens et al. 1985). Figure 23.3 contains the group means for each SRC sequence. The left branch of the plot corresponds to sequences ending with an SRC repetition, and the right branch to sequences ending with an SRC alternation. The statistical analysis was a 2 × 8 repeated measures MANOVA with factors of first-order sequence (the last trial in the sequence could be a repetition or an alternation) and higher-order sequence (see Soetens 1998). The RT data showed significant effects of first-order sequence, F(1, 15) = 51.87, p < 0.0001, and of the interaction between first- and higher-order sequences, F(7, 105) = 5.34, p < 0.0001, suggesting a cost–benefit pattern. Analyses performed separately on each branch demonstrated a significant linear trend for the repetition branch, F(1, 15) = 18.08, p < 0.0001, and also for the alternation branch, F(1, 15) = 6.68, p < 0.01. The accuracy data showed main effects of first-order sequence, F(1, 15) = 18.34, p < 0.001, and higher-order sequence, F(7, 105) = 2.76, p < 0.01. The interaction was also significant, F(7, 105) = 5.49, p < 0.0001. The linear trend was absent on the repetition branch and significant on the alternation branch, F(1, 15) = 23.83, p < 0.0001.
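For illustration, the fourth-order classification might be coded as follows (the 'R'/'A' labeling convention and trial encoding are ours, not Soetens et al.'s):

```python
def src_run_label(compatibility_run):
    """Encode the four transitions preceding the current trial as a string
    of 'R' (SRC repetition) or 'A' (SRC alternation) -- 16 possible
    fourth-order sequences.

    compatibility_run: five consecutive compatibility values ending with
    the current trial, e.g. [True, True, False, False, False].
    """
    pairs = zip(compatibility_run[:-1], compatibility_run[1:])
    labels = ['R' if a == b else 'A' for a, b in pairs]
    return ''.join(reversed(labels))  # first letter = most recent transition

print(src_run_label([True, True, False, False, False]))  # -> 'RRAR'
```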
Fig. 23.3 Experiment 2. Fourth-order compatibility sequences. Series of five trials were classified by SRC repetition/alternation; RT on the fifth trial was averaged separately for repetitions and alternations.

23.3.4.1 LRP results
Figure 23.4 contains the stimulus-locked LRP grand average for the first-order compatibility sequences. Incorrect response activation is indicated by negative deflections, which are present only briefly on CI trials (significantly different from zero in the interval 188–304 ms, t(1, 15) = 1.87–2.63, p < 0.05–0.01). The LRP onset latencies were estimated at 40% of peak amplitude and analyzed with a 2 × 2 within-subject design with factors of SRC repetition (2) and compatibility (2), using the jackknife procedure (Miller et al. 1998). SRC repetitions had earlier onsets than SRC alternations (292 ms vs. 354 ms), t(1, 15) = 4.88, p < 0.01; compatible trials had earlier latencies than incompatible trials (303 ms vs. 354 ms), t(1, 15) = 3.43, p < 0.01; and the SRC repetition effect was much larger for compatible than for incompatible trials (110 ms vs. 36 ms), t(1, 15) = 2.61, p < 0.01. Behavioral results showed that the Simon effect reversed after an incompatible trial (the comparison of IC and II), although this reversal did not attain significance. The reversal of the effect was more reliable in the stimulus-locked LRP onset latencies (II trials = 336 ms, IC trials = 358 ms), t(1, 15) = 1.95, p < 0.05. Figure 23.5 shows the response-locked LRPs, an index of response preparation and execution. The results also show incorrect response preparation on CI trials. The negative dip was significant from 444 ms to 236 ms before response execution, t(1, 15) = 1.77–2.53, p < 0.05–0.01. The response-locked LRP onset latencies did not show any significant differences.
Fig. 23.4 Experiment 2. Stimulus-locked lateralized readiness potential (LRP). The left panel shows the results when the previous trial was compatible and the right panel, when the previous trial was incompatible. The two letters labeling each trace indicate the compatibility on the previous trial (first letter) and on the current trial (second letter). (C = compatible; I = incompatible.)
Fig. 23.5 Experiment 2. Response-locked lateralized readiness potential (LRP). The left panel shows the results when the previous trial was compatible and the right panel, when the previous trial was incompatible.

Unpublished data from our laboratories indicate that the LRP amplitude decreases when responses are repeated from trial to trial. Consequently, it could well be that small effects like the negative dip on incompatible trials would be obscured when responses are repeated. An implication is that incorrect response activation might be manifested on II sequences when responses alternate (such activation was not apparent in the overall analysis; see Fig. 23.4). To test this hypothesis, the LRP was computed separately for those sequences in which the responses were repeated and those in which the responses alternated. A related issue is whether LRP amplitude differences could be artifactually produced by different baselines on the two consecutive trials. To assess this possibility, the LRP was computed for the 2-trial sequence using the baseline of the first trial. The calculations were done for the last trial in the sequence. If the response were repeated, then both trials would have large positive LRPs. By contrast, if the response alternated, then the first trial would have an LRP of polarity opposite to that of the second. Assuming that cortical activation persists across the intertrial interval, this could dramatically shift the baseline.

Fig. 23.6 Experiment 2. Stimulus-locked lateralized readiness potential (LRP) for response repetitions (Identical and Equivalent transitions) and for alternations (Different transitions). The terms of the subtractions used to obtain the LRP were defined with respect to the last trial in the 2-trial sequence. Thus, the main LRP deflection on the second trial is always positive. On the first trial, the LRP is also positive when responses are repeated, but it is negative if it involved the hand opposite to that used on the second trial.

Figure 23.6 shows the stimulus-locked LRP for response repetitions (Identical and Equivalent transitions pooled) and for response alternations (Different transitions). These recordings show that the LRP does not return to baseline even after about 2000 ms, and that LRP amplitudes are smaller for response repetitions than for response alternations. As a consequence of the slow recovery of the LRP, baselines on the second trial differ for repetition and alternation sequences. Under normal (1-trial) analytic methods, the effect of this baseline shift would be to artifactually enlarge the LRP for alternations and reduce it for repetitions of the response. As shown in Fig. 23.6, incorrect response activation was not present on II sequences when responses alternated, but it was evident when responses were repeated.
23.3.5 Discussion
In this experiment, the Simon effect reversed on Identical and Equivalent transitions when the previous trial was incompatible (i.e. the II trials were faster than the IC trials when responses were repeated). By contrast, when the previous trial was compatible, the Simon effect was large. These results fully agree with those of the previous experiment and those of other authors (Proctor and Vu, this volume, Chapter 22; Stürmer and Leuthold 1998). The SRC repetition effect (faster RT when SRC is repeated than when it alternates) was confounded with stimulus location repetition on Identical and Equivalent transitions: CC and II sequences imply repetitions of stimulus location, while IC and CI sequences imply alternations of stimulus location. On Different transitions, the opposite is true: SRC repetition/alternation goes with stimulus location alternation/repetition. Therefore, there are SRC repetition effects even when all the other components of the trial alternate. For this reason, we assume that on Identical and Equivalent transitions stimulus location and SRC repetition effects act jointly to reverse the Simon effect.

Previous LRP research has demonstrated incorrect response activation on incompatible trials, a finding that has been attributed to the automatic activation of the compatible response (De Jong et al. 1994; Valle-Inclán 1996a). The present results (Figs 23.4 and 23.5), however, indicate that incorrect response activation depends critically on the most recently used S–R spatial transformation, not just on the spatial relationship between stimulus and response on the current trial. These results conflict with the widely accepted notion of automatic activation of the spatially compatible response (De Jong et al. 1994; Hommel 1993a; Kornblum et al. 1990). This assumption was also challenged by Valle-Inclán and Redondo (1998). They used a Simon task in which the assignment of stimuli to response keys was changed on every trial and was presented to the subjects either before or after the imperative stimulus. Contrary to the automaticity view, there were no LRP signs of response activation in the interval between the stimulus presentation and the S–R mapping instructions.

Another notable finding in the present study was the incorrect response preparation in the response-locked LRPs for CI trials (see Masaki et al. 2000, and Van der Lubbe and Woestenburg 1999, for previous findings of incorrect response activation in response-locked LRPs). This result suggests that incorrect responses were aborted relatively late, and explains why CI trials had the largest proportion of errors. It could also indicate that processes during or subsequent to response selection do influence the Simon effect and, as suggested by Shiu and Kornblum (1999), might even generate it. Note, however, the close overlap of the solid and dashed lines in the 200 ms interval preceding response onset in our data. This pattern of results indicates that late motoric processes neither manifest nor contribute to the Simon effect.

SRC repetition effects have been interpreted as support for the assumption that the cognitive system uses S–R rules, not just associations between stimulus and response pairs (Duncan 1977). Alternatively, Stoffels (1996) proposed that SRC sequential effects were due to the blocking of the automatic route after incompatible trials, and its opening after a compatible trial.
A similar interpretation has been proposed by Stürmer and Leuthold (1998) and by Praamstra et al. (1999). A third alternative has been put forward by Hommel (2000). According to this view, stimulus and response features of one trial become associated, and this temporary binding can enhance or impair processing on the next trial. In principle, this feature-integration account is not the same as the application of an S–R translation
rule, although it could be argued that these transient associations correspond to the implementation of S–R rules. Rule-based accounts have difficulty explaining the different patterns of results obtained for the three types of transitions (the prediction would presumably be the same pattern for all transitions; see Stoffels 1996). On the other hand, the clear reversal of the Simon effect on Identical and Equivalent trials (the difference between IC and II trials) cannot be explained merely by assuming that the putative automatic route is attenuated after an incompatible trial. At the very least, stimulus-driven models would have to admit that any such automatic route linking compatible S–R pairs can be completely overridden by a combination of sequential repetition effects.

The feature-integration hypothesis correctly predicts faster RTs for Identical and Different than for Equivalent transitions. This hypothesis could also explain the reversal of the Simon effect if it is assumed that, under some conditions, the associations built up on the previous trial can be stronger than the tendency to execute the spatially compatible response. As noted above, though, this hypothesis has the same difficulties as the rule-selection account in explaining the different behavioral patterns across transitions.

Our LRP data (see Fig. 23.6) allow one to compare predictions from the rule-selection and feature-integration approaches, under the common assumption that the incorrect response is briefly activated on incompatible trials. If it is assumed that abstract S–R mapping rules mediate the SRC repetition effect, then incorrect response activation should be absent when compatibility is repeated, since subjects are applying the correct S–R transformation. This prediction is in overt contradiction with the results in Fig. 23.6 (II sequences): repetition of incompatibility when responses were the same produced noticeable LRP signs of incorrect response activation. This intriguing result could be explained by extending the idea of integration to all stimulus and all response features activated on a given trial. It is known that relevant and irrelevant stimulus features are integrated (Hommel 1998), and perhaps irrelevant response activation (i.e. subthreshold activation of the incorrect response) also becomes part of the episodic memory structure left behind by a trial. It follows that, when stimuli or responses are repeated, the correct and incorrect responses are both activated, as shown for II sequences with response repetitions in Fig. 23.6.

The analyses of higher-order SRC sequential effects suggest that participants expected continuations of SRC repetition runs and, to a lesser extent, of SRC alternation runs. The effects of repeating SRC over several trials could originate from traces left by previous trials. The SRC alternation effects, although modest, indicate the presence of expectancies about SRC in Simon tasks, as previously found by Soetens (1998).
23.4 Conclusion
The two experiments show that sequential effects determine the size and sign of the Simon effect. The studies also show that, overall, compatible trials are faster than incompatible trials, indicating that under the conditions of these experiments some factor primes the compatible response. One possible candidate would be a direct route linking spatially compatible stimuli and responses. The main problem with this account is that it cannot explain the various reversals of the Simon effect described in the literature. Especially relevant to this discussion are the findings of Proctor and Lu (1999) and Tagliabue et al. (2000). These investigators showed that practice with an incompatible S–R mapping eliminates or reverses the Simon effect, depending on the amount of practice and on the time gap between the practice sessions and the Simon task. In other words, subjects build up expectancies about the relative proportions of compatible and incompatible trials based on their experience with
similar tasks in similar contexts. In functional terms, expecting a compatible response is very similar to automatic activation of the compatible response. The critical difference between the two alternatives, however, is that the route linking stimulus and response is fixed in stimulus-driven models (even hardwired in some accounts), whereas expectancies are much more flexible. In favor of this expectancy mechanism, the higher-order SRC sequential effects in our study showed a cost–benefit pattern (indicative of strategic behavior), and the Simon effect tended to be larger on Different sequences (as in Soetens 1998).
Acknowledgments
This research was financed by the Spanish Ministry of Culture (PB96-1077). We thank Barbara McLay for assistance with Experiment 1 and William Gehring for providing the computer program used to correct eye movement and blink artifacts. We also thank two anonymous reviewers for their helpful suggestions for improving the paper.
References
Bertelson, P. (1963). S–R relationships and reaction times to new versus repeated signals in a serial task. Journal of Experimental Psychology, 65, 478–484.
Bertelson, P. (1965). Serial choice reaction time as a function of response versus signal-and-response repetition. Nature, 206, 217–218.
Botvinick, M., Nystrom, L.E., Fissell, K., Carter, C.S., and Cohen, J.D. (1999). Conflict monitoring versus selection-for-action in anterior cingulate cortex. Nature, 402, 179–181.
Campbell, K.C. and Proctor, R.W. (1993). Repetition effects with categorizable stimulus and response sets. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1345–1362.
Coles, M.G.H. (1989). Modern mind–brain reading: Psychophysiology, physiology, and cognition. Psychophysiology, 26, 251–269.
Coles, M.G.H. and Gratton, G. (1986). Cognitive psychophysiology and the study of states and processes. In G.R. Hockey, A.W. Gaillard, and M.G. Coles (Eds.), Energetics and human information processing, pp. 409–424. Dordrecht, The Netherlands: Nijhoff.
Craft, J.L. and Simon, J.R. (1970). Processing symbolic information from a visual display: Interference from an irrelevant directional cue. Journal of Experimental Psychology, 83, 415–420.
De Jong, R., Liang, C., and Lauber, E. (1994). Conditional and unconditional automaticity: A dual-process model of effects of spatial stimulus–response correspondence. Journal of Experimental Psychology: Human Perception and Performance, 20, 731–750.
Duncan, J. (1977). Response-selection errors in spatial choice-reaction tasks. Quarterly Journal of Experimental Psychology, 29, 415–423.
Gratton, G., Coles, M.G.H., and Donchin, E. (1983). A new method for off-line removal of ocular artifacts. Electroencephalography and Clinical Neurophysiology, 55, 468–484.
Gratton, G., Coles, M.G.H., Sirevaag, E.J., Eriksen, C.W., and Donchin, E. (1988). Pre- and post-stimulus activation of response channels: A psychophysiological analysis. Journal of Experimental Psychology: Human Perception and Performance, 14, 331–344.
Gratton, G., Bosco, C.M., Kramer, A.F., Coles, M.G.H., Wickens, C.D., and Donchin, E. (1990). Event-related brain potentials as indices of information extraction and response priming. Electroencephalography and Clinical Neurophysiology, 75, 419–432.
Gratton, G., Coles, M.G.H., and Donchin, E. (1992). Optimizing the use of information: Strategic control of activation of responses. Journal of Experimental Psychology: General, 121, 480–506.
Hackley, S.A. and Miller, J.O. (1995). Response complexity and precue interval effects on the lateralized readiness potential. Psychophysiology, 32, 230–241.
Hackley, S.A. and Valle-Inclán, F. (1998). Automatic alerting does not speed late motoric processes in a reaction-time task. Nature, 391, 786–788.
Hasbroucq, T. and Guiard, Y. (1991). Stimulus–response compatibility and the Simon effect: Toward a conceptual clarification. Journal of Experimental Psychology: Human Perception and Performance, 17, 246–266.
Hedge, A. and Marsh, N.W.A. (1975). The effects of irrelevant spatial correspondences on two-choice response-time. Acta Psychologica, 39, 427–439.
Hommel, B. (1993a). The relationship between stimulus processing and response selection in the Simon task: Evidence for a temporal overlap. Psychological Research/Psychologische Forschung, 55, 280–290.
Hommel, B. (1993b). Inverting the Simon effect by intention: Determinants of direction and extent of effects of irrelevant spatial information. Psychological Research/Psychologische Forschung, 55, 270–279.
Hommel, B. (1993c). The role of attention for the Simon effect. Psychological Research/Psychologische Forschung, 55, 208–222.
Hommel, B. (1994). Spontaneous decay of response-code activation. Psychological Research/Psychologische Forschung, 56, 261–268.
Hommel, B. (1995). Stimulus–response compatibility and the Simon effect: Toward an empirical clarification. Journal of Experimental Psychology: Human Perception and Performance, 21, 764–775.
Hommel, B. (1998). Event files: Evidence for automatic integration of stimulus–response episodes. Visual Cognition, 5, 183–216.
Hommel, B. (2000). A feature-integration account of sequential effects in the Simon task. Manuscript submitted for publication.
Kirby, N.H. (1980). Sequential effects in choice reaction time. In A.T. Welford (Ed.), Reaction times, pp. 129–172. London: Academic Press.
Kornblum, S., Hasbroucq, T., and Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus–response compatibility: A model and taxonomy. Psychological Review, 97, 253–270.
Kornblum, S., Stevens, G.T., Requin, J., and Whipple, A. (1999). The effects of irrelevant stimuli: 1. The time course of stimulus–stimulus and stimulus–response consistency effects with Stroop-like stimuli, Simon-like tasks, and their factorial combinations. Journal of Experimental Psychology: Human Perception and Performance, 25, 688–714.
Kornhuber, H.H. and Deecke, L. (1965). Hirnpotentialänderungen bei Willkürbewegungen und passiven Bewegungen des Menschen: Bereitschaftspotential und reafferente Potentiale. Pflügers Archiv, 284, 1–17.
Kutas, M. and Donchin, E. (1974). Studies of squeezing: Handedness, responding hand, response force, and asymmetry of readiness potential. Science, 186, 545–548.
Lamberts, K., Tavernier, G., and D'Ydewalle, G. (1992). Effects of multiple reference points in spatial stimulus–response compatibility. Acta Psychologica, 79, 115–130.
Logan, G.D. (1980). Attention and automaticity in Stroop and priming tasks: Theory and data. Cognitive Psychology, 12, 523–553.
Logan, G.D. (1985). Executive control of thought and action. Acta Psychologica, 70, 193–210.
Logan, G.D. and Zbrodoff, N.J. (1979). When it helps to be misled: Facilitative effects of increasing the frequency of conflicting stimuli in a Stroop-like task. Memory and Cognition, 7, 166–174.
Lu, C.H. and Proctor, R.W. (1995). The influence of irrelevant location information on performance: A review of the Simon and spatial Stroop effects. Psychonomic Bulletin and Review, 2, 174–207.
Masaki, H., Takasawa, N., and Yamazaki, K. (2000). An electrophysiological study of the locus of the interference in a stimulus–response compatibility paradigm. Psychophysiology, 37, 464–472.
Miller, J.O. and Hackley, S.A. (1992). Electrophysiological evidence for temporal overlap among contingent mental processes. Journal of Experimental Psychology: General, 121, 195–209.
Miller, J.O., Patterson, T., and Ulrich, R. (1998). Jackknife-based method for measuring LRP onset latency differences. Psychophysiology, 35, 99–115.
Osman, A., Bashore, T.R., Coles, M.G.H., Donchin, E., and Meyer, D.E. (1992). On the transmission of partial information: Inferences from movement-related brain potentials. Journal of Experimental Psychology: Human Perception and Performance, 18, 217–232.
Pashler, H. and Baylis, G. (1991). Procedural learning: 2. Intertrial repetition effects in speeded choice tasks. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 33–48.
Praamstra, P., Kleine, B.-U., and Schnitzler, A. (1999). Magnetic stimulation of the dorsal premotor cortex modulates the Simon effect. NeuroReport, 10, 3671–3674.
Proctor, R.W. and Lu, C.-H. (1999). Processing irrelevant location information: Practice and transfer effects in choice-reaction tasks. Memory and Cognition, 27, 63–77.
Proctor, R.W., Lu, C.H., and Van Zandt, T. (1992). Enhancement of the Simon effect by response precuing. Acta Psychologica, 81, 53–74.
Ragot, R., Renault, B., and Rémond, A. (1980). Hemispheric involvement during a bimanual RT task: P300 and motor potential. In H.H. Kornhuber and L. Deecke (Eds.), Motivation, motor and sensory processes of the brain: Electrical potentials, behaviour and clinical use, pp. 736–741. Amsterdam: Elsevier/North Holland.
Ratcliff, R. (1979). Group reaction time distribution and an analysis of distribution statistics. Psychological Bulletin, 86, 446–461.
Riggio, L., Gawryszewski, L., and Umiltà, C. (1986). What is crossed in crossed-hand effects? Acta Psychologica, 62, 89–100.
Shiu, L.-P. and Kornblum, S. (1999). Stimulus–response compatibility effects in go–no-go tasks: A dimensional overlap account. Perception and Psychophysics, 61, 1613–1623.
Simon, J.R. (1982). Effect of an auditory stimulus on the processing of a visual stimulus under single- and dual-task conditions. Acta Psychologica, 51, 61–73.
Simon, J.R. (1990). The effects of an irrelevant directional cue on human information processing. In R.W. Proctor and T.G. Reeve (Eds.), Stimulus–response compatibility: An integrated perspective, pp. 31–88. Amsterdam: North-Holland.
Simon, J.R. and Acosta, E. Jr. (1982). Effect of irrelevant information on the processing of relevant information: Facilitation and/or interference? The influence of experimental design. Perception and Psychophysics, 31, 383–388.
Simon, J.R. and Craft, J.L. (1970). Effects of an irrelevant auditory stimulus on visual choice reaction time. Journal of Experimental Psychology, 86, 272–274.
Simon, J.R. and Rudell, A.P. (1967). Auditory S–R compatibility: The effect of an irrelevant cue on information processing. Journal of Applied Psychology, 51, 300–304.
Simon, J.R. and Small, A.M. Jr. (1969). Processing auditory information: Interference from an irrelevant cue. Journal of Applied Psychology, 53, 433–435.
Simon, J.R., Hinrichs, J.V., and Craft, J.L. (1970). Auditory S–R compatibility: Reaction time as a function of ear–hand correspondence and ear–response–location correspondence. Journal of Experimental Psychology, 86, 97–102.
Smid, H.G.O.M., Mulder, G., and Mulder, L.J.M. (1987). The continuous flow model revisited: Perceptual and motor aspects. In R.E. Johnson Jr., J.W. Rohrbaugh, and R. Parasuraman (Eds.), Current trends in event-related potentials research. Electroencephalography and Clinical Neurophysiology (Suppl. 40), pp. 270–278. Amsterdam: Elsevier.
Smid, H.G.O.M., Mulder, G., and Mulder, L.J.M. (1990). Selective response activation can begin before stimulus recognition is complete: A psychophysiological and error analysis of continuous flow. Acta Psychologica, 74, 169–201.
Smith, M.C. (1968). Repetition effect and short-term memory. Journal of Experimental Psychology, 77, 435–439.
Smulders, F.T.Y., Kenemans, J.L., and Kok, A. (1996). Effects of task variables on measures of the mean onset latency of LRP depend on the scoring method. Psychophysiology, 33, 194–205.
Soetens, E., Boer, K.C., and Hueting, J.E. (1985). Expectancy or automatic facilitation? Separating sequential effects in two-choice reaction time. Journal of Experimental Psychology: Human Perception and Performance, 11, 598–616.
Soetens, E. (1998). Localizing sequential effects in serial choice reaction time with the information reduction procedure. Journal of Experimental Psychology: Human Perception and Performance, 24, 547–568.
Stoffels, E.J., van der Molen, M.W., and Keuss, P.J.G. (1989). An additive factors analysis of the effect(s) of location cues associated with auditory stimulation on stages of information processing. Acta Psychologica, 70, 161–197.
Stoffels, E.J. (1996). Uncertainty and processing routes in the selection of a response: An S–R compatibility study. Acta Psychologica, 94, 227–252.
Stürmer, B. and Leuthold, H. (1998). Suppression of response priming in a Simon task. XII Evoked Potentials International Conference (EPIC XII), P2–07.
Tagliabue, M., Zorzi, M., Umiltà, C., and Bassignani, F. (2000). The role of long-term-memory and short-term-memory links in the Simon effect. Journal of Experimental Psychology: Human Perception and Performance, 26, 648–670.
Toth, J.P., Levine, B., Stuss, D.T., Oh, A., Winocur, G., and Meiran, N. (1995). Dissociation of processes underlying spatial S–R compatibility: Evidence for the independent influence of What and Where. Consciousness and Cognition, 4, 483–501.
Umiltà, C. and Nicoletti, R. (1992). An integrated model of the Simon effect. In J. Alegría et al. (Eds.), Analytic approaches to human cognition, pp. 331–350. Amsterdam: Elsevier.
Umiltà, C., Rubichi, S., and Nicoletti, R. (1999). Facilitation and interference components in the Simon effect. Archives Italiennes de Biologie, 137, 139–149.
Valle-Inclán, F. (1996a). The locus of interference in the Simon effect: An ERP study. Biological Psychology, 43, 147–162.
Valle-Inclán, F. (1996b). The Simon effect and its reversal studied with ERPs. International Journal of Psychophysiology, 23, 41–53.
Valle-Inclán, F. and Redondo, M. (1998). On the automaticity of ipsilateral response activation in the Simon effect. Psychophysiology, 35, 366–371.
Valle-Inclán, F., Hackley, S.A., and McClay, B. (1998). Sequential dependencies with respect to the Simon effect. Journal of Psychophysiology, 12, 404.
Valle-Inclán, F., Hackley, S.A., and de Labra, C. (2001). Spatial compatibility effects between the stimulated eye and response location. Manuscript submitted for publication.
Van der Lubbe, R.H.J. and Woestenburg, J.C. (1999). The influence of peripheral precues on the tendency to react towards a lateral relevant stimulus with multiple-item arrays. Biological Psychology, 51, 1–21.
Verleger, R. (1991). Sequential effects on response times in reading and naming colored words. Acta Psychologica, 77, 167–189.
Zorzi, M. and Umiltà, C. (1995). A computational model of the Simon effect. Psychological Research/Psychologische Forschung, 58, 193–205.
24 Activation and suppression in conflict tasks: empirical clarification through distributional analyses
K. Richard Ridderinkhof
Abstract. The purpose of the present study was to explore and clarify the role of inhibitory processes in correspondence effects in conflict tasks (and in the Simon task in particular), in which responses are typically slowed when an irrelevant stimulus feature is associated with the incorrect response. An activation-suppression hypothesis, describing a pattern of direct activation followed by selective suppression of that activation, was developed and applied to the Simon task. Distributional analyses (in particular, delta plots for both response speed and accuracy) were argued to reveal the dynamics of these inhibitory processes. Three different empirical approaches provided evidence for differential patterns of selective suppression in Simon tasks. First, the results of an experimental manipulation designed explicitly to vary the need to suppress (the context in which the Simon task appeared either emphasized or opposed the need to suppress the task-irrelevant location of the stimulus) provided independent evidence that differential patterns of suppression of location-driven direct activation showed up in diverging delta plot patterns. The delta plots for RT and accuracy revealed further that (1) the suppression of direct activation was more efficient for individuals who showed relatively small correspondence effects in overall RT, and (2) the operation of selective suppression of direct activation was much more relaxed after trials in which task-irrelevant stimulus features corresponded to the correct (as compared to the incorrect) response. These results were consistent with the predictions derived from the activation-suppression hypothesis, and point to the major role of suppression processes in correspondence effects. The distributional analyses were shown to be crucial in highlighting that role, since the dynamics of the direct-activation and selective-suppression patterns were lost in the overall scores. The delta plot technique may be used generally to examine the efficiency of suppression processes between experimental conditions as well as between groups suspected to perform deficiently in inhibitory control.
In the conflict paradigm, represented by well-known tasks such as the Stroop, Simon, and Eriksen tasks, the typical observation is that responses are slowed when some irrelevant feature of the stimulus is associated with the response opposite to that associated with the relevant stimulus feature. By the term 'stimulus–response correspondence effects' we refer generally to the effects on response speed of the correspondence relations that exist between stimulus aspects and response aspects. If one stimulus attribute is designated the target attribute (with each possible value of that attribute associated with one particular response), other stimulus attributes are designated irrelevant, but they may nonetheless also have a correspondence relation with the required response. Under some conditions, relevant and irrelevant stimulus aspects have correspondence relations not only with the response, but also with each other. For example, response speed is facilitated when target and irrelevant stimulus aspects are inherently congruent compared with when their identities differ. For convenience, correspondence and non-correspondence relations will be denoted CR and NCR, respectively.
In the Simon task, the subject's task is to issue a discriminative response based on the identity of a target stimulus attribute, and to ignore irrelevant spatial information (such as the location in visual space of the target stimulus, or the side on which an irrelevant accessory tone is presented). Responses are typically slowed when the irrelevant spatial attribute corresponds to the side opposite to, rather than the same side as, the response designated by the target feature (say, the color of the stimulus). Thus, the requirement to respond with the effector on the side opposite to, rather than on the side that corresponds spatially to, the side of stimulation yields a substantial increase in reaction time (RT). In explaining such correspondence effects, many theorists have invoked (either explicitly or implicitly) some concept of suppression of the activation induced by the irrelevant stimulus location. It has proven difficult, however, to provide independent evidence for the role and nature of such inhibition. The main goal of the present study is to explore and clarify the role of activation suppression empirically. I will examine closely the distribution of response times (and the associated accuracy levels) obtained in a Simon task, and demonstrate that distributional analyses reveal temporal characteristics of suppression processes. To that end, I will compare distribution functions (cumulative density functions, conditional accuracy functions, and the associated delta plots for RT and accuracy) obtained under conditions designed to vary the strength of suppression in a Simon task. One approach is to examine individual differences in the size of the Simon effect, under the assumption that individuals who display larger interference effects are less efficient in suppressing location-based activation. A second approach is to examine sequential effects, under the assumption that the presence or absence of incorrect activation on a preceding trial might influence the level of selective suppression on the current trial. As a final approach, I will compare the results of identical Simon tasks embedded in two different contexts: one context emphasizing the need to suppress location-driven activation, the other opposing this requirement. The results of these three approaches converge on the conclusion that the efficiency of the selective suppression of direct activation is a major factor in determining the presence and magnitude (and even the direction) of correspondence effects. Moreover, it will be demonstrated that it is imperative to go beyond mean reaction time and overall accuracy in order to appreciate the significance and the dynamics of this factor.
24.1 Direct activation in conflict tasks In recent years, dual-process conceptions of how perceptual codes lead to activation of the correct response have become increasingly popular. In such conceptions, perception–action coupling can be established via two parallel routes, one controlled and deliberate, the other fast, direct, and more or less automatic. Kornblum, Hasbroucq, and Osman (1990) set the stage with their dual-route model for S–R correspondence effects. Although dual-route models had been formulated previously (e.g. Frith and Done 1986; Sanders 1967), the Kornblum et al. model has served as a signi1cant impetus for subsequent research into S–R correspondence effects. Conceived on the basis of theoretical considerations rather than empirical tests, the model contains a number of discrete stages of processing, arranged partly in parallel. Basically, upon identi1cation, a stimulus is thought to deliberately activate the correct response code (S–R corresponding or non-corresponding, depending on instruction) via the controlled route, and to activate the S–R corresponding response code (independent of the S–R correspondence instruction) and the corresponding motor program via the direct route. If the two
response codes match, the motor program already activated via the direct route can be carried out quickly; if they mismatch, this motor program must be aborted in favor of the alternative motor program, whose retrieval and execution cost extra time.

The rudimentary dual-route architecture of the Kornblum et al. (1990; cf. Kornblum and Stevens, this volume, Chapter 2) model has been embraced by many authors in the field (e.g. de Jong, Liang, and Lauber 1994; Eimer, Hommel, and Prinz 1995; Proctor, Lu, Wang, and Dutta 1995; Ridderinkhof, van der Molen, and Bashore 1995; Stoffels 1996; for an overview see Ridderinkhof 1997). For instance, dual-process models have been proposed explicitly for S–R correspondence effects in the Simon task (de Jong et al. 1994) and arrow varieties of the Eriksen flanker task (Ridderinkhof et al. 1995). A schematic representation of this type of model is depicted in Fig. 24.1.

Fig. 24.1 Elementary architecture of the dual-process model. No assumptions are made concerning the nature of processing (e.g. discrete vs. continuous) in the processes denoted by the boxes.

Most significantly, the controlled process of S–R translation (cf. Sanders 1980; Welford 1968) is bypassed by a direct activation route (Hommel 1993; Ridderinkhof et al. 1995); the two routes converge at the level of response activation processes. Direct activation effects are unconditional, in the sense that the response activated via the direct route is independent of S–R mapping instructions: a left-pointing irrelevant arrow activates the left-hand response, even when instructions require a right-hand response to a left-pointing target arrow.

In event-related brain potential studies, so-called Lateralized Readiness Potentials (LRPs) reflect the balance between activity recorded over the ipsilateral and contralateral primary motor cortex (for a comprehensive introduction to LRPs and other event-related brain potentials see Ridderinkhof and Bashore 1995). LRP results have supported the prediction, derived from dual-process models, that distractor features actually yield activation of the corresponding response in motor cortex, regardless of S–R mapping instructions, both in the Simon task (de Jong et al. 1994) and in the Eriksen flanker task (Ridderinkhof, Lauer, and Geesken 1996).
24.2 Dynamics of direct activation in conflict tasks: distributional analyses

The time to encode and identify stimulus features and to select appropriate responses on the basis of target features typically varies from trial to trial. For reasons that are not well understood to date, this variability is best described by ex-Gaussian distribution models (cf. Luce 1986). Let us, for the sake of argument, work on the assumption that the time to encode and identify the target and
non-target features is fixed rather than variable; thus, both direct activation and deliberate response decision processes have a fixed onset time. Now it can be argued that the slower the processing in the deliberate decision route, the more time there is for response activation along the direct activation route. On NCR trials, slower deliberate response decision processes would thus allow for more incorrect direct activation and, hence, slower correct responses. If deliberate response decision processes were to proceed relatively fast, then the effects of direct activation should be short-lived; the build-up of activation for the incorrect response along the direct-activation route would not be able to reach high amplitudes before activation based on the deliberate route took over. As a consequence, the correct response could be activated relatively fast. If deliberate response decision processes were to proceed relatively slowly, then the effects of direct activation should last longer; the build-up of activation for the incorrect response along the direct-activation route could attain higher amplitudes before the correct response was activated along the deliberate route. As a result, activation for the correct response would start relatively late. If deliberate response decision processes were too slow, then the activation for the incorrect response along the direct-activation route could transgress the threshold at which an overt response is emitted. Note that the result is a relatively fast error; by contrast, if direct activation for the incorrect response were to stay just below the threshold for responding, the result would be a relatively slow correct response.

On CR trials, these effects should work in the opposite direction, although compared with NCR trials the effects on CR trials are typically much less pronounced. If deliberate response decision processes proceed relatively fast, then the build-up of activation for the (correct) response along the direct-activation route would be only small by the time the deliberate route produced its output. Thus, fast CR responses benefit little from direct activation. If deliberate response decision processes were to proceed relatively slowly, then there would be more direct activation for the correct response; thus, slow CR responses would benefit more from direct activation. To my knowledge, no data have been published that allow us to verify this set of predictions independently (but see Ridderinkhof and van der Molen 1993).

In the present study the dynamics of direct activation and selective suppression will be examined using a special set of analytical tools: distributional analyses. These analyses will be conducted on behavioral (RT and accuracy) data obtained using Simon tasks. Before turning to the predictions derived from the activation-suppression hypothesis and the empirical results, the relevant distributional analyses will be introduced concisely in relation to the predictions derived above concerning natural variability in processing speed. Several tools are available for distributional analyses. I will focus on cumulative density functions (CDFs) and conditional accuracy functions (CAFs), and then turn to delta plots, which provide a convenient simplification of the information present in CDFs and CAFs.
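To make the distributional analyses below concrete, the following sketch simulates RT data of the kind assumed here. It is a minimal illustration, not the author's procedure: the parameter values, the sample sizes, and the choice to model NCR interference simply as a larger exponential component are all assumptions made purely for demonstration (Python with NumPy).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def ex_gaussian_rts(n, mu=400.0, sigma=40.0, tau=100.0):
    """Draw n reaction times (ms) from an ex-Gaussian distribution:
    the sum of a Gaussian (mu, sigma) and an exponential with mean tau."""
    return rng.normal(mu, sigma, n) + rng.exponential(tau, n)

# Hypothetical CR and NCR conditions; the larger tau for NCR trials is an
# illustrative stand-in for interference from incorrect direct activation.
rt_cr = ex_gaussian_rts(10_000, tau=90.0)
rt_ncr = ex_gaussian_rts(10_000, tau=130.0)
print(round(rt_cr.mean(), 1), round(rt_ncr.mean(), 1))
```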
24.2.1 Cumulative density functions

CDFs plot the cumulative probability of responding as a function of response speed. Fig. 24.2 (left panel) shows the sigmoid-shaped CDFs associated with ex-Gaussian distributed RTs (here, we plotted RT decile scores) for two hypothetical conditions X and Y where one condition is associated with slower RTs than the other. The typical pattern is that the proportional difference in RT between the two conditions is similar across response speed quantiles; as a result, the absolute difference in RT between the two conditions increases from fast to slower quantiles (cf. Luce 1986).
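A CDF of this kind can be approximated exactly as the figure captions describe: rank-order the RTs, cut them into ten equal-size speed bins, and pair the mean RT of each bin with its cumulative probability. The helper below is a sketch of that computation; the function name and the simulated input are illustrative, not part of the original analysis.

```python
import numpy as np

def cdf_decile_points(rts):
    """Approximate a CDF by the mean RT within each of ten response-speed
    deciles, paired with the cumulative probability of responding."""
    bins = np.array_split(np.sort(np.asarray(rts)), 10)
    mean_rt = np.array([b.mean() for b in bins])
    cum_p = np.arange(1, 11) / 10.0          # 0.1, 0.2, ..., 1.0
    return mean_rt, cum_p

# Illustrative use with simulated ex-Gaussian RTs:
rng = np.random.default_rng(2)
rts = rng.normal(400, 40, 5000) + rng.exponential(100, 5000)
mean_rt, cum_p = cdf_decile_points(rts)
```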
Fig. 24.2 Left panel: Cumulative density functions (CDFs) for two hypothetical conditions X and Y as well as for hypothetical corresponding (CR) and non-corresponding (NCR) trials in a Simon task. Conditions X and Y are two arbitrary conditions, with condition Y associated with slower RTs than condition X. Conditions CR and NCR resemble conditions X and Y, respectively, but they also reflect the additional direct-activation effects of irrelevant location in a Simon task. Slow NCR trials are affected more by the negative effects of direct activation for the incorrect response than fast NCR trials. Slow CR trials benefit more from the positive effects of direct activation for the correct response than fast CR trials. CDFs were approximated by plotting, for each condition separately, the cumulative probability of responding as a function of mean RT for each of ten response speed deciles. Right panel: Delta plots for response speed for the hypothetical X and Y conditions and the hypothetical correspondence effects as derived from the cumulative density functions plotted in the left panel. Delta plots plot effect size as a function of response speed. Response speed is expressed in RT decile scores.
In addition to these ‘standard’ differences in CDFs between faster and slower conditions, CR and NCR conditions will display further differences according to the predictions outlined above. It was argued that slow NCR trials will be affected (i.e. slowed) more by the negative effects of direct activation for the incorrect response than fast NCR trials. Likewise, it was argued that slow CR trials will benefit (i.e. speed up) more from the positive effects of direct activation for the correct response than fast CR trials. These patterns are illustrated in the CDFs in the left panel of Fig. 24.2.
24.2.2 Conditional accuracy functions

CAFs plot accuracy of responding as a function of response speed (see Fig. 24.3, left panel; here, accuracy is plotted as a function of RT decile scores). If responses are so fast that they could not be
based on information available in the stimulus display, then the result is a fast guess with near-chance accuracy. The slower the response, the greater the chance of it being correct, reaching asymptotic accuracy for the slowest responses. The smaller the incidence of fast guesses, the flatter the CAFs. Figure 24.3 (left panel) shows the CAF patterns for the two hypothetical conditions X and Y from Fig. 24.2, where one condition is associated with slower RTs and higher error rates than the other. The typical pattern is that asymptotic accuracy is attained for slow responses in both conditions, whereas faster responses are associated with more errors in the more difficult condition.

According to the predictions outlined above, CR and NCR conditions will display differences in CAFs in addition to these ‘standard’ differences between faster and slower conditions. It was argued that NCR trials would be characterized by relatively many fast location-driven errors. No such argument could be made for CR trials. These patterns are illustrated in the CAFs in the left panel of Fig. 24.3 (using the RT decile scores plotted in Fig. 24.2).
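A CAF can be computed analogously to the CDF sketch above: order trials by RT, cut them into ten speed deciles, and take the proportion of correct responses in each bin. Again this is a sketch under assumed inputs (a vector of RTs and a parallel vector of correct/incorrect codes), not the original analysis code.

```python
import numpy as np

def caf_decile_points(rts, correct):
    """Conditional accuracy function: mean RT and proportion correct
    within each of ten response-speed deciles."""
    rts = np.asarray(rts)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(rts)
    rt_bins = np.array_split(rts[order], 10)
    acc_bins = np.array_split(correct[order], 10)
    mean_rt = np.array([b.mean() for b in rt_bins])
    accuracy = np.array([b.mean() for b in acc_bins])
    return mean_rt, accuracy
```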
Fig. 24.3 Left panel: Conditional accuracy functions (CAFs) for two hypothetical conditions X and Y as well as for hypothetical corresponding (CR) and non-corresponding (NCR) trials in a Simon task. Conditions X and Y are two arbitrary conditions, with condition Y associated with slower RTs and more errors than condition X. Conditions CR and NCR resemble conditions X and Y, respectively, but they also reflect the additional direct-activation effects of irrelevant location in a Simon task. NCR trials (more than CR trials) are characterized by relatively many fast location-driven errors. CAFs were approximated by plotting, for CR and NCR conditions separately, accuracy as a function of mean RT for each of ten response speed deciles. Right panel: Delta plots for response accuracy for the hypothetical X and Y conditions and the hypothetical correspondence effects as derived from the conditional accuracy functions plotted in the left panel. Delta plots plot effect size as a function of response speed. Response speed is expressed in RT decile scores.
24.2.3 Delta plots

Distributional plots or delta plots are used to plot effect size as a function of response speed. They can be derived directly from the CDFs (when plotting RT effects) or the CAFs (when plotting accuracy effects). For each RT quantile, the difference in RT or accuracy between conditions A and B is plotted on the Y-axis against the mean of the RTs of conditions A and B in that quantile. The right panel of Fig. 24.2 shows delta plots for correspondence effects on RT, as derived from the CDFs in the left panel. The right panel of Fig. 24.3 shows delta plots for correspondence effects on accuracy, as derived from the CAFs in the left panel.

De Jong et al. (1994) introduced the use of delta plots in Simon tasks, and asserted that the slopes between quantile points in delta plots for RT reflect the relative time course of unconditional and conditional automatic activations, which they argued were involved in the reversal of the Simon effect reported by Hedge and Marsh (1975). Kornblum and his co-workers (Kornblum, Stevens, Whipple, and Requin 1999; Zhang and Kornblum 1997) disputed this position and showed how the slopes between quantile points in delta plots for RT are determined primarily by differences between CR and NCR conditions in terms of the variability in processing speed at several stages of processing. The slope of the delta plot reflects the relationship between the variability parameters of the underlying CDFs, and positive and negative delta plot slopes can be produced merely by varying these parameters. As a result, they argued that the absolute slope between quantile points cannot be used to draw direct inferences about relative time courses.

Two issues are important in evaluating the use of delta plots in conflict tasks. First, one must be able to explain why correspondence effects involve the variability effects leading to the observed delta plot slopes, and formulate lucid predictions about the effects of experimental conditions on delta plots. Without a model that generates such a priori predictions, the interpretation of delta plots is post hoc and vulnerable to alternative interpretations in terms of factors that were not necessarily under experimental control. Second, we need to consider carefully which inferences can (and which cannot) be drawn validly from the slopes of delta plots. The interpretation of absolute delta plot slopes requires caution, as argued by Kornblum and co-workers. However, the objections that apply to interpreting absolute values of delta plot slopes do not necessarily apply to the interpretation of relative values of delta plot slopes. If the delta plot slope is more negative in one condition or group relative to another, this difference can be interpreted in a meaningful way. Similarly, we can validly explore processing dynamics by examining the points in time where delta plots converge and diverge between conditions.

In the present work, I will consider a theoretical framework that generates unique predictions concerning delta plots for RT and accuracy in Simon tasks. This framework capitalizes on the suppression of location-driven direct activation. To circumvent the problems identified by Kornblum and co-workers, I will consider the differences in delta plot slopes between conditions that are thought to differ in terms of inhibitory demands, rather than inspect the absolute slopes themselves.
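Given per-quantile statistics from the CDF and CAF helpers sketched above, the delta-plot construction reduces to two subtractions. The following sketch makes the arithmetic explicit; the function names are again illustrative.

```python
import numpy as np

def delta_plot_points(stat_cr, stat_ncr, rt_cr, rt_ncr):
    """For each quantile, return the NCR-minus-CR difference in a statistic
    (RT or accuracy) and the response speed at which it is plotted
    (the mean of the CR and NCR quantile-mean RTs)."""
    effect = np.asarray(stat_ncr) - np.asarray(stat_cr)
    speed = (np.asarray(rt_cr) + np.asarray(rt_ncr)) / 2.0
    return speed, effect

def segment_slopes(speed, effect):
    """Slopes of the line segments connecting successive quantile points,
    the quantity compared between conditions in the analyses below."""
    return np.diff(effect) / np.diff(speed)
```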
In the next section, I will clarify how the distributional analyses can be used to explore the dynamics of direct activation and selective suppression.
24.3 Selective suppression of direct activation

The activation of responses along the direct route may be subject not only to spontaneous decay, but also to more active forms of inhibition. For instance, several authors have speculated about the role of
active inhibition in overcoming interference effects in the context of the Eriksen task (e.g. Eriksen and Schultz 1979; Ridderinkhof and van der Molen 1995a, 1997). Band and van Boxtel (1999) reviewed the cognitive-neuroscience literature on response inhibition and took the compiled evidence to support the notion that responses are held in check through inhibitory control, exercised by an executive system (located in prefrontal cortex) that supervises the flow of information through subordinate mechanisms (cf. Logan and Cowan 1984; Norman and Shallice 1986; Shimamura 1995). Manifestations of inhibitory control can occur anywhere in the system (for instance in primary motor cortex, but also upstream or downstream from it). Response inhibition can be general (serving to inhibit any on-going motor activity, such as in stop tasks; Logan and Cowan 1984) or selective (serving to inhibit the activation for one response but not the other), depending on where in the system the effect is exerted. Behavioral evidence for the involvement of a central response suppression mechanism in conflict tasks came from a recent study using the Eriksen flanker task in combination with a stop task, in which mutual influences were observed between the correspondence effects associated with NCR trials and the non-selective response suppression mechanism involved in stopping (Ridderinkhof, Band, and Logan 1999).

Although independent evidence for such inhibition has not been delivered yet, LRP studies with conflict tasks suggest that the initial direct activation subsequently undergoes selective suppression. For instance, Eimer and Schlaghecken (1998) observed that the initial response activation induced by a response-irrelevant precue arrow was followed by inhibition of that response. This selective suppression was so strong that it produced a reversal of LRP lateralization, indicating the relative facilitation of the opposite response. Further LRP and behavioral work provided additional evidence for this pattern of ‘facilitation-followed-by-inhibition’ (Eimer 1999; Eimer and Schlaghecken 1998). In Fig. 24.4, the schematic representation of the dual-process model is extended to incorporate these selective suppression processes.

Note that the central response suppression mechanism is different from the automatic decay of response code activation. The latter is a process that is (a) automatic, and (b) an inherent property of activation of response codes (cf. Hommel 1994; for a review, see Lu and Proctor 1995); whereas the central suppression mechanism is (a) active and non-automatic, and (b) externally imposed (presumably originating from prefrontal cortex) upon activation in, for example, primary motor cortex (cf. Band and van Boxtel 1999).
Fig. 24.4 Extension of the dual-process model with selective suppression processes.
24.4 The dynamics of direct activation and selective suppression

On the basis of the pattern of initial direct activation of a response followed by selective suppression of that activation, we now turn to predictions as to the effects of variability in the strength (or onset time) of selective suppression. Let us consider the activation/suppression patterns that would be predicted if the process of selective suppression of direct activation were to operate more strongly or more weakly than on average. In the previous section I discussed how natural variability in processing speed would be expressed in the distribution functions for CR and NCR trials (i.e. variability within conditions); in the present section I discuss how these patterns of natural variability are altered as a function of inhibitory demands (i.e. differences in variability between conditions). I first consider NCR trials.
24.4.1 CDFs and delta plots for RT

In conditions where selective suppression is relatively strong, the effects of direct activation should be shorter-lived than in conditions where selective suppression is relatively weak; the build-up of activation for the incorrect response along the direct-activation route would be able to attain a lesser magnitude before being corrected by selective suppression processes. As a consequence, activation for the correct response should be initiated earlier in strong-inhibition compared with weak-inhibition conditions. Thus, with weak inhibition only the slow NCR responses benefit from selective suppression; the stronger the inhibition, the earlier in the RT distribution responses will benefit from selective suppression. This is illustrated in the hypothetical CDFs in Fig. 24.5 (left panel). It can readily be seen that the same effect would be manifest if the suppression processes were to have an earlier onset than on average, or were to operate both earlier and more strongly than on average.

On CR trials, these effects work in the opposite direction, although once more the effects on CR trials are typically much less pronounced compared with NCR trials. If deliberate response decision processes proceed relatively fast, then the build-up of activation for the (correct) response along the direct-activation route would be only small by the time the deliberate route produced its output. Thus, with stronger inhibition, fast CR responses would benefit less from direct activation; with weaker inhibition, fast responses would benefit more, and slower responses would also begin to benefit somewhat from direct activation (see Fig. 24.5, left panel).

The right panel of Fig. 24.5 displays the manifestations of weaker versus stronger selective suppression in delta plots for RT. Most noteworthy, the slopes between quantile points turn from positive to negative relatively late when suppression is weak and progressively earlier as suppression grows stronger. The point of divergence between two delta plots (representing two different levels of inhibitory strength) is the critical variable in these comparisons.
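These predictions can be illustrated with a deliberately simple simulation. The sketch below is not the author's model: it merely assumes that location-driven interference on NCR trials accrues with decision time until selective suppression kicks in at some onset time, so that earlier (stronger) suppression caps the cost on slow trials. All parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_ncr_rts(n, supp_onset):
    """Toy NCR trials: ex-Gaussian decision times plus an interference
    cost that grows with decision time but stops accruing once selective
    suppression takes effect at supp_onset (ms)."""
    base = rng.normal(380.0, 40.0, n) + rng.exponential(90.0, n)
    cost = 0.15 * np.minimum(base, supp_onset)
    return base + cost

rt_weak = simulate_ncr_rts(20_000, supp_onset=600.0)    # late/weak suppression
rt_strong = simulate_ncr_rts(20_000, supp_onset=420.0)  # early/strong suppression

# Percentile points coincide for fast responses and diverge for slower
# ones, mirroring the pattern sketched for Fig. 24.5 (left panel).
q = np.arange(5, 100, 10)
print(np.round(np.percentile(rt_weak, q) - np.percentile(rt_strong, q), 1))
```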
24.4.2 CAFs and delta plots for accuracy

It was shown that, due to natural variability in response speed, NCR trials yield more fast errors than CR trials. That is, deliberate response decision processes will sometimes be so slow that the activation for the incorrect response along the direct-activation route may exceed the threshold at which an overt response is emitted. It can be argued that in conditions where selective suppression is weak (or slow, or both), the activation for the incorrect response along the direct-activation route will exceed the
Fig. 24.5 Left panel: Cumulative density functions (CDFs) for two hypothetical conditions, one involving strong inhibition and one involving weak inhibition (see text). CDFs plot the cumulative probability of responding as a function of response speed. Response speed is expressed here in RT decile scores. Right panel: Delta plots for response speed for the hypothetical correspondence effects as derived from the CDFs in the left panel. Delta plots plot effect size as a function of response speed.
response threshold sooner than when selective suppression is strong (in other words, a stronger inhibition regime may more often prevent incorrect activation from resulting in an overt response). Thus, in going from weak (or slow) to strong (or early) suppression, fewer fast NCR errors would occur. This is illustrated in the hypothetical CAFs in Fig. 24.6 (left panel). Straightforward effects of inhibition strength on conditional accuracy in CR trials are not anticipated.

The right panel of Fig. 24.6 displays the manifestations of weaker versus stronger selective suppression in delta plots for accuracy. Most noteworthy, only the slopes between the earliest quantile points differ significantly between strong and weak suppression conditions, whereas at later quantiles these slopes differ less and approach zero.
24.5 The present study

The present study was set up to explore the dynamics of direct activation and selective suppression, using the distributional analyses described above, under conditions that vary the strength of suppression in a Simon task. If suppression plays a role in the Simon effect according to the hypothesized mechanism, then experimentally induced differences in the degree of suppression should show up in diverging delta plots (in the direction specified above). So if (and only if) it can reasonably be
argued that two Simon task conditions differ in terms of the degree of suppression, then the central prediction (i.e. that differential involvement of suppression shows up in differential delta plots) can be tested. Such a test would produce independent evidence that differential delta plots do signify differential involvement of a suppression mechanism in the Simon task. Once such independent evidence has been delivered, the delta-plot procedure can be used to address the question of whether experimental conditions (or groups) differ in terms of the degree of suppression in conflict tasks.

Fig. 24.6 Left panel: Conditional accuracy functions (CAFs) for two hypothetical conditions, one involving strong inhibition and one involving weak inhibition (see text). CAFs plot the probability of correct responding as a function of response speed. Response speed is expressed here in RT decile scores. Right panel: Delta plots for accuracy for the hypothetical correspondence effects as derived from the CAFs in the left panel. Delta plots plot effect size as a function of response speed.

Three different analyses were conducted on the data from an experiment reported below. First, an experimental manipulation was designed explicitly to vary the strength of suppression across conditions that were held equal as much as possible in all other respects. That is, a regular Simon task was embedded in two different contexts: one context emphasizing the need to suppress location-driven activation as in the Simon task, the other context opposing this requirement. The dynamics of direct activation and selective suppression were compared between these two conditions using delta plots. To the extent that the two conditions do indeed differ in terms of the strength of suppression of direct location-based activation, this difference should be expressed in differential delta plots; absence of such differential effects would argue against a role for suppression in the Simon effect.

Second, the delta-plot procedure as validated in the first analysis was used to examine (in a separate portion of the experimental data) whether individuals who differ in terms of sensitivity to correspondence effects in conflict tasks differ also in terms of their capability to suppress direct activation based on irrelevant stimulus features. It was hypothesized that those subjects who have
more efficient suppression capabilities would show smaller interference effects. Thus, subjects with smaller Simon effects were compared with those with larger Simon effects; their delta plots were then examined to establish whether the efficiency of suppression processes did indeed differ between the groups.

Third, it was hypothesized that monitoring the pattern of activation on the preceding trial (via bottom-up or top-down mechanisms) might have some remedial effect on performance in the current trial. The presence of incorrect activation on a preceding trial might enhance selective suppression on the current trial, irrespective of the (chances of) incorrect activation on the current trial. The delta plots of trials preceded by CR versus NCR trials were examined to establish whether the expected difference in the size of the Simon effect between those conditions could be attributed (at least in part) to differences in the patterns of selective suppression.

It should be noted that several factors other than suppression of direct activation are likely to play a role in the occurrence of Simon effects. Such factors (including automatic decay, S–R binding, spatial referencing, etc.) were not brought under experimental control in the present study. Most importantly, however, the hypothesized activation-suppression mechanism yielded predictions concerning differential delta plots; automatic decay and other factors do not give rise to such predictions. That does not imply that these factors do not contribute to the Simon effects in this study; it only implies that the delta plots are non-revealing with respect to these factors.
24.6 Analysis 1: context effects

From the hypothesized activation-suppression mechanisms, two experimental conditions were derived that were meant to differ in terms of the involvement of suppression. An experimental manipulation of context was designed explicitly to vary the requirement to suppress direct-activation effects across conditions of the Simon task that were held equal as much as possible in all other respects. That is, a regular Simon task was embedded in two different contexts: one context emphasizing the need to suppress location-driven activation, and the other context opposing this requirement. The dynamics of direct activation and selective suppression were compared between these two conditions using delta plots. Confirmation of the predictions outlined above can then be taken as independent evidence that differential delta plots do signify differential involvement of a suppression mechanism in the Simon task.

Kramer, Humphrey, Larish, Logan, and Strayer (1994) and Ridderinkhof et al. (1999) used an Eriksen task in which they intermixed regular trials with trials on which an auditory stop signal was presented in addition to the visual array. Both studies reported modulations of the size of the correspondence effect by the context that required responses to be suppressed. Proctor and Vu (this volume, Chapter 22) intermixed a Simon task with trials requiring a CR response to stimulus location (or, in other conditions, with trials requiring an NCR response to stimulus location), and observed that the size of the overall Simon effect was modulated by the presence of the additional task. Independently, I developed an experimental design reminiscent of that used by Proctor and Vu. In the present design, the context in which the Simon task was embedded could either require location to be used as the basis for responding, as in the Proctor and Vu study, or require location to be ignored, as in regular Simon task trials. The intention was to explore the extent to which the level of suppression of location-driven activation in regular Simon trials was influenced by these contexts, using the delta plot methods.
The context manipulation was as follows. Context trials were similar to regular Simon trials, but now color could not be used as the imperative stimulus feature since all context stimuli were gray. In one context condition, response side was designated by stimulus shape (circle or square), and stimulus location was to be ignored; in the other context condition, it was the location (left or right of fixation) rather than the shape of the same stimuli that indicated which response was to be given, while shape was to be ignored. As a result, the ‘regular’ Simon trials were embedded in context trials that either required activation of responses based on stimulus location, or suppression of location-based activation. I hypothesized that these two different contexts would influence the level of selective suppression in the regular Simon trials (which were completely equal in all other respects).
24.6.1 Methods

24.6.1.1 Subjects
The participants in this experiment were 24 first-year Psychology students (12 female, 12 male) who received course credits in return for their participation. All participants reported being healthy and having normal or corrected-to-normal vision. Subjects were tested individually in a quiet university chamber.

24.6.1.2 Stimuli and apparatus
Subjects were seated 60 cm in front of an Apple Macintosh Plus ED computer that was used for stimulus presentation and response registration. All stimuli were presented against a light-gray background. A small black square contour (0.5 × 0.5 cm) was presented throughout an experimental block in the center of the computer screen and served as a fixation point. Two larger black square contours (3.0 × 3.0 cm) were presented laterally, one to the left and one to the right of the central fixation square, such that the centers of the central and lateral squares were 2.25 cm apart. The stimulus on each trial was either a black or a white diamond (1.5 × 1.5 cm), a gray circle (1.25 × 1.25 cm), or a gray square (1.06 × 1.06 cm), which was presented in the center of the square to the left or right of fixation. On each trial, the color (black or white, in the case of diamond stimuli) or shape (circle or square, in the case of gray stimuli) was determined randomly, and the location of the stimulus was determined randomly, but with the restriction that each stimulus appeared equally often on each side. Subjects indicated their response by pressing either the ‘z’ or the ‘/’ key of the computer keyboard with their left or right index finger, respectively. A feedback stimulus was presented in the form of a digit (0, 5, or 9; 0.3 cm vertically and 0.2 cm horizontally) that was presented at the center of the central fixation square.

A trial started with the presentation of a stimulus inside one of the lateral squares. The stimulus was removed as soon as the subject responded, or after two seconds if the subject had not responded by then. As soon as the stimulus disappeared, a feedback digit appeared in the central square. The feedback stimulus disappeared after 750 ms, at which time a new trial started with the presentation of a stimulus.
24.6.1.3 Task and procedure
Several conditions were distinguished. In the first condition, a trial block contained only diamonds; the subject's task was to make a rapid discriminative response on the basis of the color of the diamond. Half of the subjects gave a left-hand response to a white diamond and a right-hand response to a black diamond; this mapping was reversed for the other half of the subjects. Subjects were
instructed to ignore the location of the stimulus and to base their response exclusively on its color. It was explained to them that the to-be-ignored location of the stimulus would correspond in half of the trials to the side of the correct response, as designated by stimulus color (CR trials), and in the other trials to the side opposite to the designated response (NCR trials).

In the second condition, a trial block contained only gray shapes; subjects were to respond to circles with their left hand and to squares with their right hand, while stimulus location was to be ignored. In the third condition, a trial block also contained only gray shapes, but now subjects were to respond with their left hand to stimuli presented to the left of fixation and vice versa; shape was to be ignored. A fourth condition consisted of black and white diamonds intermixed with gray shapes (75% diamonds, 25% gray shapes); diamonds required responses as before (see condition 1), whereas shapes required responses as in the second condition (circles: left hand; squares: right hand). In these mixed blocks, location could always be ignored. I will refer to this condition as the ‘Context in which Location is Irrelevant’ (CLI) condition. A fifth condition consisted of exactly the same mix of stimuli; diamonds required responses as before, whereas shapes now required responses as in the third condition (left location: left hand; right location: right hand). In these mixed blocks, location could not always be ignored, since location formed the basis for responding on 25% of the trials. I will refer to this condition as the ‘Context in which Location is a Target’ (CLT) condition.

In all conditions, responses were to be given as fast as possible while keeping error rates below 15% on average. A feedback procedure served to optimize performance in terms of speed and accuracy. Participants could earn points by performing fast and accurately; the feedback digit reflected the number of points gained on each trial. Five points were gained for a response with the correct hand and zero points for a response with the incorrect hand. If a response was correct and the response time was faster than the subject's average response time (calculated as a running average, updated on every trial), the subject earned 9 points. Since responses in NCR trials tend to be slower than in CR trials, running averages were computed separately for these two types of trials (a sketch of this scoring rule is given below). At the end of a trial block, the subject was shown his or her total score for that block; this score had no further consequences.

Task instructions were given first for the first condition, the ‘diamonds only’ condition. After the experimenter had verified that all instructions were well understood, subjects first performed three practice blocks to familiarize them with the task and procedure and to allow them to optimize and stabilize their performance. Each practice block consisted of 32 trials. Next, four experimental blocks (diamonds only) were presented, each consisting of 100 trials. The first four trials in each experimental block were considered warm-up trials. Responses for the next 96 trials were stored on disk for later analysis. Blocks of trials were separated by two-minute intermissions. Next, after a break of five minutes, task instructions and practice blocks were given for the second and third conditions.
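The scoring rule just described can be summarized in a few lines of code. This is a sketch for illustration only; in particular, the exact form of the running-average update was not specified in the text, so the incremental-mean update below is an assumption.

```python
def feedback_points(correct, rt, running_avg):
    """Points per trial: 0 for an error, 5 for a correct response,
    9 for a correct response faster than the running average RT
    (kept separately for CR and NCR trials)."""
    if not correct:
        return 0
    return 9 if rt < running_avg else 5

def update_running_avg(old_avg, rt, n):
    """Incremental running-average update after the n-th trial of a
    given type (CR or NCR); the exact update rule is assumed here."""
    return old_avg + (rt - old_avg) / n
```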
These conditions, with gray circles and squares, in fact served merely as practice conditions for the CLI and CLT conditions, in which gray shapes were presented as context trials. Subjects first performed three practice blocks with the shape instruction (the second condition, in which shape was relevant and location irrelevant) and then three practice blocks with the location instruction (the third condition, in which location was relevant and shape irrelevant), to familiarize them with the new tasks. Each practice block consisted of 32 trials. Finally, task instructions and practice blocks were given for the fourth and fifth conditions. Half of the subjects first performed three practice blocks followed by six experimental blocks in the CLI condition; next, they performed three practice blocks followed by six experimental blocks in the
CLT condition. For the other half of the subjects, this order was reversed. Each experimental block consisted of 100 trials, the first four trials of which were considered warm-up trials. Blocks of trials were separated by two-minute intermissions; this intermission was extended to a ten-minute break during the transition from CLI to CLT (or vice versa).
24.6.2 Analytical design

Analyses of the first condition (diamonds only, a regular Simon task) are reported in subsequent sections (Analyses 2 and 3). The second and third conditions (gray shapes only) were used for practice purposes only. Thus, the present analyses focus on the CLI and CLT conditions.

For each subject, mean RT and overall accuracy were determined for the CR and NCR conditions of the 75% regular color-Simon trials, separately for CLI and CLT blocks. Initial analyses of variance were conducted on mean RTs and accuracy scores using SPSS's GLM feature. The analyses included the effects of the within-subjects factors Correspondence between Stimulus Location and Designated Response Side (henceforth referred to as Correspondence; CR vs. NCR) and Context (CLI vs. CLT).

Next, for each subject, reaction times of all responses (including both correct and incorrect responses; response omissions were not observed) were rank-ordered (for CR and NCR trials separately) and then divided into five equal-size speed bins (quintiles). Mean RT and accuracy were determined for each quintile in each condition (as determined by factorial combinations of the Correspondence and Context factors) separately. Delta plots for RT were constructed by plotting effect size (mean RT in the NCR condition minus mean RT in the CR condition) as a function of response speed (the average of the mean RTs in the CR and NCR conditions per quintile). Likewise, delta plots for accuracy were constructed by plotting effect size (accuracy in the NCR condition minus accuracy in the CR condition) as a function of response speed (the average of the mean RTs in the CR and NCR conditions per quintile). Overall mean RT and accuracy are mathematically equal to the average of the mean RTs and accuracies of the five quintiles. Slopes were computed for the delta plot segments connecting the data points of quintiles 1 and 2, quintiles 2 and 3, quintiles 3 and 4, and quintiles 4 and 5. A second set of ANOVAs was conducted on these slopes (separately for RT and accuracy) and included the within-subjects factors Context (CLI vs. CLT) and Quintile (q1–2, q2–3, q3–4, q4–5).
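The per-subject computation described above can be sketched as follows. The function names and the five-bin split follow the text; entering the resulting slopes into a repeated-measures ANOVA is left to a statistics package and not shown.

```python
import numpy as np

def quintile_means(rts):
    """Rank-order RTs (correct and incorrect responses alike) and return
    the mean RT of each of five equal-size speed bins (quintiles)."""
    bins = np.array_split(np.sort(np.asarray(rts)), 5)
    return np.array([b.mean() for b in bins])

def rt_delta_plot(rt_cr, rt_ncr):
    """Per-subject delta plot for RT: effect size (NCR minus CR quintile
    means) against response speed (average of the two quintile means),
    plus the slopes of the four connecting segments (q1-2 ... q4-5)."""
    q_cr, q_ncr = quintile_means(rt_cr), quintile_means(rt_ncr)
    speed = (q_cr + q_ncr) / 2.0
    effect = q_ncr - q_cr
    slopes = np.diff(effect) / np.diff(speed)
    return speed, effect, slopes
```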
24.6.3 Results

The first set of ANOVAs focused on the effects of Correspondence and Context on mean RTs and accuracy scores. As is typical in Simon tasks, CR responses were faster, F(1,23) = 68.97, p < 0.001, and more accurate, F(1,23) = 32.54, p < 0.001, than NCR responses. Responses in the two context conditions were equally fast, F(1,23) = 1.41, but less accurate in the CLI compared with the CLT context, F(1,23) = 15.59, p < 0.001. Most important, and as anticipated, the effect of Correspondence on RT was reduced substantially in the CLI compared with the CLT context (see Fig. 24.7, upper left panel; F(1,23) = 48.10, p < 0.001). Likewise, the substantial effect of Correspondence on accuracy in the CLT context was abolished completely in the CLI context (see Fig. 24.7, lower left panel; F(1,23) = 105.59, p < 0.001). The direction of the accuracy effects discounted interpretations of the RT findings in terms of speed/accuracy trade-off.

The second set of ANOVAs focused on the effects of Context on the slopes of the delta plots for RT and accuracy. As shown in the upper right panel of Fig. 24.7, for RT, the slopes differed
Fig. 24.7 Results of Analysis 1. CLI refers to the context where location could always be ignored; CLT refers to the context where location was the target aspect of the stimulus. Upper left panel: Overall response times (RT) for corresponding (CR) and non-corresponding (NCR) conditions. Upper right panel: Delta plots for response speed for correspondence effects in the two contexts. Delta plots plot effect size as a function of response speed (as expressed in RT quintile scores). Lower left panel: Overall response accuracy for CR and NCR conditions. Lower right panel: Delta plots for accuracy for correspondence effects in the two contexts.
significantly between contexts at all segments of the delta plot: q1–2, F(1,23) = 39.10, p < 0.001; q2–3, F(1,23) = 40.48, p < 0.001; q3–4, F(1,23) = 11.40, p < 0.003; q4–5, F(1,23) = 5.77, p < 0.025. For accuracy, by contrast, the slopes did not differ significantly between contexts at any segment of the delta plot (see Fig. 24.7, lower right panel): q1–2, F(1,23) = 2.91; q2–3, F(1,23) = 0.74; q3–4, F(1,23) = 2.27; q4–5, F(1,23) = 0.46.
24.6.4 Discussion

From Fig. 24.7 it is evident that the Simon effect (on RT as well as accuracy) is influenced heavily by the context in which the Simon task appears. The overall Simon effect is attenuated substantially when the context is such that location-driven direct activation should always be suppressed, compared with the context in which location serves as the basis for responding. It was predicted that the differential inhibitory demand between the contexts would be captured by the delta plot dynamics. The results of the analyses confirmed this prediction: the slopes of the delta plots for RT diverged right from the outset, suggesting that inhibitory control was exerted much more forcefully in the CLI compared with the CLT context. The delta plots for accuracy did not corroborate this pattern, indicating that the inhibitory effects were expressed in the speed of responding rather than in the incidence of fast errors.

The pattern of findings could not be explained by overall differences in response speed between the contexts, since this difference was only small (8 ms) and not significant. Other alternative explanations of the diverging patterns are also unlikely, as all factors were held constant between contexts. One possible factor (the need to maintain spatial information in CLT relative to CLI conditions) can be ruled out unless it can be argued that this factor could produce the differential delta plot patterns. Thus, I consider the results to provide support for the activation-suppression hypothesis; the dynamics of selective suppression predicted by this hypothesis were captured nicely by the delta plots. The differences in these dynamics between the contexts, as expressed in the diametrically opposed patterns in the delta plots for RT, would have been overlooked altogether if we had confined our analyses to the traditional analysis of mean RT.

Variations in the mechanism of suppressing direct activation were implemented operationally in terms of variations in the need to ignore location. Even though the operationalization was derived directly from the presumed mechanism, it might be argued that the former does not map one-to-one onto the latter. The evidence in support of a suppression mechanism (as discussed in Section 24.3) notwithstanding, it is conceivable that ignoring location can be achieved without suppressing location-based direct activation. If so, then negative results (i.e. delta plots that would not diverge between the two context conditions) would not have allowed one to draw conclusive inferences: a negative finding might have resulted from the absence of involvement of a suppression mechanism in the Simon task, but it might also have resulted from a mis-operationalization (in which the need to ignore location did not involve the suppression of location-based direct activation). However, the results were positive and in accordance with the specific and unique predictions derived from the activation-suppression hypothesis. Thus, the differential delta-plot results are taken as independent evidence in favor of the existence of a suppression mechanism and its specific role in the Simon task.
24.7 Analysis 2: overall effect size

The main intention in the following analyses was to examine the extent to which suppression of direct activation plays a role in factors (post-hoc classifications or experimental manipulations) that influence the size of the Simon effect. One approach was based on a simplistic notion related to inter-subject variability. Some subjects experience larger interference effects than others. Although the reasons underlying these individual differences may be manifold, one intuitively attractive
explanation is that subjects differ in susceptibility to interference because of their differential capacity to suppress location-based direct activation. If this were the case, then these individual differences should show up in the dynamics of direct activation and selective suppression, as captured by the distributional analyses: compared with subjects with larger Simon effects, subjects with smaller Simon effects were predicted to display stronger suppression effects, as expressed in the delta plots for RT (diverging slopes) and accuracy (fewer fast errors). To verify these predictions, two groups of subjects were formed using a median split based on their performance in a basic Simon task (the initial part of the experiment, the condition in which color was always the target feature and gray shapes did not occur).
24.7.1 Analytical design

For each subject, mean RT and overall accuracy were determined for the CR and NCR conditions of the color-Simon task (the first experimental condition). The Simon effect sizes on RT (computed as RT(NCR) – RT(CR)) were rank-ordered across subjects; a median-split method was used to classify the subjects into two groups (one group with the smaller, the other with the larger Simon effects). Initial analyses of variance were conducted on mean RTs and accuracy scores, and included the effects of the between-subjects factor Group (large vs. small Simon effect) and the within-subjects factor Correspondence (CR vs. NCR).

Next, for each subject, reaction times were rank-ordered per condition and then divided into quintiles. Mean RT and accuracy were determined for each quintile in each condition (CR, NCR) separately. Delta plots for RT and accuracy were constructed as before. A second set of ANOVAs was conducted on the slopes of each of the delta plot segments (q1–2, q2–3, q3–4, q4–5; separately for RT and accuracy). These analyses included the between-subjects factor Group (large vs. small Simon effect) and the within-subjects factor Quintile (q1–2, q2–3, q3–4, q4–5).
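The median-split classification can be sketched as follows; the function name and the use of per-subject RT lists are illustrative assumptions, and ties at the median would in practice need a tie-breaking rule.

```python
import numpy as np

def median_split_by_simon_effect(rt_cr_by_subject, rt_ncr_by_subject):
    """Per subject, compute the Simon effect RT(NCR) - RT(CR) and split
    the sample at the median effect size into smaller- and larger-effect
    groups (returned as arrays of subject indices)."""
    effects = np.array([np.mean(ncr) - np.mean(cr)
                        for cr, ncr in zip(rt_cr_by_subject, rt_ncr_by_subject)])
    median = np.median(effects)
    smaller = np.flatnonzero(effects <= median)
    larger = np.flatnonzero(effects > median)
    return smaller, larger
```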
24.7.2 Results

The first set of ANOVAs focused on the effects of Group and Correspondence on mean RTs and accuracy scores. On average, both groups were equally fast, F(1,22) = 0.84, and equally accurate, F(1,22) = 0.84. CR responses were faster, F(1,22) = 195.81, p < 0.001, and more accurate, F(1,22) = 10.50, p < 0.004, than NCR responses. As anticipated, the effect of Correspondence on RT differed reliably between groups (see Fig. 24.8, upper left panel; F(1,22) = 47.30, p < 0.001); the effect of Correspondence on accuracy differed marginally but not significantly between groups (see Fig. 24.8, lower left panel; F(1,22) = 3.23, p < 0.086). The direction of the accuracy effects precludes an interpretation in terms of speed/accuracy trade-off.

The second set of ANOVAs focused on the effects of Group on the slopes of the delta plots for RT and accuracy. As can be seen in the upper right panel of Fig. 24.8, for RT, the Group difference was significant for q4–5: F(1,22) = 18.41, p < 0.001, marginally significant for q3–4: F(1,22) = 3.53, p < 0.074, and not significant for q2–3: F(1,22) = 0.45, and q1–2: F(1,22) = 0.18. As shown in the lower right panel of Fig. 24.8, for accuracy, the Group difference was significant for q1–2: F(1,22) = 5.53, p < 0.028, but not for q2–3: F(1,22) = 0.16, q3–4: F(1,22) = 1.27, and q4–5: F(1,22) = 0.58.
Fig. 24.8 Results of Analysis 2. SMALLER refers to the group of subjects with relatively small Simon effects; LARGER refers to the group of subjects with relatively large Simon effects. Upper left panel: Overall response times (RT) for corresponding (CR) and non-corresponding (NCR) conditions. Upper right panel: Delta plots for response speed for correspondence effects in the two groups. Delta plots plot effect size as a function of response speed (as expressed in RT quintile scores). Lower left panel: Overall response accuracy for CR and NCR conditions. Lower right panel: Delta plots for accuracy for correspondence effects in the two groups.
24.7.3 Discussion

As depicted in Fig. 24.8, the two groups differed not only in terms of the size of the overall Simon effect on RT, but also in terms of how this effect depends on processing speed. The delta plot shows that the subjects with smaller overall Simon effects have reversed Simon effects at slow response quintiles, whereas slow response quintiles show the largest effect for subjects with larger
overall Simon effects. The analyses indicated that the slopes of the delta plot differed significantly between the two groups after quintile 4, and marginally after quintile 3. Thus, in accordance with the predictions, it can be concluded that individuals with smaller Simon effects display selective suppression of location-based direct activation either more strongly or earlier (or both) than individuals with larger Simon effects.

Figure 24.8 also shows that the two groups differed with respect to the Simon effect on accuracy. In particular, the delta plot shows that the subjects with larger overall Simon effects on RT make more fast NCR errors, a finding corroborated by the analyses. Thus, the accuracy findings also support the prediction that individuals with smaller Simon effects display selective suppression of location-based direct activation more efficiently than individuals with larger Simon effects.

One could ask whether any arbitrary sorting of the sample would have produced a similar pattern of divergence as that produced by the median split on Simon effect size. Several different arbitrary splits (based on random sortings of subject number) all yielded highly similar results: delta plots in which the two groups fell approximately on top of each other. An additional sorting of subjects was based on overall mean RT: again, delta plots resulted in which the two groups approximately overlapped. None of the alternative sortings produced a pattern of divergence, in either the predicted or the opposite direction. Thus, only the median split based on Simon effect size brought about the expected dissociation.

It should be noted that the two groups might differ in respects other than the efficiency of selective suppression. For instance, the observed group differences in overall Simon effects might be related to group differences in overall processing speed or accuracy, or to differences in factors that influence the Simon effect but do not involve the suppression mechanism. However, unlike the activation-suppression mechanism, these alternative factors yield no differential delta-plot predictions. Thus, I conclude that individual differences in the strength of suppression of direct activation (perhaps among other factors not examined here) contribute to between-subject variability in the size of the Simon effect.
24.8 Analysis 3: sequential effects

A further approach to exploring the dynamics of direct activation and selective suppression was to examine sequential effects. A number of authors (e.g. Proctor and Vu, this volume, Chapter 22; Valle-Inclán et al., this volume, Chapter 23) have suggested that the correspondence condition on the preceding trial may influence the pattern of responding on the current trial. The recurrent finding is that the Simon effect on RT is reduced on trials that are preceded by NCR compared with CR trials. Monitoring the pattern of activation on the preceding trial, either via bottom-up mechanisms (e.g. Los 1996) or via top-down mechanisms (e.g. Stoffels 1996), might have some remedial effect on performance in the current trial. I speculate that the presence of incorrect activation on a preceding trial might enhance (the onset time, build-up rate, and/or strength of) inhibition on the current trial, irrespective of the probability or actual presence of incorrect activation on the current trial. If this were the case, then these sequential effects should show up in the distributional analyses: compared with trials preceded by CR trials, trials preceded by NCR trials were predicted to display stronger suppression effects, as expressed in the delta plots for RT (diverging slopes) and accuracy (fewer fast errors). To verify these predictions, the data from the basic Simon task (the initial part of the experiment, in which color was always the target feature and gray shapes did not occur) were reanalyzed, now focusing on sequential effects.
24.8.1 Analytical design

For each subject, mean RT and overall accuracy were determined separately for trials preceded by CR trials and trials preceded by NCR trials. Initial analyses of variance were conducted on mean RTs and accuracy scores, and included the effects of the within-subjects factors Correspondence (CR vs. NCR) and Sequence (preceded by CR vs. NCR trials, henceforth referred to as <CR and <NCR trials, respectively). Next, for each subject, reaction times were rank-ordered per condition and then divided into quintiles; mean RT and accuracy were determined for each quintile, and delta plots for RT and accuracy were constructed as before. A second set of ANOVAs was conducted on the slopes of each of the delta plot segments (separately for RT and accuracy) and included the within-subjects factors Sequence (<CR vs. <NCR) and Quintile (q1–2, q2–3, q3–4, q4–5).
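Classifying trials by the correspondence of the immediately preceding trial can be sketched as follows. The function name and the trial encoding are illustrative, and the sketch ignores block boundaries and warm-up trials, which the actual analysis would have to respect.

```python
import numpy as np

def split_by_preceding_trial(labels, rts):
    """Partition trials into <CR (preceded by a CR trial) and <NCR
    (preceded by an NCR trial). `labels` holds per-trial 'CR'/'NCR'
    codes; the first trial of a sequence has no predecessor and is dropped."""
    labels = np.asarray(labels)
    rts = np.asarray(rts)
    prev = labels[:-1]          # correspondence of trial n-1
    current = rts[1:]           # RT of trial n
    return {"<CR": current[prev == "CR"], "<NCR": current[prev == "NCR"]}
```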
24.8.2 Results

The first set of ANOVAs focused on the effects of Correspondence and Sequence on mean RTs and accuracy scores. As before, CR responses were faster, F(1, 23) = 62.66, p < 0.001, and more accurate, F(1, 23) = 8.58, p < 0.008, than NCR responses. In addition, responses were slower, F(1, 23) = 16.10, p < 0.001, and slightly less accurate, F(1, 23) = 3.09, p < 0.092, when they were preceded by NCR compared with CR trials. Most importantly, and as anticipated, the effect of Correspondence on RT was reduced substantially after NCR compared with CR trials (see Fig. 24.9, upper left panel; F(1, 23) = 34.47, p < 0.001). Likewise, the substantial effect of Correspondence on accuracy in trials preceded by CR trials was abolished in trials preceded by NCR trials (see Fig. 24.9, lower left panel; F(1, 23) = 30.23, p < 0.001). The direction of the accuracy effects again rendered an interpretation of the RT findings in terms of speed/accuracy trade-off unlikely.

The second set of ANOVAs focused on the effects of Sequence on the slopes of the delta plots for RT and accuracy. As shown in the upper right panel of Fig. 24.9, for RT, the Sequence effect was significant for q1–2: F(1, 23) = 10.66, p < 0.003, and q2–3: F(1, 23) = 5.82, p < 0.024, marginally significant for q3–4: F(1, 23) = 3.83, p < 0.063, and not significant for q4–5: F(1, 23) = 0.01. As depicted in the lower right panel of Fig. 24.9, for accuracy, the Sequence effect was significant for q1–2: F(1, 23) = 9.95, p < 0.004, and for q3–4: F(1, 23) = 5.54, p < 0.028, but not for q2–3: F(1, 23) = 1.68, and q4–5: F(1, 23) = 0.64.
24.8.3 Discussion

Figure 24.9 shows that the Simon effect (on RT as well as accuracy) depends on what happened on the preceding trial: if the preceding trial was an NCR trial, the Simon effect was substantially attenuated. This finding is in accordance with findings reported elsewhere (e.g. Leuthold and Stürmer 2000; Proctor and Vu, this volume, Chapter 22; Soetens 1998; Valle-Inclán et al., this volume, Chapter 23). In the present analysis, I examined whether this phenomenon could be explained at least in part by trial-by-trial fluctuations in the efficiency of selective suppression. This notion was based on the speculation that the presence of incorrect activation on a preceding trial might tune up the level of inhibitory control on the subsequent event, whereas this level could be reduced if incorrect activation was absent. Indeed, the slopes of the delta plots for RT diverged early on, suggesting that trials preceded by NCR trials displayed stronger suppression effects. The parallel slopes at the slowest
Fig. 24.9 Results of Analysis 3. Left panels: effects of Correspondence and Sequence on mean RT (upper) and accuracy (lower). Right panels: delta plots for RT (upper) and accuracy (lower), plotted separately for trials preceded by CR and by NCR trials.
[…] Fig. 24.9, upper left panel). However, the activation-suppression account does not necessarily make this prediction. It asserts that one factor playing a role in sequential effects in the Simon task is inhibition; the hypothesized effects of inhibition are that the delta plots will diverge (and, at the overall RT level, this effect will show up as a reduction in the Simon effect in trials preceded by NCR trials).
24.9 General discussion

The main thrust of the present study was to explore and clarify the role of suppression processes in the presence, magnitude, and direction of correspondence effects in conflict tasks. Distributional analyses were argued to provide insights that are not available from the analysis of mean RT and overall accuracy alone. Delta plots were derived from cumulative distribution functions and conditional accuracy functions, and the slopes of the line segments connecting quintile points in the delta plots were compared between conditions. Although the absolute values of the slopes may be influenced by factors other than the direct activation and selective suppression factors that we were interested in, slopes can be compared between conditions thought to differ in the efficiency of inhibition. The point of divergence in delta plots is revealing with respect to the timing and intensity of inhibitory processes.

Three different empirical approaches provided evidence for differential patterns of selective suppression in Simon tasks. First, the results of an experimental manipulation, designed explicitly to vary the need to suppress (location could or could not be suppressed ab initio, depending on the context in which a regular Simon task was presented), provided independent evidence that differential patterns of suppression of location-driven direct activation showed up in diverging delta-plot patterns. The delta plots for RT and accuracy revealed further that (1) the suppression of direct activation was more efficient for individuals who showed relatively small correspondence effects in overall RT, and (2) the operation of selective suppression of direct activation was much more stringent after NCR trials compared with CR trials. These results were in agreement with the predictions derived from the activation-suppression hypothesis.

At present, it is difficult to see how the results of the experiment could be accounted for by alternative hypotheses about correspondence effects. That is not to say that the activation-suppression hypothesis replaces other accounts; on the contrary, I take it to complement other hypotheses, and the present results do not speak for or against specific theoretical positions. Dimensional overlap, perceptual conflict, response selection, response-code decay, shift of attentional reference, S–R binding, and other factors may play their role in the occurrence of correspondence effects. The present results merely indicate that selective suppression of direct activation may play a role in addition to these other processes, and that sometimes the role of suppression processes is a major one. The distributional analyses were shown to be crucial in
highlighting that role, since the dynamics of the direct-activation and selective-suppression patterns were lost in the overall scores.

It should be noted that direct activation and, as a consequence, selective suppression are not necessarily always present. In an experiment where all the possible alphanumerical characters were mapped randomly onto either the left or the right hand, the stimulus–response associations would not likely be strong enough to produce direct activation effects. With more natural or over-learnt stimulus–response relations, direct activation would be more likely. Direct activation is not a necessary condition for correspondence effects to occur; for instance, ERP evidence indicates that perceptual factors contribute to correspondence effects in Eriksen (e.g. Gratton, Coles, and Donchin 1992), compound-letter (e.g. Ridderinkhof and van der Molen 1995b), and Simon tasks (e.g. Valle-Inclán et al., this volume, Chapter 23). Often, however, there is opportunity for direct activation to occur, and in that case the effects of direct activation are dissociable from the perceptual effects (see Ridderinkhof and van der Stelt 2000). To the extent that stimulus features can exert direct activation effects, selective suppression of that activation serves as a control instrument in the coordination of correct and incorrect response activations.

The activation-suppression hypothesis, which received initial support from the LRP studies with the masked-priming task (Eimer 1999; Eimer and Schlaghecken 1998) and with the Eriksen task (Ridderinkhof et al. 1996), generates lucid and empirically testable predictions about differences in (the dynamics of) suppression processes between conditions and between groups. Thus, the delta plot technique may be used to examine, for instance, the efficiency of suppression processes in groups that are suspected to perform deficiently in inhibitory control (e.g. schizophrenic patients, ADHD children, older adults). This methodology may also prove useful in examining the brain structures involved in response suppression (e.g. by comparing fast and slow NCR trials in event-related fMRI). These promises remain to be verified. However, the usefulness of the technique may be inferred from examples that are already in the literature. De Jong, Berendsen, and Cools (1999), for instance, reported CDFs based on RT data obtained in the Stroop color-word task under two conditions. The conditions differed in terms of response–stimulus interval, which was either long (2000 ms) or brief (200 ms). Stroop interference was substantially larger in the slow- compared with the fast-pace condition, interpreted by the authors as reflecting more frequent failures to concentrate on target processing and inhibit responses to to-be-ignored stimulus aspects in the slow-pace condition. An examination of their CDFs in the way promoted in the present paper supports this interpretation: subjects appeared to be more efficient in suppressing the word-based direct activation of the incorrect response in the fast-pace condition.

Thus, distributional analyses (including re-analyses of existing data) may help uncover or clarify patterns relevant for the interpretation of correspondence effects in conflict tasks. In the present study, such analyses provided strong support for the activation-suppression hypothesis as developed here to explain a major component of the Simon effect.
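As a concrete illustration of the delta-plot construction discussed above, consider the following minimal Python sketch. The RT samples, the choice of the 10th–90th percentiles as quintile points, and all variable names are assumptions for illustration only, not the analysis code used in this chapter.

    import numpy as np

    # Hypothetical RT samples (ms) for corresponding (CR) and
    # noncorresponding (NCR) trials of a single participant.
    rng = np.random.default_rng(0)
    rt_cr = rng.normal(420, 60, 200)
    rt_ncr = rng.normal(450, 80, 200)

    # One common convention: the 10th-90th percentiles serve as the five
    # quintile points of each RT distribution.
    pts = [10, 30, 50, 70, 90]
    q_cr = np.percentile(rt_cr, pts)
    q_ncr = np.percentile(rt_ncr, pts)

    delta = q_ncr - q_cr        # correspondence effect per quintile
    mid = (q_ncr + q_cr) / 2    # x-position of each delta-plot point

    # Slopes of the segments connecting adjacent quintile points
    # (q1-2 ... q4-5); it is these slopes that are compared between
    # conditions or groups in the analyses above.
    slopes = np.diff(delta) / np.diff(mid)
    print(np.round(delta, 1), np.round(slopes, 2))

Comparing such slope values between, say, trials preceded by CR and by NCR trials reproduces the logic of the Sequence analyses reported in Section 24.8.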
Acknowledgement

The research of Dr Ridderinkhof was supported by a grant from the Royal Netherlands Academy of Arts and Sciences. The assistance of Joppe Deen in setting up the experiments and collecting the data is gratefully acknowledged. Comments by Bernhard Hommel, Pierre Jolicoeur, and an anonymous reviewer on a previous draft were most helpful in improving the quality of this paper.
Correspondence address: Department of Psychology, University of Amsterdam, Roetersstraat 15, 1018 WB Amsterdam, The Netherlands (e-mail: [email protected]).
References

Band, G.P.H. and van Boxtel, G.J.M. (1999). Inhibitory motor control in stop paradigms: Review and re-interpretation of neural mechanisms. Acta Psychologica, 101, 179–211.
de Jong, R., Liang, C.-C., and Lauber, E. (1994). Conditional and unconditional automaticity: A dual-process model of effects of spatial stimulus–response correspondence. Journal of Experimental Psychology: Human Perception and Performance, 20, 731–750.
de Jong, R., Berendsen, E., and Cools, R. (1999). Goal neglect and inhibitory limitations: Dissociable causes of interference effects in conflict situations. Acta Psychologica, 101, 379–394.
Eimer, M. (1999). Facilitatory and inhibitory effects of masked prime stimuli on motor activation and behavioral performance. Acta Psychologica, 101, 293–314.
Eimer, M. and Schlaghecken, F. (1998). Effects of masked stimuli on motor activation: Behavioral and electrophysiological evidence. Journal of Experimental Psychology: Human Perception and Performance, 24, 1737–1747.
Eimer, M., Hommel, B., and Prinz, W. (1995). S–R compatibility and response selection. Acta Psychologica, 90, 301–313.
Eriksen, C.W. and Schultz, D.W. (1979). Information processing in visual search: A continuous flow conception and experimental results. Perception and Psychophysics, 25, 249–263.
Frith, C.D. and Done, D.J. (1986). Routes to action in reaction time tasks. Psychological Research, 48, 169–177.
Gratton, G., Coles, M.G.H., and Donchin, E. (1992). Optimizing the use of information: Strategic control of activation of responses. Journal of Experimental Psychology: General, 121, 480–506.
Hedge, A. and Marsh, N.W. (1975). The effect of irrelevant spatial correspondences on two-choice reaction time. Acta Psychologica, 39, 427–439.
Hommel, B. (1993). The relationship between stimulus processing and response selection in the Simon task: Evidence for a temporal overlap. Psychological Research, 55, 280–290.
Hommel, B. (1994). Spontaneous decay of response-code activation. Psychological Research, 56, 261–268.
Kornblum, S. and Stevens, G.T. (2002). Sequential effects of dimensional overlap: findings and issues. This volume, Chapter 2.
Kornblum, S., Hasbroucq, T., and Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus–response compatibility—A model and taxonomy. Psychological Review, 97, 253–270.
Kornblum, S., Stevens, G.T., Whipple, A., and Requin, J. (1999). The effects of irrelevant stimuli: 1. The time course of stimulus–stimulus and stimulus–response consistency effects with Stroop-like stimuli, Simon-like tasks, and their factorial combinations. Journal of Experimental Psychology: Human Perception and Performance, 25, 688–714.
Kramer, A.F., Humphrey, D.G., Larish, J.F., Logan, G.D., and Strayer, D.L. (1994). Aging and inhibition: Beyond a unitary view of inhibitory processing in attention. Psychology and Aging, 9, 491–512.
Leuthold, H. and Stürmer, B. (2000). Dual routes of sensorimotor processing and selection-for-action: Behavioral and electrophysiological evidence. Poster presented at Attention and Performance XIX, Kloster Irsee, Germany, July 2000.
Logan, G.D. and Cowan, W.B. (1984). On the ability to inhibit thought and action: A theory of an act of control. Psychological Review, 91, 295–327.
Los, S. (1996). On the origin of mixing costs: Exploring information processing in pure and mixed blocks of trials. Acta Psychologica, 94, 145–188.
Lu, C.-H. and Proctor, R.W. (1995). The influence of irrelevant location information on performance: A review of Simon and spatial Stroop effects. Psychonomic Bulletin and Review, 2, 174–207.
Luce, R.D. (1986). Response times: Their role in inferring elementary mental organization. New York, NY: Oxford Science Publications.
Norman, D.A. and Shallice, T. (1986). Attention in action: Willed and automatic control of behavior. In R.J. Davidson, G.E. Schwartz, and D. Shapiro (Eds.), Consciousness and self-regulation, Vol. 4, pp. 1–18. New York: Plenum Press.
Proctor, R.W. and Vu, K.-P.L. (2002). Eliminating, magnifying, and reversing spatial compatibility effects. This volume, Chapter 22.
Proctor, R.W., Lu, C.-H., Wang, H., and Dutta, A. (1995). Activation of response codes by relevant and irrelevant stimulus information. Acta Psychologica, 90, 275–286.
Ridderinkhof, K.R. (1997). A dual-route processing architecture for stimulus–response correspondence effects. In B. Hommel and W. Prinz (Eds.), Theoretical issues in stimulus–response compatibility, pp. 119–131. Amsterdam: Elsevier Science.
Ridderinkhof, K.R. and Bashore, T.R. (1995). Using event-related brain potentials to draw inferences about human information processing. In P.A. Allen and T.R. Bashore (Eds.), Age differences in word and language processing, pp. 294–313. Amsterdam: Elsevier.
Ridderinkhof, K.R. and van der Molen, M.W. (1993). What makes slow responses slow? On the role of response competition in within-subject variability in response speed. Psychophysiology, 30, 54 (abstract).
Ridderinkhof, K.R. and van der Molen, M.W. (1995a). A psychophysiological analysis of developmental differences in the ability to resist interference. Child Development, 66, 1040–1056.
Ridderinkhof, K.R. and van der Molen, M.W. (1995b). When global information and local information collide: A brain-potential analysis of the locus of interference effects. Biological Psychology, 41, 29–53.
Ridderinkhof, K.R. and van der Molen, M.W. (1997). Mental resources, processing speed, and inhibitory control: A developmental perspective. Biological Psychology, 45, 241–261.
Ridderinkhof, K.R. and van der Stelt, O. (2000). Attentional selection in the growing child: An overview derived from developmental psychophysiology. Biological Psychology, 51, 245–299.
Ridderinkhof, K.R., van der Molen, M.W., and Bashore, T.R. (1995). Limits on the application of additive factors logic: Violations of stage robustness suggest a dual-process architecture to explain flanker effects on target processing. Acta Psychologica, 90, 29–48.
Ridderinkhof, K.R., Lauer, E.R., and Geesken, R.H.J. (1996). LRP evidence for direct response-activation effects of to-be-ignored arrow stimuli. Psychophysiology, 33, 3–4 (abstract).
Ridderinkhof, K.R., Band, G.P.H., and Logan, G.D. (1999). A study of adaptive behavior: Effects of age and irrelevant information on the ability to inhibit one's actions. Acta Psychologica, 101, 315–337.
Sanders, A.F. (1967). Some aspects of reaction processes. Acta Psychologica, 27, 115–130.
Sanders, A.F. (1980). Stage analysis of reaction processes. In G.E. Stelmach and J. Requin (Eds.), Tutorials in motor behavior. Amsterdam: North-Holland.
Shimamura, A.P. (1995). Memory and frontal lobe function. In M.S. Gazzaniga (Ed.), The cognitive neurosciences, pp. 803–813. Cambridge, MA: MIT Press.
Soetens, E. (1998). Localizing sequential effects in serial choice reaction time with the information reduction procedure. Journal of Experimental Psychology: Human Perception and Performance, 24, 547–568.
Stoffels, E.J. (1996). On stage robustness and response selection routes: Further evidence. Acta Psychologica, 91, 67–88.
Valle-Inclán, F., Hackley, S.A., and de Labra, C. (2002). Does stimulus-driven response activation underlie the Simon effect? This volume, Chapter 23.
Welford, A.T. (1968). Fundamentals of skill. London: Methuen.
Zhang, J. and Kornblum, S. (1997). Distributional analyses and De Jong, Liang, and Lauber's (1994) dual-process model of the Simon effect. Journal of Experimental Psychology: Human Perception and Performance, 23, 1543–1551.
25 Response-evoked interference in visual encoding

Jochen Müsseler and Peter Wühr

Abstract. When two speeded response tasks are performed in close succession, performance on the second task is usually impaired. Recently, an impairment has also been observed when the second task required only the visual identification of a stimulus. Thus, visual encoding seems to suffer from the need to share limited processing capacities with some processes in the first task. Yet, it is still unclear which particular processes in the first task compete with stimulus encoding in the second task. The present experiments further examined the influence of response planning and execution on visual encoding and found both content-nonspecific and content-specific interference. An event-coding account is proposed that posits structural and procedural overlap of response-planning and stimulus-encoding mechanisms.
When a traffic light turns from green to amber and then to red, an automobile driver will immediately either brake or accelerate. It is obvious that the perceived event has elicited or modified the driver's action. The present paper addresses the less obvious question of whether or not action processes are able to affect perceptual processes.

It has already been argued that, in order to produce goal-directed behavior, an actor selects action-relevant information from the environment and ignores irrelevant information. In this sense, actions exert an important influence on what information is selected—and, thus, on what information is perceived (see, for an overview, Neumann 1996). However, a possibly more striking impact of action on perception can be observed in situations where action processes do not guide perception. An everyday example is the driver who is engaged in conversation and therefore misses the stop signal (cf. Kahneman, Beatty, and Pollack 1967). The interesting question in that case is whether the conversation affected the perception of the stop sign or, alternatively, interfered with responding to it. These and related questions are addressed empirically in dual-task experiments, in which observers have to produce a response while simultaneously identifying a stimulus.

There are basically two research approaches to studying the impact of response production on perceptual encoding. These approaches contrast not only in their use of different dual-task paradigms, but also in their theoretical background. The first approach investigated whether the production of a speeded response affects concurrent processing in a perceptual task at all. This research originated from an observed decrement in performance when two speeded responses were performed at the same time (cf. the PRP paradigm, see below; Pashler 1994; Welford 1980). This decrement was commonly attributed to the stages of response selection and response execution, which can only deal with one task at a time and, therefore, constitute a processing bottleneck. Only recently have researchers begun to investigate the question of whether this bottleneck also includes stages of perceptual analysis. For this purpose, tasks were used in which the to-be-produced response and the to-be-encoded stimulus showed no feature overlap. Thus, these experiments examined nonspecific interference between response production and stimulus encoding.
The second approach investigated whether or not the execution of a response specifically impairs concurrent processing in a perceptual task. This search for specific interference effects originated in theoretical notions of action planning. In the course of formulating concepts of how voluntary actions may come into being, ideas were developed about the possible properties of the perception–action interface (e.g. Hommel, Müsseler, Aschersleben, and Prinz, in press; Prinz 1997). One such idea was that perception and action control share codes in the same representational domain. As a result, action-control processes should be capable of affecting and modifying visual processes in a specific, content-dependent manner. Experiments in this research tradition have mainly examined identification processes during the execution of an unspeeded and well-prepared motor response.

In the following presentation, the two different approaches to analyzing the impact of response planning and execution on perceptual encoding are reviewed, and two experiments designed to address open questions from both research traditions are reported. Finally, a tentative theoretical framework is proposed to accommodate the empirical findings from both traditions of research.
25.1 Nonspecific interference of response planning and execution in visual encoding

This section outlines research devoted to the question of whether the production of a speeded response can affect concurrent processing in a perceptual task at all. This research has mainly used variants of an experimental paradigm that is well known as the Psychological-Refractory-Period paradigm (PRP paradigm).
25.1.1 Interference between two response tasks in the PRP paradigm

In the PRP paradigm, participants respond as quickly as possible to two different stimuli presented in close succession. The stimuli are usually denoted as S1 and S2, and the corresponding responses are denoted as R1 and R2. Accordingly, the first task is to respond to S1, and the second task is to respond to S2. With short Stimulus Onset Asynchronies (SOAs), processing on the two tasks overlaps in time, and R2 is delayed compared with a situation in which the second task is performed alone.1 Telford (1931) compared this 'transient unreadiness for response' with the refractory period of neurons, and coined the term 'Psychological Refractory Period'. According to the predominant explanation of the PRP effect, the observed costs originate from a limited capacity of the response-selection stage; that is, from the inability to select two responses in parallel (e.g. McCann and Johnston 1992; Pashler 1994; however, see Meyer and Kieras 1997, for an alternative account). In other words, the response-selection process of the first task is assumed to interfere exclusively with the response-selection process of the second task, whereas the remaining processes (or stages) are not assumed to affect each other.

However, there are at least two observations that cast doubt on this explanation. First, interference in the second task occurred even when the first task did not require a response. For example, when the first task was a go–nogo task, response latencies in the second task were still increased at short SOAs in the nogo condition, that is, when no response had to be selected at all (e.g. Bertelson and Tisseyre 1969; Smith 1967). Second, several studies revealed that processing the first task also affected mental operations other than response selection. For example, it was shown that generating a two-choice response in the first task impaired the ability to mentally rotate
objects (Ruthruff, Miller, and Lachmann 1995) and the ability to retrieve information from memory (Carrier and Pashler 1995). Correspondingly, it was no longer assumed that only the response-selection stage suffers from capacity limitations. Instead, the conclusion was that 'it is reasonable to suspect that the bottleneck will be required for most if not all difficult operations' (Ruthruff et al. 1995, p. 554). Thus, a nonspecific 'central' mechanism with limited processing capacity (a nonspecific central bottleneck) was postulated that is also needed for (difficult) perceptual operations. Consequently, the focus of research also shifted towards second tasks with perceptual content.
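The central-bottleneck logic can be made concrete with a toy computation. In the sketch below, all stage durations are invented for illustration; the point is only that task 2's central stage must wait for task 1's central stage, which produces the characteristic slowing of R2 at short SOAs.

    # Toy simulation of the central-bottleneck account of the PRP effect.
    # All stage durations (ms) are hypothetical.
    P1, C1, M1 = 100, 150, 80    # task 1: perceptual, central, motor stages
    P2, C2, M2 = 100, 120, 80    # task 2

    def rt2(soa):
        # Task 2's central stage cannot start before task 1 releases the
        # bottleneck (at time P1 + C1), nor before S2 has been perceived.
        central_start = max(soa + P2, P1 + C1)
        return central_start + C2 + M2 - soa   # RT2 measured from S2 onset

    for soa in (50, 200, 400, 1000):
        print(soa, rt2(soa))   # RT2 decreases with SOA, then levels off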
25.1.2 Nonspecific interference between a response task and visual encoding: the assumption of a nonspecific central bottleneck

Since Telford's paper in 1931, most of the relevant studies have investigated PRP interference between two speeded response tasks. During the last decade, however, an increasing number of authors became interested in how the processing of a response task might also influence the processing of a perceptual task. These authors analyzed the ability to encode an S2 when participants concurrently responded to S1. Indeed, they also found impairments when the second task was an identification task (Arnell and Duncan 1998; De Jong and Sweet 1994; Dell'Acqua, Turatto, and Jolicœur, in press; Jolicœur 1999; Ruthruff et al. 1995; Wühr and Müsseler, in press). For example, De Jong and Sweet (1994) used an auditory–manual task and a perceptual task in which the observers had to identify the highest digit in an array of briefly presented digits. The authors found a consistent deficit in the digit-identification task when it temporally overlapped with the first task. But because the perceptual task used by De Jong and Sweet (1994) was rather complex, it is possible that the first task did not affect digit encoding, but did influence other (subsequent) processes involved in determining the highest digit. However, Jolicœur (1999) obtained similar findings with a much simpler perceptual task. He reported that the identification of a pattern-masked letter (or a random-sided polygon) was impaired when an observer simultaneously responded manually to an auditory stimulus.

De Jong and Sweet (1994) proposed that at least two factors limit perceptual performance under dual-task conditions: the inability to fully prepare two tasks simultaneously, and the interference between the processing of each task. Jolicœur (1999; see also Jolicœur, Tombu, Oriet, and Stevanovski, this volume, Chapter 28) presented a more elaborate account of interference from a first response task on visual encoding in a second response task. He assumed that the central bottleneck mechanism, which is needed to select a motor response in the S1–R1 task, is also required to transfer visual information into short-term memory. As a consequence, the concurrent processing of the first task interferes with the 'short-term consolidation' of visual information at short SOAs in the second task, whereas the same consolidation is unaffected at long SOAs.

The studies of De Jong and Sweet (1994) and of Jolicœur (1999) demonstrated that the processing of an S1–R1 task interfered with the concurrent encoding of a visual stimulus in the S2–R2 task. However, it is still open to question which process(es) of the first task interfere(s) with perceptual encoding. Jolicœur (1999) found that the perceptual deficit increased with the difficulty of the S1–R1 task. In particular, the perceptual deficit was larger when the first task was a four-choice task than when it was a two-choice task. This suggests that the source of interference is response selection, because it is more difficult to select a response out of four possible alternatives than out of two. However, it can also be argued that, when there are four possible alternatives, it is more difficult to identify a
response signal S1 than when there are only two alternatives. Thus, the identification of S1 could also have contributed to the observed interference. Therefore, interference might originate from an overlap of perceptual processes in both tasks (i.e. from processing S1 and S2, cf. Wühr and Müsseler, in press) or it might originate from generating R1 and perceiving S2. To examine the question of whether or not response processes are actually able to affect visual encoding, the number of motor components in the S1–R1 task can be varied. For example, a go task obviously contains more motor components than a nogo task: a response has to be selected, initiated, and executed in the go condition, but not in the nogo condition. Hence, if these response processes exert an influence on the identification of S2, the perceptual deficit at short SOAs should be larger in the go condition than in the nogo condition. This prediction was tested in the following experiment.
25.1.3 Experiment 1

25.1.3.1 Visual encoding during processing of a go–nogo task

In the first experiment, we used a PRP paradigm with an identification task similar to those mentioned in the previous section. However, instead of a choice-response task, a go–nogo task was used as the S1–R1 task. In a go–nogo task, a motor response has to be produced in response to a certain S1, whereas no response is required for another S1. Initially, the participants were presented with one of four letters S1 that either called for a manual keypress or did not. The mapping is illustrated in Table 25.1. The letters 'm' and 'b' indicated a to-be-executed R1 (go trial), whereas an 'x' or an 'o' indicated that no keypress was required (nogo trial). In go trials, participants responded to an 'm' by pressing a key with the middle finger of their right hand. They responded to a 'b' by simultaneously pressing keys with the index and ring fingers of their right hand. The perceptual
Table 25.1 The different S1–R1 mappings in the keypress tasks and the S2–R2 mapping in the identification task

                                               Stimuli S1/S2   Responses R1/R2     Condition
Exp. 1   keypress task S1 → R1                 m               middle key          go trial
                                               b               left + right key    go trial
                                               x               (no keypress)       nogo trial
                                               o               (no keypress)       nogo trial
Exp. 2   keypress task S1 → R1                 m               middle key          neutral
                                               b               left + right key    neutral
                                               l               left key            compatible/incompatible
                                               r               right key           compatible/incompatible
Identification task S2 → R2 (both experiments) <               'left' judgment
                                               >               'right' judgment
task was to identify a masked left- or right-pointing arrow S2 displayed at varying SOAs after the presentation of S1.

An observation that the perceptual deficit at short SOAs is larger in the go condition than in the nogo condition would provide evidence that the production of a motor response R1 interferes with the perceptual encoding of S2. The amount of perceptual analysis devoted to S1, as well as the amount of preparation for the perceptual second task, can be assumed to be equal in the nogo condition and in the go condition. However, the go condition additionally requires selection, initiation, and execution of a response. Thus, greater difficulty in encoding S2 in the go condition can be attributed to the additional processes necessary to perform the response. Note that because there is neither feature overlap between S1 and S2 nor between R1 and S2, only nonspecific interference effects are to be expected.
25.1.3.2 Method

The stimuli were presented on a 17″ color monitor (75 Hz refresh rate, with a luminance of 40 cd/m²). The viewing distance was 50 cm. Responses were recorded with the computer keyboard. All visual stimuli were displayed in black-on-white projection. The letters S1 ('b', 'm', 'o', and 'x') were presented approximately 1.5° above the screen center. The to-be-identified S2 was presented at the screen center and consisted of a left-pointing ('<') or a right-pointing arrowhead ('>', subtending 0.8° × 1.6° of visual angle). The mask consisted of randomly arranged lines with the same left or right orientation as the components of the arrowheads (1.0° × 2.0° of visual angle).

The go and nogo conditions of the keypress task were crossed with the three SOAs and the left/right arrowheads of the identification task. Each participant completed a total of 30 blocks consisting
Fig. 25.1 The series of events in a go trial of Experiment 1. In the speeded keypress task, participants pressed both keys simultaneously in response to a 'b' (or a middle key in response to an 'm'). While they did this, a masked left- or right-pointing S2 was presented in the identification task at different SOAs (here 400 ms). The trial was completed with an unspeeded judgment R2.
of 24 trials each. All trials started with the presentation of a blank screen (Fig. 25.1). Then a short beep occurred and one of the four letters was presented for 107 ms. The letter's identity unequivocally signaled the required response R1 (see Table 25.1). The instructions stressed the importance of responding quickly to the letter and urged participants not to wait for S2 to appear before executing R1. At an SOA of either 200, 400, or 1000 ms after S1, the left- or right-pointing S2 was presented tachistoscopically (see below) and then replaced by the mask. A judgment screen with the letters 'L' and 'R', which changed their relative positions randomly from trial to trial, appeared 1.5 s after the onset of the mask. Participants had to indicate the identity of S2 by clicking on the corresponding letter ('L' for '<' and 'R' for '>') with the computer mouse. An intertrial interval of 1 s followed an error-free trial. Error feedback was given if participants had made the wrong response to S1 and/or reported the wrong S2. If they did not perform R1 within 1 s from the onset of S1, they received no feedback, but the corresponding trial was repeated at the end of the block. After each block participants received feedback on the percentage of correct trials.

To avoid ceiling or floor effects in the identification task, the presentation duration of S2 was adjusted to achieve 75% accuracy across all SOA conditions. The presentation time was decreased by one screen refresh when the error rate in the last block was lower than 10%; it was increased by one refresh when the error rate was above 40%. The experiment was carried out in a dimly lit and sound-proof chamber. The experimental phase was preceded by a practice phase. Twelve paid volunteers (10 females, 2 males, mean age 23 years) took part in the experiment.
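For concreteness, the block-wise adjustment of the S2 duration just described can be written out as follows. This is a minimal sketch under the stated rules (one refresh at 75 Hz is about 13.3 ms); the starting value and block error rates are invented.

    # Sketch of the block-wise adjustment of S2 presentation duration.
    # The update rule follows the Method; the values fed in are hypothetical.
    REFRESH_MS = 1000 / 75    # one screen refresh at 75 Hz

    def adjust(duration_in_refreshes, error_rate):
        if error_rate < 0.10:
            return max(1, duration_in_refreshes - 1)   # too easy: shorten S2
        if error_rate > 0.40:
            return duration_in_refreshes + 1           # too hard: lengthen S2
        return duration_in_refreshes                   # otherwise unchanged

    duration = 4    # hypothetical starting value, in refreshes
    for block_error in (0.05, 0.08, 0.45, 0.20):
        duration = adjust(duration, block_error)
        print(duration, round(duration * REFRESH_MS, 1), 'ms')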
25.1.3.3 Results and discussion

Keypress task. Participants made a wrong go or nogo decision on less than 0.5% of the trials. Across the go trials, the percentage of incorrect responses was 2.1%. Finally, 0.4% of the responses exceeded the reaction-time criterion of 1 s. Reaction times (RT) were calculated for those go trials in which none of the errors described above had occurred. These were entered into a 2 × 3 analysis of variance (ANOVA)2 with response (middle keypress vs. left and right keypress) and SOA (200, 400, and 1000 ms) as within-participant variables.
Fig. 25.2 Mean proportion of correct judgments (and standard errors between participants) of S2 in the nogo and go trials of Experiment 1. The x-axis depicts the SOAs between the presentation of S1 and S2.
Across all conditions, mean RT was 542 ms. Participants responded more slowly when pressing both keys simultaneously (563 ms) than when pressing the middle key alone (520 ms), F(1, 11) = 13.61, p < 0.01. This RT disadvantage probably originated from the higher motor demands of programming two keypresses (cf. Rabbitt, Fearnley, and Vyas 1975). The other factors did not exert any influence on the RTs.

Identification task. The mean presentation duration for S2 across all participants was 45 ms (range: 16–133 ms). Across all conditions, percentage correct (PC) was 0.78 (SD = 0.03). PC values were subjected to a 2 × 3 ANOVA with condition (go vs. nogo) and SOA as within-participant variables. Collapsed across SOA, identification performance was better in the nogo trials than in the go trials, F(1, 11) = 13.37, p < 0.01 (cf. Fig. 25.2). Collapsed across condition, identification performance decreased with decreasing SOA, F(2, 22) = 21.10, p < 0.001. Additionally, the interaction was significant, F(2, 22) = 4.71, p < 0.05, indicating that the disadvantage in identification performance in the go condition compared with the nogo condition was large for the shortest SOA of 200 ms, and smaller for the two remaining SOAs of 400 ms and 1000 ms.

The nonspecific SOA effect replicated the perceptual impairment during a response task already observed in previous PRP experiments (e.g. De Jong and Sweet 1994; Jolicœur 1999; Wühr and Müsseler, in press). This effect might have originated from an overlap of perceptual processes in both tasks (i.e. from encoding S1 and S2 concurrently, cf. Wühr and Müsseler, in press). Alternatively, it might have originated from the go–nogo decision in the present keypress task; that is, from the decision whether S1 called for a response or not. Most importantly, the observed go–nogo differences provided evidence that the processes preparing and controlling a manual response R1 impaired the concurrent encoding of S2. More specifically, the perceptual deficit in encoding S2 was largest with the 200-ms SOA in the go condition; that is, when S2 was presented about 342 ms before the execution of R1. This finding suggests that the difference between the go and the nogo condition is mainly due to the preparation of R1, and not to its execution. In both conditions, the amount of perceptual analysis devoted to S1 and the go–nogo decision can be assumed to be equal; the larger perceptual deficit in the go condition must therefore result from the additional motor processes.

In sum, both the nonspecific effect (i.e. the PC disadvantage at short SOAs) and the go–nogo effect (i.e. the PC disadvantage on the go trials compared with the nogo trials) provided evidence that a response task is able to impair the visual encoding of a concurrently presented S2. The latter finding additionally demonstrated that the selection and execution of R1 contributed to the perceptual impairment of S2.
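As a statistical side note, for a two-level within-subjects factor such as the go–nogo contrast, the ANOVA main effect is equivalent to a paired t test on participant means collapsed across the other factor (F = t²). A minimal sketch, using invented per-participant PC values rather than the reported data:

    import numpy as np
    from scipy import stats

    # Hypothetical per-participant PC values collapsed across SOA;
    # n = 12 as in Experiment 1, but the numbers are invented.
    rng = np.random.default_rng(1)
    pc_go = np.clip(rng.normal(0.74, 0.05, 12), 0, 1)
    pc_nogo = np.clip(rng.normal(0.81, 0.05, 12), 0, 1)

    # Paired t test on the two-level factor; F(1, 11) equals t squared.
    t, p = stats.ttest_rel(pc_nogo, pc_go)
    print(f'F(1, 11) = {t**2:.2f}, p = {p:.4f}')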
25.2 Specific interference of response planning and execution in visual encoding

This section outlines research devoted to the question of whether the production of a response can specifically affect concurrent processing in a perceptual task. This research was inspired by theoretical assumptions about the perception–action interface, which directly predicted specific effects of action planning on perception.
25.2.1 Action planning and perception: an event-coding model

Specific interactions between action planning and perception are suggested by theories assuming that responses may be evoked cognitively by the anticipation of their sensory effects
(e.g. Greenwald 1970; Hommel et al., in press; see, for early versions of this idea, James 1890; Lotze 1852). The anticipated effects may refer not only to body-related feedback but to any kind of response- or action-contingent events as well (Hoffmann 1993; Hommel 1997; Meltzoff, Kuhl, and Moore 1991). By repeatedly performing an arbitrary movement that produces some perceivable sensory effect, actors may associate the corresponding pattern of motor activity with a code representing the to-be-expected sensory effects (Aitken 1994; Hommel 1997). Once established, such a link could be used the other way round to select and activate the motor pattern by activating an effect code first. Therefore, the central assumption here is that movements are represented cognitively by their distal effects and can be initiated by activating the representation of these effects.

As Prinz (1997) has pointed out, the assumption of action planning by anticipating action effects implies that not only stimulus codes (i.e. codes of perceived events) represent external events but response codes (i.e. codes of to-be-produced events) do so as well. Accordingly, both types of event codes may be commensurable (Prinz 1990), or even identical (Hommel 1997; Müsseler 1995, 1999). Such event codes are not viewed as single, uniform wholes, but comprise different feature codes. It is well established that visual stimuli are represented by distributed features (e.g. Singer 1994; Treisman and Gelade 1980). The assumption of distributed action features is also not entirely new (Jeannerod 1997; Keele, Cohen, and Ivry 1990). The new assumption is that these features are represented in the same domain and can contribute to the formation of a perceptual or a motor event code (cf. Hommel et al., in press). In a different context, a feature code that was previously integrated in a perceptual event code could be integrated in a motor event code (and vice versa). In principle, this notion holds for whole perceptual and motor event codes as well; in other words, in a different context, the same combination of feature codes that represents a perceptual event could refer to a motor event (and vice versa). Thus, perceptual and motor event codes are not entities of a different kind, but merely refer to different events in the specific temporary context.

Given that stimulus processing and action planning operate on a common representational level, perceptual and action-planning processes are assumed to interact when there is a feature overlap between R1 and S2. For example, a feature overlap between R1 and S2 can be assumed to originate
Fig. 25.3 Feature codes (open and filled dots) are integrated into perceptual or motor event codes (ellipses) evolving from common structures. Identical feature codes could be bound into a perceptual event code or into a motor event code depending on the present situational context. Two different temporary states are depicted, evolving from the assumed feature overlap and nonoverlap of event codes (here via the LEFT and RIGHT feature codes) in a compatible and an incompatible dual-task situation. The perceptual and the motor event code come into conflict with respect to the overlapping feature code in the compatible condition, whereas they coexist in the incompatible condition.
from a RIGHT (or LEFT) feature code at the common-coding level. This code is considered to be accessed when a right (left) keypress is generated as well as when an arrow pointing to the right (left) is perceived. Another, more specific assumption is that when the LEFT (or RIGHT) code is involved in response generation, its sensitivity to a left-pointing (or right-pointing) stimulus is reduced (cf. Müsseler 1995; compatible condition, see Fig. 25.3). In other words, once the LEFT (or RIGHT) feature code is integrated into an action plan (i.e. when a motor event code is established), it is less available for perceptual processing (Hommel 1998). In contrast, if the motor and the perceptual event do not possess overlapping features, they can coexist without any conflict (incompatible condition). This prediction was tested in several experiments summarized below.

The observation of such a specific impact of R1 upon S2 challenges traditional models in which the relationship between perception and action is generally conceived as a one-way route: perception is assumed to initiate and generate responses, but it proceeds independently from response production. We have already criticized this view elsewhere (e.g. Hommel et al., in press; see, for a focus on the perceptual point of view, Müsseler 1999). The present section confines discussion to the prediction that the generation and execution of R1 will exert an influence on the identification of S2.
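The binding idea can be caricatured in a few lines of code. The sketch below is only a toy formalization of the verbal model; the availability values and function names are invented, not parameters of the event-coding account.

    # Toy formalization: a feature code bound into an active action plan
    # (motor event code) is less available for perceptual coding.
    BOUND = 0.6    # availability of a feature occupied by an action plan (invented)
    FREE = 1.0     # availability of an unbound feature code (invented)

    def s2_sensitivity(r1_feature, s2_feature):
        # Compatible pairs share a feature code with the action plan,
        # so the perceptual event code finds that code less available.
        return BOUND if r1_feature == s2_feature else FREE

    print(s2_sensitivity('LEFT', 'LEFT'))    # compatible: reduced sensitivity
    print(s2_sensitivity('LEFT', 'RIGHT'))   # incompatible: unaffected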
25.2.2 Specific interference between a response task and visual encoding

The event-coding model predicts that the identification of S2 can be affected by (the planning and execution of) R1 and/or the feature overlap between R1 and S2. In order to examine the pure influence of R1 on the encoding of S2, our first experiments tried to reduce the impact of S1–R1 translation by allowing participants to prepare R1 for as long as they wished, or by having participants generate R1 without a preceding S1 (Müsseler, Wühr, and Prinz 2000). The basic procedure was as follows: the participants had to prepare a left or right keypress R1 at leisure, and to quickly execute R1 after the execution of a neutral response R0. The purpose of R0 was to provide an observable borderline between the preparation of R1 (before R0) and the execution of R1 (after R0). The execution of R0 triggered the presentation of S2, which, therefore, was to be identified during the execution of a compatible or incompatible R1.

A typical first result of these experiments was that the perceptual task proved easier when the masked S2 was presented alone than when it was presented during the execution of R1 (Müsseler 1999). Accordingly, perceptual single-task performance was better than dual-task performance, indicating a nonspecific impact of response generation upon perceptual identification, comparable to the impairment observed in PRP experiments (see above, cf. Arnell and Duncan 1998; De Jong 1993; De Jong and Sweet 1994; Dell'Acqua et al., in press; Jolicœur 1999; Ruthruff et al. 1995). More important from an event-coding point of view was the finding that observers were less able to identify an R1-compatible S2 (e.g. left keypress, left-pointing arrow) than an R1-incompatible S2 (e.g. left keypress, right-pointing arrow). In other words, the execution of a left response specifically impaired the identification of a left-pointing arrow compared with the identification of a right-pointing arrow, and vice versa ('blindness to response-compatible stimuli': Müsseler and Hommel 1997a,b). Experiments revealed that this effect cannot be attributed to a bias in judging S2 (e.g. Müsseler, Steininger, and Wühr 2001) or to a stimulus–stimulus interference between the processing of S1 and the processing of S2 (e.g. Müsseler et al. 2000). By studying the time course of the blindness effect, we found the effect to exist over at least two seconds before response execution (Wühr and Müsseler, 2001; see also Caessens, Lammertyn, Van der Goten, De Vooght, and Hommel 1997). Thus, the perceptual impairment seems to reflect a
conflict that emerges already during the planning phase of an action. In another series of experiments, the blindness effect was observed for stimulus–response combinations other than arrows and manual left–right keypresses, demonstrating the generality of the effect (Hommel and Müsseler 2001). Finally, evidence was obtained that feature overlap between the anticipated action effect and the to-be-identified stimulus contributes to the perceptual impairment. This was shown in a task in which proximal spatial feedback of the responses was eliminated while other distal action effects were added (Steininger 1999). On the other hand, when there is no intention to produce these action effects, the blindness effect disappears (see also Müsseler et al. 2000). Both findings are consistent with the effect-oriented view of the event-coding model.

The previously described studies used a paradigm that is rather dissimilar from the PRP paradigm applied to investigate nonspecific interference from a response task on visual encoding. The main difference is that in the PRP paradigm R1 has to be executed immediately in response to S1, whereas it could be prepared at leisure in the experiments on the blindness effect. The use of a PRP-like paradigm with a perceptual task enabled researchers to investigate an important question about the origins of the blindness effect. In the original experiments on this effect, the participants were never allowed to perform R1 immediately in response to S1. Instead, the participants either had to perform another response before R1 (e.g. Müsseler and Hommel, 1997a), or they had to wait for a go signal (e.g. Wühr and Müsseler, 2001). These tasks might have caused a temporary inhibition of R1. In other words, the LEFT and RIGHT codes might have been inhibited to prevent premature execution of R1, and that inhibition could have caused the blindness effect. If this were correct, no blindness effect should occur in a PRP paradigm, where S1 calls for the immediate and quick execution of R1.

First steps towards answering these questions have been made by Wühr and Müsseler (in press). The authors used a paradigm in which a speeded manual left or right R1 to a tone S1 was combined with the visual identification of a masked left- or right-pointing S2 presented after S1 at varying SOAs between 50 and 1000 ms. Findings revealed a nonspecific decrease in identification performance with decreasing SOA, thus replicating the previous PRP studies. Additionally, a specific impairment was observed, in that observers performed better with incompatible than with compatible R1–S2 relationships, thus replicating the specific blindness effect. This observation argues against the possibility that the blindness effect is due to a transient inhibition of R1. The following experiment aimed to replicate and extend this PRP research by trying to decompose the specific blindness effect into costs and benefits.
25.2.3 Experiment 2

25.2.3.1 Blindness to response-compatible stimuli in a PRP task: costs and benefits

Research on the blindness effect has shown that the preparation and execution of a response can specifically affect the concurrent processing of a visual stimulus. In particular, the encoding of a response-compatible S2 seems to be impaired compared with the processing of a response-incompatible S2. The present experiment further examined the effect of R1–S2 feature overlap on the encoding of S2 in a PRP-like task. In particular, we investigated whether the perceptual impairment originated from costs in the compatible condition or from benefits in the incompatible condition. The event-coding model predicts costs in the compatible condition. Alternatively, however, benefits in the incompatible condition could result from an attentional bias towards unexpected inputs, such as the occurrence of a response-incompatible stimulus (e.g. Johnston and Hawley 1994). According to this alternative view, it is not that the response-compatible feature is less available for
perceptual processing, but that the response-incompatible feature is preferentially processed. In order to assess costs or benefits, the identification rates in the compatible and incompatible conditions were compared with the identification rates in a neutral condition.

A PRP set-up was used that was similar to the previous experiment, except for the introduction of a feature overlap between R1 and S2 (cf. Table 25.1). The go trials from the previous experiment formed the neutral conditions in this experiment. Again, participants responded to an 'm' with a keypress of the middle finger of their right hand, whereas they responded to a 'b' with simultaneous keypresses of their index and ring fingers. Both conditions were neutral with respect to the features LEFT and RIGHT. These features, however, formed the R1–S2 feature overlap in the compatible and incompatible conditions: the left and right keypresses R1 in response to the letters 'l' and 'r' were assumed to share the LEFT and RIGHT features with the left and right arrows S2. The event-coding model states that the observed difference between the compatible and the incompatible conditions is due only to costs of the former condition. As a consequence, the identification of a response-compatible S2 should also be inferior when compared with a response-neutral condition. This prediction should hold at least for the intermediate SOA of 400 ms, because the blindness effect has been found to be largest around the execution of a speeded R1 (Wühr and Müsseler, in press).
25.2.3.2 Method

The only change in the procedure was that participants now had to respond to each of the four lower-case letters S1 ('b', 'l', 'm', and 'r'), which signaled the required response R1 as indicated in Table 25.1. Twelve paid volunteers (9 females, 3 males, mean age 23 years) took part in this experiment. None of them had participated in the previous experiment.

25.2.3.3 Results and discussion

Keypress task. 3.5% of the responses exceeded the RT criterion of 1 s. The mean percentage of incorrect responses was only 1.9%. A 4 × 3 ANOVA on the percentage of incorrect responses indicated significantly more errors in the two neutral conditions than in the compatible and incompatible conditions, F(1.5, 16.6) = 7.42, p < 0.01. Analyses revealed no effects for the compatible and the incompatible condition, whereas, for the two neutral conditions, the percentage of incorrect responses increased significantly with increasing SOA, F(2, 22) = 5.20, p < 0.05. In fact, only the incorrect responses for the R1–S2 relationship 'neutral both keys' increased with increasing SOA, leading to a significant interaction between neutral conditions and SOA, F(2.3, 25.7) = 7.10, p < 0.01.

Across all conditions, mean RT was 599 ms (SD = 74 ms). A 4 (R1–S2 condition) × 3 (SOA) ANOVA revealed a significant main effect of R1–S2 relationship on RT, F(3, 33) = 4.83, p < 0.01. In correspondence with Experiment 1, this main effect reflects that the RT in the condition 'neutral both keys' (636 ms) was slower, probably due to the higher motor demands, than the RT in the three other R1–S2 conditions, which did not differ from each other (neutral middle: 585 ms; compatible: 592 ms; incompatible: 582 ms; cf. Rabbitt et al. 1975). Neither the main effect of SOA nor the interaction was significant, both F < 1.

Identification task. The mean presentation duration for S2 across all participants was 44 ms (range: 23–74 ms). Across all conditions, PC was 0.75 (SD = 0.08, cf. Fig. 25.4). A 4 × 3 ANOVA on PCs revealed a significant effect of the factor SOA, F(2, 22) = 23.85, p < 0.001, replicating the nonspecific interference observed in Experiment 1. Additionally, the interaction with condition was significant, F(6, 66) = 3.49, p < 0.01.
Fig. 25.4 Mean proportion of correct judgments of incompatible, compatible, and neutral S2 in Experiment 2.
To assess costs and benefits, we first analyzed the PC values for the two neutral R1–S2 conditions only. Because there was no significant difference between these conditions, they were collapsed, and the resulting data were compared with the PC values from the compatible and the incompatible condition, respectively. The differences between the compatible and the neutral condition were computed; negative differences indicate costs, whereas positive differences indicate benefits. A positive difference was observed for the SOA of 200 ms (0.01), and negative differences for the SOAs of 400 ms (–0.04) and 1000 ms (–0.01). The corresponding differences for the incompatible condition were negative for the SOA of 200 ms (–0.05) and positive for the 400-ms (0.03) and 1000-ms SOAs (0.05).

The differences were subjected to a 2 × 3 ANOVA with Compatibility and SOA as within-participant variables. The main effect of SOA was eliminated by the computation of differences. The main effect of Compatibility was not significant, but there was a significant interaction of Compatibility and SOA, F(2, 22) = 6.12, p < 0.01. This interaction reflects the finding that, for the 200-ms SOA, identification performance for compatible S2 was better than for incompatible S2, whereas, for the SOAs of 400 ms and 1000 ms, identification performance for compatible S2 was worse than for incompatible S2. Finally, we tested whether the differences observed between the neutral condition and the compatible or incompatible condition, respectively, deviated from zero. For the incompatible condition, only the positive difference for the SOA of 1000 ms was significant (t = 3.38, p < 0.01, two-tailed); for the compatible condition, only the negative difference for the 400-ms SOA (t = –2.12, p < 0.05, one-tailed). For the 200-ms SOA, no significant differences were observed.

In sum, the results indicated a nonspecific dual-task interference effect in visual encoding: identification performance decreased as the temporal overlap between the keypress task and the identification task increased. The findings also revealed that response-compatible stimuli were less well identified than response-incompatible stimuli. Hence, the results successfully replicated the blindness effect in a PRP paradigm with speeded responses R1 (Wühr and Müsseler, in press). Furthermore, in comparison with the neutral conditions, costs were observed for the compatible condition at the 400-ms SOA (i.e. about 190 ms before R1 execution). As predicted by the event-coding model, the
sensitivity for a response-compatible S2 was reduced when the code for the overlapping feature was involved in response generation. Surprisingly from an event-coding point of view, benefits were found at the 1000-ms SOA. The fact that such a benefit occurred only with the longest SOA (i.e. after R1 execution) might point to a mechanism that facilitates the processing of response-incompatible stimulation after the motor event code has released the spatially opposite feature code. In essence, when the system was engaged in generating a left (right) R1 (compared with a neutral R1), the RIGHT (LEFT) feature code was more sensitive to activation after the release of the feature codes, once R1 had been performed. Thus, the spatial LEFT–RIGHT feature codes were able to exert an influence on each other, indicating that they might be part of a one-dimensional representation.
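The cost/benefit arithmetic just described is easy to make explicit. The following is a hypothetical sketch (the random placeholder data, variable names, and SciPy calls are ours, not the authors' analysis code): it computes the compatible-minus-neutral PC differences per participant and SOA and tests each against zero with a one-sample t-test, as in the analysis above.

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant proportion-correct (PC) values with shape
# (12 participants, 3 SOAs); random placeholders, not the reported data.
rng = np.random.default_rng(1)
pc_compatible = rng.uniform(0.6, 0.9, size=(12, 3))
pc_neutral = rng.uniform(0.6, 0.9, size=(12, 3))

# Compatible minus neutral: negative differences are costs, positive benefits.
diff = pc_compatible - pc_neutral

# One-sample t-test per SOA against zero, as in the text.
for i, soa in enumerate((200, 400, 1000)):
    t, p = stats.ttest_1samp(diff[:, i], popmean=0.0)
    print(f"SOA {soa:4d} ms: mean difference {diff[:, i].mean():+.3f}, "
          f"t({diff.shape[0] - 1}) = {t:.2f}, p = {p:.3f}")
```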
25.3 General discussion

The present experiments demonstrated that response-planning processes can exert a striking impact on visual encoding. A variant of the PRP paradigm was used to investigate the influence of response planning and execution on the concurrent processing of a visual stimulus. Participants had to perform (or to withhold) a manual response to a visual S1 that preceded a to-be-identified S2 at variable SOAs. When the participants performed R1 in response to S1, the accuracy in identifying S2 decreased with decreasing SOA. Thus, the production of R1 nonspecifically interfered with the encoding of S2, replicating previous findings (e.g. De Jong and Sweet 1994; Jolicœur 1999; Wühr and Müsseler, in press). In addition, the amount of interference in visual encoding was found to vary both with the amount of response-related processing in the S1–R1 task (go–nogo task) and with the feature overlap between R1 and S2. In Experiment 1, there was more interference when the participants responded to S1 (go condition) than when they did not respond (nogo condition). In Experiment 2, we analyzed costs and benefits in identifying response-compatible and response-incompatible stimuli by comparing the corresponding identification rates with neutral conditions. We observed costs in identifying response-compatible stimuli at an intermediate SOA (400 ms), and benefits in identifying response-incompatible stimuli at a long SOA (1000 ms). In the following, we discuss whether existing theoretical ideas can account for these findings.

It has long been known that capacity limitations occur when two tasks are performed in parallel. It is beyond the scope of the present paper to discuss the various models according to which capacity limitations originate from system-inherent bottleneck(s) (cf. Kahneman 1973; Pashler 1994; Wickens 1980) or from the functioning organism's need to produce appropriate actions in real time (Allport 1987; Neumann 1996; Van der Heijden 1992; see also Meyer and Kieras 1997). Recently, the variety of bottleneck assumptions (for an overview see Neumann 1996) has been enhanced by the idea of a ‘central multi-purpose mechanism’ that is needed for (difficult) perceptual tasks as well as for the selection and initiation of responses (e.g. De Jong 1993; Pashler 1994; Ruthruff et al. 1995). A central mechanism with limited capacity can account for the nonspecific SOA effects observed in the present experiments. As long as the central bottleneck is engaged in selecting R1, the assumption is that it cannot be used to process S2 (or to consolidate S2 in visual short-term memory; cf. Jolicœur 1999; Jolicœur et al., this volume, Chapter 28). Moreover, such a mechanism can also account for the observed go–nogo difference in Experiment 1: in the go condition the central bottleneck is additionally engaged in selecting and initiating R1, resulting in a larger deficit in encoding S2. However, whereas this idea can account for the nonspecific effects, it cannot easily explain the
specific effects observed in Experiment 2. The problem is that a central bottleneck hypothesis does not address the representational structures underlying perception and action.

The event-coding model is an alternative possibility for explaining interference between response planning and perceptual processing. Up to now, this model has mainly been concerned with specific effects, which were assumed to originate from a feature overlap between R1 and S2 (cf. Section 25.2). Therefore, the event-coding model can directly explain the specific interference effects observed in Experiment 2: when processing in the keypress task has to access codes that are also needed for processing in the identification task, a specific interference effect should arise. In addition, the central assumption of the event-coding model, that response planning and perceptual encoding operate in partially overlapping representations, also suggests a possible explanation for the observed nonspecific interference effects in both experiments. Interference between two processes is certainly more likely to occur when these processes operate in the same representational domain than when they operate in distinct domains.

Recently, Hommel and colleagues (1998; Hommel et al., in press; Stoet and Hommel 1999, this volume, Chapter 26) extended the event-coding model. The authors suggested that action planning consists of two phases: first, the feature codes of a to-be-performed response are activated (activation phase); then, during the second phase, the already activated feature codes that belong to one response are integrated into a coherent action plan. From a functional point of view, integration might be necessary to protect the system from integrating other, irrelevant feature codes. Once integrated, feature codes are seen as insulated against other use (integration phase; cf. the encapsulation hypothesis in Müsseler 1999). The blindness effect is seen as a negative consequence of the integration of codes. However, this account also suggests that processing could be facilitated during the first phase: if a feature code has just been activated by processing one task, then processing of this feature in another task should benefit before features are bound. This is indeed what Stoet and Hommel (1999, this volume, Chapter 26) found with two nested keypress tasks. In their experiments, participants prepared a left or right finger movement (A), performed another left–right choice reaction (B) in between, and then executed response A. Feature sharing proved to be beneficial during the feature-activation stage, but led to mutual interference in the feature-integration stage. When we adopt the two-phase assumption for the present issue (i.e. for the concurrent processing of a keypress task and an identification task), an early benefit in encoding S2 is predicted when S2 is presented during the phase of activating the feature codes for R1. In contrast, when the features of R1 are bound (i.e. when an action plan is built up), the usual blindness effect should occur. In summary, the two-phase version of the event-coding model suggests facilitation in the identification of response-compatible stimuli at short SOAs and an impairment at long SOAs.
In Experiment 2, however, the compatible condition showed only a numerical advantage at the short SOA of 200 ms when compared with the incompatible condition, and no advantage when compared with the neutral condition. One might argue that facilitation will occur at SOAs shorter than 200 ms. However, Wühr and Müsseler (in press) used SOAs down to 50 ms in a similar task without observing facilitation at short SOAs. Thus, there is (at least for a concurrent perceptual task) insufficient empirical evidence for the two-phase extension of the event-coding model.

Note that until now the discussion has treated the activation of feature codes and their integration as if these phases were determined exclusively by temporal factors. However, other factors might also determine the time course of processing. It is obvious that certain feature-code combinations are more likely than others to be integrated into a specific event code. For example, when driving,
a red traffic light is probably associated more strongly with putting one's right foot on the brake than with any other reaction. Thus, the pre-existing strength of association between feature codes probably influences the time and order in which feature codes are integrated.

The transition from activation to integration may also depend on the amount of feature overlap between competing event codes. For example, the feature overlap between listening to a self-produced spoken word and speaking the same word is probably higher than the overlap between perceiving a left arrow and pressing a left key. Listening to and speaking a word are similar in terms of frequency range, intonation, and semantic content, whereas a left arrow and a left keypress share only the abstract LEFT feature code. It is likely that this amount of feature overlap affects the binding process and therefore the transition from activation to integration in a dual-task experiment (for converging evidence see Hommel and Müsseler 2001).

The transition from activation to integration might also differ between establishing a perceptual and a motor event code. Assume, for example, that a perceptual event code is formed comparatively slowly and a motor event code comparatively fast, allowing the immediate generation of responses. Feature overlap might then exert different effects on different dependent variables, that is, on perceptual judgment and on response execution. For example, it is known that the processing of a target is influenced by the mere presentation of a feature-sharing distractor. In studies using perceptual judgments as the dependent measure, the identification of a target is often impaired by distractors with identical features (e.g. Bjork and Murray 1977; Santee and Egeth 1980). With reaction time as the dependent measure, the results are often the opposite; that is, the presence of a distractor with identical features speeds up target processing (e.g. Eriksen and Eriksen 1974; Schwarz and Mecklinger 1995). Thus, feature overlap seems to facilitate response execution in one condition and to impair identification of a target in another. Unfortunately, up to now little research has investigated when response facilitation and perceptual impairment occur at the same time (but see Wühr, Knoblich, and Müsseler 2001). Thus, future research is needed to further investigate the conditions under which feature overlap either facilitates or impairs performance.
25.4 Conclusion

The present paper focused on crosstalk between action planning and visual encoding in a dual-task situation similar to the PRP paradigm. The experimental variations originated from a model in which perception and action planning are assumed to share codes in a common domain. According to this model, action-planning processes should influence and modify visual processes in an elementary manner. More specifically, the model assumes that once a feature code is integrated into a motor event code, its availability for perceptual processing is reduced until it is released again (probably after the execution of the action).

The basic assumptions of the event-coding model were not developed to explain perceptual impairments in dual tasks, but to explain various phenomena in action planning, such as spatial stimulus–response compatibility, sensorimotor synchronization, and ideomotor action (for an overview see Hommel et al., in press). From an action-planning view, perceptual consequences are only a byproduct of the general processing principle that action planning and perception operate on identical codes. However, the present discussion demonstrates that, in order to understand perceptual phenomena, it may be fruitful to take into account the action-control demands with which a perceptual system is confronted.

As a final remark, it is worth noting that there is no need to treat the event-coding model and the central bottleneck hypothesis, as outlined above, as mutually exclusive. Because these
accounts deal with different interferences originating from different sources and processes, they can also supplement each other. This means that nonspecific and specific interferences may coexist, as observed in the present experiments.
Acknowledgments

This research was supported by a grant from the Deutsche Forschungsgemeinschaft (Mu 1298/2). We wish to thank two anonymous reviewers for valuable comments, criticisms, and suggestions on an earlier version; Michael Blum-Kalagin and Lucia Kypke for carrying out the experiments; and John Tank for stylistic corrections. Requests for reprints should be sent to either author at the Max Planck Institute for Psychological Research, Amalienstr. 33, D-80799 München, Germany. E-mail: muesseler@ or [email protected]; URL: http://www.mpipf-muenchen.mpg.de/~muesseler or /~wuehr.
Notes

1. Usually, the delay in the second response is compared with performance in single-task conditions or in dual-task conditions with long SOAs. Pashler (1994) has argued that the dual-task condition with long SOAs is a more appropriate comparison condition because it is, for example, more comparable in terms of memory load.

2. When necessary, F probabilities in the present and the following analyses were corrected according to Greenhouse–Geisser.
References

Aitken, A.M. (1994). An architecture for learning to behave. In P. Clift, J.-A. Meyer, and S.W. Wilson (Eds.), From animals to animats 3, pp. 315–324. Cambridge, MA: MIT Press.
Allport, D.A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer and A.F. Sanders (Eds.), Perspectives on perception and action, pp. 395–419. Hillsdale, NJ: Erlbaum.
Arnell, K. and Duncan, J. (1998). Substantial interference between response selection and stimulus encoding. Abstracts of the Psychonomic Society, 39th Annual Meeting, 3, 2.
Bertelson, P. and Tisseyre, F. (1969). Refractory period of c-reactions. Journal of Experimental Psychology, 79, 122–128.
Bjork, E.L. and Murray, J.T. (1977). On the nature of input channels in visual processing. Psychological Review, 84, 472–484.
Caessens, B., Lammertyn, J., Van der Goten, K., De Vooght, G., and Hommel, B. (1997). Temporal characteristics of action planning. Paper presented at the 6th Wintercongres van de Nederlandse Vereniging voor Psychonomie, Egmond aan Zee.
Carrier, L.M. and Pashler, H. (1995). Attentional limits in memory retrieval. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 1339–1348.
De Jong, R. (1993). Multiple bottlenecks in overlapping task performance. Journal of Experimental Psychology: Human Perception and Performance, 19, 965–980.
De Jong, R. and Sweet, J.B. (1994). Preparatory strategies in overlapping-task performance. Perception and Psychophysics, 55, 142–151.
Dell’Acqua, R., Turatto, M., and Jolicœur, P. (in press). Cross-modal attentional deficits in processing tactile stimulation. Perception and Psychophysics.
Eriksen, B.A. and Eriksen, C.W. (1974). Effects of noise letters upon identification of a target letter in a nonsearch task. Perception and Psychophysics, 16, 143–149.
Greenwald, A.G. (1970). Sensory feedback mechanisms in performance control: With special reference to the ideo-motor mechanism. Psychological Review, 77, 73–99.
Hoffmann, J. (1993). Vorhersage und Erkenntnis [Anticipation and cognition]. Göttingen (Germany): Hogrefe.
Hommel, B. (1997). Toward an action-concept model of stimulus–response compatibility. In B. Hommel and W. Prinz (Eds.), Theoretical issues in stimulus–response compatibility, pp. 281–320. Amsterdam: Elsevier.
Hommel, B. (1998). Automatic stimulus–response translation in dual-task performance. Journal of Experimental Psychology: Human Perception and Performance, 24, 1368–1384.
Hommel, B. and Müsseler, J. (2001). Action-feature integration blinds to feature-overlapping perceptual events (submitted for publication).
Hommel, B., Müsseler, J., Aschersleben, G., and Prinz, W. (in press). The theory of event coding (TEC): A framework for perception and action. Behavioral and Brain Sciences, 24(5).
James, W. (1890). The principles of psychology. New York: Holt.
Jeannerod, M. (1997). The cognitive neuroscience of action. Oxford: Blackwell Publishers.
Johnston, W.A. and Hawley, K.J. (1994). Perceptual inhibition of expected inputs: The key that opens closed minds. Psychonomic Bulletin and Review, 1, 56–72.
Jolicœur, P. (1999). Dual-task interference and visual encoding. Journal of Experimental Psychology: Human Perception and Performance, 25, 596–616.
Jolicœur, P., Tombu, M., Oriet, C., and Stevanovski, B. (2002). From perception to action: Making the connection. This volume, Chapter 28.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice Hall.
Kahneman, D., Beatty, J., and Pollack, I. (1967). Perceptual deficit during a mental task. Science, 157, 218–219.
Keele, S.W., Cohen, A., and Ivry, R. (1990). Motor programs: Concepts and issues. In M. Jeannerod (Ed.), Attention and performance XIII: Motor representation and control, pp. 77–111. Hillsdale, NJ: Erlbaum.
Lotze, H. (1852). Medicinische Psychologie oder Physiologie der Seele [Medical psychology or the physiology of the mind]. Leipzig: Weidmann.
McCann, R.S. and Johnston, J.C. (1992). Locus of the single-channel bottleneck in dual-task interference. Journal of Experimental Psychology: Human Perception and Performance, 18, 471–484.
Meltzoff, A.N., Kuhl, P.K., and Moore, M.K. (1991). Perception, representation, and the control of action in newborns and young infants: Towards a new synthesis. In M.J. Weiss and P.R. Zelazo (Eds.), Newborn attention: Biological constraints and the influence of experience, pp. 377–411. Norwood, NJ: Ablex Press.
Meyer, D.E. and Kieras, D.E. (1997). A computational theory of executive cognitive processes and multiple-task performance: Part 1. Basic mechanisms. Psychological Review, 104, 3–65.
Müsseler, J. (1995). Wahrnehmung und Handlungsplanung [Perception and action planning]. Aachen: Shaker Verlag.
Müsseler, J. (1999). How independent from action control is perception? An event-coding account for more equally ranked crosstalks. In G. Aschersleben, T. Bachmann, and J. Müsseler (Eds.), Cognitive contributions to the perception of spatial and temporal events, pp. 121–147. Amsterdam: Elsevier.
Müsseler, J. and Hommel, B. (1997a). Blindness to response-compatible stimuli. Journal of Experimental Psychology: Human Perception and Performance, 23, 861–872.
Müsseler, J. and Hommel, B. (1997b). Detecting and identifying response-compatible stimuli. Psychonomic Bulletin and Review, 4, 125–129.
Müsseler, J., Wühr, P., and Prinz, W. (2000). Varying the response code in the blindness to response-compatible stimuli. Visual Cognition, 7, 743–767.
Müsseler, J., Steininger, S., and Wühr, P. (2001). Can actions affect perceptual processing? The Quarterly Journal of Experimental Psychology, 54A, 137–154.
Neumann, O. (1996). Theories of attention. In O. Neumann and A.F. Sanders (Eds.), Handbook of perception and action, Vol. 3, pp. 299–446. London: Academic Press.
Pashler, H. (1994). Dual-task interference in simple tasks: Data and theory. Psychological Bulletin, 116, 220–244.
Prinz, W. (1990). A common-coding approach to perception and action. In O. Neumann and W. Prinz (Eds.), Relationships between perception and action, pp. 167–201. Berlin: Springer.
Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9, 129–154.
Rabbitt, P.M.A., Fearnley, S., and Vyas, S.M. (1975). Programming sequences of complex responses. In P.M.A. Rabbitt and S. Dornic (Eds.), Attention and Performance V, pp. 295–317. London: Academic Press.
Ruthruff, E., Miller, J., and Lachmann, T. (1995). Does mental rotation require central mechanisms? Journal of Experimental Psychology: Human Perception and Performance, 21, 552–570.
Santee, J.L. and Egeth, H.E. (1980). Interference in letter identification: A test of feature-specific inhibition. Perception and Psychophysics, 27, 321–330.
Schwarz, W. and Mecklinger, A. (1995). Relationship between flanker identifiability and compatibility effect. Perception and Psychophysics, 57, 1045–1052.
Singer, W. (1994). The organization of sensory motor representations in the neocortex: A hypothesis based on temporal binding. In C. Umiltà and M. Moscovitch (Eds.), Attention and Performance XV: Conscious and nonconscious information processing, pp. 77–107. Cambridge, MA: MIT Press.
Smith, M.C. (1967). Theories of the psychological refractory period. Psychological Bulletin, 80, 161–191.
Steininger, S. (1999). Handeln und Wahrnehmen [Acting and perceiving]. Aachen: Shaker Verlag.
Stoet, G. and Hommel, B. (1999). Action planning and the temporal binding of response codes. Journal of Experimental Psychology: Human Perception and Performance, 25, 1625–1640.
Stoet, G. and Hommel, B. (2002). Interaction between feature binding in perception and action. This volume, Chapter 26.
Telford, C.W. (1931). The refractory phase of voluntary and associative responses. Journal of Experimental Psychology, 14, 1–37.
Treisman, A.M. and Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Van der Heijden, A.H.C. (1992). Selective attention in vision. London: Routledge.
Welford, A.T. (1980). The single-channel hypothesis. In A.T. Welford (Ed.), Reaction times, pp. 215–252. London: Academic Press.
Wickens, C.D. (1980). The structure of attentional resources. In R. Nickerson (Ed.), Attention and Performance VIII, pp. 239–257. Hillsdale, NJ: Erlbaum.
Wühr, P. and Müsseler, J. (2001). Time course of the blindness to response-compatible stimuli. Journal of Experimental Psychology: Human Perception and Performance, 27, 1260–1270.
Wühr, P. and Müsseler, J. (in press). Blindness to response-compatible stimuli in the psychological-refractory-period paradigm. Visual Cognition.
Wühr, P., Knoblich, G., and Müsseler, J. (2001). An activation-binding model (ABM) for the concurrent processing of visual stimuli (submitted for publication).
26 Interaction between feature binding in perception and action

Gijsbert Stoet and Bernhard Hommel

Abstract. To explain how coherent representations can be formed of information that is distributed throughout the brain, binding mechanisms have been hypothesized that temporarily hold together, or bind, such distributed information. Evidence of temporary feature binding has been reported from tasks requiring perceptual integration and action planning, and there is some evidence that action planning affects perception. The present study provides further evidence that binding-related effects cross the border between the perceptual and motor domains by demonstrating that perceptual integration affects action planning. Results from three psychophysical experiments suggest that if a particular perceptual feature is bound into an object representation, it is less accessible for concurrent action planning. Furthermore, our results support the idea that the formation of object representations goes through two phases: feature activation and feature integration. Feature sharing between perception and action is beneficial during feature activation, but leads to mutual interference during feature integration. Wider implications of these findings are discussed, especially with regard to feature binding as a general mechanism of cognitive representation and to the relationship between perception and action.
Psychologists and neuroscientists have long studied how representations of perceptual and action events are organized and how these representations are related to neuronal activity. It is known that elementary features of perceptual and action representations are represented by specific neuronal populations. For example, features of visual objects have been found to be coded in various feature maps distributed across the brain (DeYoe and Van Essen 1988; Ungerleider and Haxby 1994), and neurons coding specific motor features, such as the direction of reaching movements, have been identified (Georgopoulos 1990). Therefore, it is likely that representations of objects and action plans are based on distributed neuronal populations, each coding different aspects or features of the representation (Singer 1994).

One of the unanswered questions associated with this hypothesis about the structure of representations refers to the binding problem: if objects are represented by the activity of distributed sets of neurons, how is the relationship between these neurons coded (von der Malsburg 1981, 1995)? This problem is nicely illustrated by Rosenblatt's (1961) example of a perceptron, a simple neural network consisting of just four neurons (Fig. 26.1(a)). Neuron 1 responds to the presence of a triangle and neuron 2 to the presence of a square. Neuron 3 responds to all objects in the upper visual field and neuron 4 to all objects in the lower visual field. If this system has to detect a square in the upper visual field, an output neuron would have to detect the simultaneous activity of neurons 2 and 3 (Fig. 26.1(a)). But now suppose that there is a triangle in the upper and a square in the lower visual field: the output neuron would falsely respond (Fig. 26.1(b)). In other words, the perceptron can only handle one object at a time. The example shows that representing the presence or absence of features alone is not sufficient to represent multiple objects simultaneously.
Fig. 26.1 (a): Correct detection of a square in the upper visual field. (b): The perceptron cannot handle the coactivation of two objects.
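To make the perceptron's failure concrete, here is a minimal, purely illustrative sketch (the encoding and all names are our own, chosen only for demonstration). The output unit that detects co-activation of the 'square' and 'upper field' detectors cannot distinguish a genuine upper square from a scene containing an upper triangle and a lower square:

```python
# Four feature detectors; each fires (True) when its feature occurs anywhere.
def detectors(objects):
    """objects: list of (shape, field) pairs, e.g. [('square', 'upper')]."""
    return {
        'triangle': any(s == 'triangle' for s, _ in objects),
        'square':   any(s == 'square'   for s, _ in objects),
        'upper':    any(f == 'upper'    for _, f in objects),
        'lower':    any(f == 'lower'    for _, f in objects),
    }

def upper_square_output(objects):
    """Output neuron: co-activation of the 'square' and 'upper' detectors."""
    d = detectors(objects)
    return d['square'] and d['upper']

print(upper_square_output([('square', 'upper')]))   # True: correct detection
print(upper_square_output([('triangle', 'upper'),
                           ('square', 'lower')]))   # True: a false alarm
```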
In order to represent multiple objects, the brain needs to code which features belong together and which do not. However, this simple binding scheme becomes problematic if some features belong to several objects at the same time, as illustrated in Fig. 26.2. In the model of Fig. 26.2(a), a red circle and a green square are represented at the same time. While the active neurons (filled circles) code the features that are present in the environment, the bindings represent which features belong together. Although this model offers a solution to the perceptron problem (illustrated in Fig. 26.1), it cannot represent multiple objects that share a feature (Fig. 26.2(b)): the presence of a red and a green circle results in one blurred object. This is because the shared feature connects the two objects, and there is no way of distinguishing them when the same type of binding is used for different objects. A solution is presented in Fig. 26.2(c), where different types of bindings are used for the different objects; the shared feature is connected to each object with a different type of binding. Altogether, the binding problem is best characterized by the question of how distributed sets of features can represent multiple objects without confusing the features of the individual objects.

A solution of the binding problem requires three conditions to be satisfied. First, it should allow features to interact with each other. Second, feature interactions must be flexible, because feature relationships in the environment change quickly. Third, it should allow features to participate in different representations at the same time.

Although the binding problem has been investigated mainly in visual perception, there are also some studies on action planning (Engel, Roelfsema, Fries, Brecht, and Singer 1997; Stoet and Hommel 1999). The reason to consider a binding problem in the motor domain is that action plans are likely to be based on sets of distributed action features (Stoet and Hommel 1999), so that, as in perception, the simultaneous representation of multiple actions requires a mechanism for coding which motor features belong together. Suppose that you plan a LEFT FOOT and a RIGHT HAND movement. If the action plans involve the features LEFT, RIGHT, HAND, and FOOT, then binding is required to prevent feature confusion that would lead to a LEFT HAND and a RIGHT FOOT movement.
Fig. 26.2 (a): Two sets of active feature neurons representing a red circle and a green square. Inactive neurons (open circles) do not participate in representations. (b): Representation of a red circle and a green circle is not possible without loss of information if bindings are not distinguishable. (c): Representation of a red circle and a green circle is possible because the two sets of features are distinguished by different types of binding.
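The tagging scheme of Fig. 26.2(c) can be caricatured in a few lines of code. The following sketch is purely illustrative (the Binding class and the integer tags are our own devices, not a claim about neural implementation); the tag plays the role of the distinct binding type, so a feature code such as CIRCLE can participate in two object representations without merging them:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Binding:
    tag: int             # distinguishes bindings (cf. different binding types)
    features: frozenset  # feature codes held together under this tag

# A red circle and a green circle: CIRCLE is shared, but the two objects
# remain distinguishable because their bindings carry different tags.
scene = [
    Binding(tag=1, features=frozenset({'RED', 'CIRCLE'})),
    Binding(tag=2, features=frozenset({'GREEN', 'CIRCLE'})),
]

def objects_with(feature, scene):
    """Return the tags of all objects containing a given feature code."""
    return [b.tag for b in scene if feature in b.features]

print(objects_with('CIRCLE', scene))  # [1, 2]: one feature, two distinct objects
print(objects_with('RED', scene))     # [1]
```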
The solutions to the binding problem proposed so far are controversial and hotly debated in neuroscience (for an impression of the debate, see Neuron, Vol. 24, Sept. 1999). Yet psychophysical research has contributed to a better understanding of the binding problem independent of questions about its implementation at the neurophysiological level. Psychophysical experiments have studied behavior under conditions where feature binding is necessary. For example, in visual search tasks people have been shown to perform much better if the target is defined in terms of a single feature rather than a feature conjunction, suggesting that in the latter case some time-consuming feature integration needs to take place (Treisman 1996). Likewise, when only briefly presented with more than one object, people tend to produce illusory conjunctions, hence to combine features the wrong way (Treisman and Gelade 1980).

Other psychophysical experiments have explored interactions between features bound to different representations. For example, Stoet and Hommel (1999) investigated how a previously prepared action plan (A), residing in memory for later execution, influences the preparation of another action plan (B). They found that if plan B shares an action feature with the already prepared plan A, planning takes more time. According to Stoet and Hommel, this is because an already bound feature is, in a sense, occupied and thus less easily available for other action plans until the planned action is carried out. Recent findings of Müsseler and colleagues even suggest that integrating features in action planning has an impact on perceptual integration. In particular, Müsseler and Hommel (1997a,b) showed that identifying or detecting an arrow pointing to the left or right is more difficult if a spatially compatible response is made at the same time. Wühr and Müsseler (2001) observed that this kind of ‘blindness’ to response-compatible stimuli sets in as early as two seconds before the manual response is actually emitted, revealing that it is not the execution, but the planning of a feature-overlapping action that hampers perception.

Along the lines of Stoet and Hommel's (1999) feature-occupation account, the observation of Müsseler and colleagues might indicate that the binding of a (in this case spatial) feature code to an action plan makes it less available for representing a visual object (cf. Müsseler and Wühr, this
volume, Chapter 25). In other words, the effects of feature occupation might cross the border between perception and action planning. Such an interpretation fits nicely with the general idea that perceptual events and action plans are coded within the same representational domain (Prinz 1997), so that feature codes are shared by perception and action planning (Hommel, Müsseler, Aschersleben, and Prinz, in press; Müsseler and Hommel 1997a). If so, we should also be able to find effects going in the direction opposite to Müsseler and colleagues' action-effect blindness. That is, we should be able to demonstrate that coding a perceptual event that includes a particular feature X impairs the planning of an unrelated action sharing this feature. This is what we did in three experiments, in which we asked human subjects to perform a left- or right-hand action some time after being presented with a stimulus appearing on the same or the other side.
26.1 Paradigm and rationale

We adapted Stoet and Hommel's (1999) ABBA paradigm so as to allow us to investigate possible interactions between perception and action planning. Participants performed two tasks (A and B) on each trial, with Task B embedded in Task A (see Fig. 26.3). In the basic version, Task A required memorization of a visually presented object (Stimulus A). Task B was a speeded choice reaction task in which a left or right index finger movement (Response B) was signaled by the identity of a centrally presented letter (Stimulus B). After completion of Task B, a series of forced-choice questions about the features of Stimulus A was answered (Response A). Thus, participants had to hold a representation of Stimulus A in mind while performing Task B. These modifications of Stoet and Hommel's (1999) original design enabled us to study the influence of an already constructed and maintained stimulus representation (Stimulus A) on the formation of an action plan (Action Plan B).

If the already constructed representation A has a given feature code bound to it, and if this very code needs to be integrated into Plan B as well, creating the plan should be more difficult than in situations without feature overlap. That is, feature overlap between Stimulus A and Response B should impair the formation of Response B's action plan and delay its initiation. For a concrete example, assume Stimulus A is a red square appearing on the left side. Upon presentation, the corresponding codes RED, SQUARE, and LEFT are integrated into a coherent representation. If Response B is then a right-hand movement, the RIGHT code needs to be integrated into Action Plan B, which does not conflict with maintaining the representation of Stimulus A. However, in case of a left-hand response, the required LEFT feature would already be bound to the representation of Stimulus A, and it would therefore be difficult to access.

The prediction that feature overlap between a stimulus and a response impairs the response seems at odds with established stimulus–response compatibility research, which reports faster and more accurate responses to stimuli sharing features with the responses (see Hommel and Prinz 1997, for an overview). For example, in the Simon task (Simon 1990) people respond to a nonspatial stimulus that is presented at different locations. Even though the stimulus location has no task relevance, responses are faster if stimulus and response locations correspond. However, as we will discuss in more detail in Experiment 3, there are important differences between the Simon task and the paradigm of the present study, the most important being the temporal delay between the critical stimulus and response. In the Simon task people react to the stimulus that carries the irrelevant location feature, so that the processes concerned with forming the stimulus representation and the action plan overlap in time (Hommel 1993a). In contrast, the ABBA paradigm separates the critical stimulus and response (and, thus, the underlying processes) by having subjects perform Response B to Stimulus B, which is presented some time after Stimulus A has been processed. Accordingly, planning Response B is unlikely to be affected by processes having to do either with coding Stimulus A (the process presumably causing the Simon effect; see Hommel 1993b) or with integrating or consolidating it (a process that might cause nonspecific capacity limitations; see Jolicœur, Tombu, Oriet, and Stevanovski, this volume, Chapter 28). Hence, if we obtain an effect of Stimulus A on planning Response B even if the two are separated in time, this must be due to some outcome or product of the coding and integration processes, such as the hypothesized feature bindings.
Fig. 26.3 Sequence of events in the experimental procedure of Experiment 1. The printed colors in the figure differ from the real colors in the paradigm; the background color of the screen was always black. Stimulus A is either red or green, round or rectangular, and appears left or right on the screen; it has to be memorized for recall at the end of the trial. Stimulus B is a white ‘X’ or an ‘H’, instructing the participant to perform a left- or right-hand response immediately. After Response B, questions concerning the previously memorized Stimulus A have to be answered.
26.2 Experiment 1

In our first experiment, we tested whether planning Response B is impaired (i.e. takes more time) if it shares a spatial feature with Stimulus A, in which case the respective feature code (LEFT or RIGHT) should already be integrated into the representation of Stimulus A. Hence, we expected that spatial feature overlap between Stimulus A and Response B (i.e. both left or both right) would result in a slower Response B than no spatial feature overlap.
26.2.1 Method

On each trial, participants experienced the following sequence of events (Fig. 26.3). A white fixation asterisk appeared on the black screen, followed by a blank screen and Stimulus A. Stimulus A varied randomly in position (left or right), shape (circle or square), and color (red or green). Participants were asked to memorize the features of Stimulus A for later recall. Then, after another blank screen, a fixation dot was presented, followed by a blank screen and a brief presentation of Stimulus B (the centrally presented letter H or X). This stimulus signaled a speeded manual response, which consisted in lifting the left or right index finger from the touch-sensitive metal plate on which it rested. If Response B was correct, the questions concerning the features of Stimulus A followed. For each of the three feature dimensions (presented in random order), one of the two possible features (randomly determined) was presented at the center, and participants were to make an unspeeded present–absent (‘yes’ or ‘no’) decision by lifting their left or right index finger. The mapping of decisions (‘yes’ or ‘no’) to fingers (which was also indicated in each display) was constant for a given participant but balanced across participants. Importantly, however, the random variation of the judged feature values did not allow subjects to translate information about Stimulus A into responses in advance of the final question phase. In case of an incorrect answer, no further questions were presented.

Twelve adult volunteers participated for pay in a single session of about 15 min. They worked through a practice block of eight trials and an experimental block of 80 error-free trials (2 locations of Stimulus A × 2 locations of Response B × 20 replications). Trials with incorrect responses, response omissions (RT > 1000 ms for Response B or RT > 5000 ms for Response A), or anticipations (RT < 100 ms) were repeated at some random position in the remainder of the block. Participants were informed about their general performance after every 10 error-free trials, and at the end they received a small bonus depending on their mean performance.
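As a minimal, hypothetical sketch of this design (function and condition names are ours, not taken from the original software), the 2 × 2 × 20 block and the rule of repeating failed trials at a random later position could look like this:

```python
import random

def make_block(replications=20, seed=0):
    """Build a shuffled 2 (Stimulus A side) x 2 (Response B side) block,
    i.e. the 80-trial design of Experiment 1."""
    rng = random.Random(seed)
    trials = [(a_side, b_side)
              for a_side in ('left', 'right')
              for b_side in ('left', 'right')
              for _ in range(replications)]
    rng.shuffle(trials)
    return trials

def run_block(trials, run_trial, seed=1):
    """Collect error-free trials; a failed trial is reinserted at a random
    position in the remainder of the block, as described above."""
    rng = random.Random(seed)
    pending, completed = list(trials), []
    while pending:
        trial = pending.pop(0)
        if run_trial(trial):        # True = correct and within the RT limits
            completed.append(trial)
        else:
            pending.insert(rng.randrange(len(pending) + 1), trial)
    return completed

# Demonstration with a dummy runner that 'fails' about 5% of trials.
block = run_block(make_block(), run_trial=lambda t: random.random() > 0.05)
print(len(block))  # 80 error-free trials
```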
26.2.2 Results and discussion

Mean RTs and percentages of errors (PEs) for Responses B and A were analyzed as a function of feature overlap (LEFT–LEFT or RIGHT–RIGHT) versus no overlap (LEFT–RIGHT or RIGHT–LEFT) between Stimulus A and Response B (see Table 26.1). The significance criterion was set to p < 0.05. RTs of Response B were significantly longer in the overlap than in the no-overlap condition, F(1, 11) = 6.35, p < 0.05, providing first support for the hypothesis that constructing a perceptual object representation occupies the codes of the object features, so that these codes are temporarily less available for the construction of other, in this case action-related, representations. Apparently, memorizing Stimulus A led to the integration of the spatial code referring to A's location (e.g. LEFT), so that later integration of the same code into the action plan of Response B was difficult and RT increased. In contrast, the RTs of Response A were long and did not differ between the overlap and no-overlap conditions, presumably reflecting the nonspeeded nature of this response.
26.3 Experiment 2

Experiment 1 provided first evidence for the assumption that integrating a feature in representing a stimulus event occupies the respective spatial code, and that this occupation impairs the planning of a feature-overlapping action.
Table 26.1 Mean reaction times (RTs, in ms) and percentages of errors (PEs) for Experiments 1–3 as a function of feature overlap between Stimulus A and Response B. Standard deviations are given in parentheses

                               Response B              Response A
                               RT         PE           RT           PE
Experiment 1
  Overlap                      482 (78)   2.0 (1.4)    816 (137)    8.4 (4.1)
  No overlap                   469 (89)   2.5 (2.2)    814 (144)    5.3 (6.6)
Experiment 2
  Overlap                      430 (51)   3.0 (2.7)
  No overlap                   420 (51)   2.5 (1.8)
Experiment 3 (long preview)
  Overlap                      384 (75)   4.0 (2.1)    1888 (522)   5.1 (7.1)
  No overlap                   374 (65)   4.0 (3.4)    1998 (574)   5.4 (3.4)
Experiment 3 (short preview)
  Overlap                      359 (35)   3.0 (3.5)    1749 (213)   1.0 (1.0)
  No overlap                   376 (35)   6.0 (5.6)    1832 (294)   1.5 (1.9)
However, one might argue that requiring subjects to memorize a stimulus for later report brings in all sorts of possible strategies, such as recoding the stimulus into a more abstract format or imagery techniques. If so, these strategies, rather than the assumed feature-integration processes, may have been responsible for the obtained result pattern. To rule this out, we sought a modification of our design that, on the one hand, would require participants to at least briefly attend to Stimulus A, so that feature integration could take place, but, on the other hand, would not require memorizing the stimulus and thereby introduce possible recoding strategies.

Accordingly, we modified the task of Experiment 1 so that Stimulus A and its features no longer had to be memorized; there was thus no memory test and no Response A. However, Stimulus A served as a go-signal for Task B. In particular, participants were to react to Stimulus B only when Stimulus A appeared; its features had no behavioral relevance. In 12 randomly intermixed catch trials Stimulus A was omitted, in which case participants were to refrain from responding to Stimulus B. Participants were urged to attend to the go-signal by informing them that they would be excluded from the
experiment in case of more than two responses in the catch trials. Ten naive adult volunteers participated for pay.
26.3.1 Results and discussion

As in Experiment 1, RTs for Response B were significantly slower in the overlap than in the no-overlap condition, F(1, 9) = 7.64, p < 0.05. This suggests that the binding of a feature to a perceptual representation makes it less available for subsequent binding into an action plan. The effect cannot be attributed to memory rehearsal or other strategies, because no feature of Stimulus A was to be memorized or was otherwise relevant to the task. This is consistent with the claim of Kahneman, Treisman, and Gibbs (1992) that attentively perceiving a stimulus is a sufficient condition for feature binding to occur. Here it is demonstrated that this spontaneous binding affects not only perception but action planning as well.
26.4 Experiment 3

As already admitted, the finding that feature overlap between one event and another yields a negative effect might seem puzzling at first sight. No doubt, the much more common findings are positive effects of feature overlap, as documented by numerous reports from research on S–R compatibility (for overviews, see Hommel and Prinz 1997). Given that, the observation that feature overlap produces interference seems to stand in contradiction to a wealth of well-established effects and phenomena. In order to address this apparent contradiction, Stoet and Hommel (1999) assumed that the temporal delay between the two events may play a critical role, an idea they tested by varying this delay in their version of the ABBA design. In particular, participants were cued to prepare Action A, but to withhold it until the end of the trial. In between preparation and execution of Action A, subjects were asked to prepare and execute a second Action B. If the temporal delay between Stimulus A and Stimulus B was long (presumably allowing full integration of Plan A), the already reported negative effects of feature overlap were obtained. Hence, if subjects had prepared and memorized a left-hand Action A, they were slower initiating a left-hand than a right-hand Action B. However, if Stimulus B appeared soon (100 ms) after Stimulus A, so that planning Action A could not be completed before at least starting to plan Action B, positive effects on B were obtained; that is, feature overlap sped up initiating B.

These and other findings (see Hommel 1998b) suggested a two-phase model of action planning. In the first phase, the individual features of an action plan are activated. During this phase, the features are primed and facilitate processes using the same features. In the second phase, the activated action features are integrated into an action plan and are from then on less available for other representational processes. Although the original two-phase model refers to action planning, the observed commonalities between perceptual integration and action planning suggest that it might also apply to feature integration in perception (Hommel et al., in press). Indeed, there is evidence that codes of perceptual features are activated before effects indicative of feature binding can be observed (Hommel, submitted). If so, the key variable explaining the apparent contradiction between the standard positive effects of feature overlap and the present negative effects would be time or, more precisely, the interval between the presentation of Stimulus A and the planning of Action B. If this interval is short, action planning would be more likely to fall into the first phase of perceptual
integration, so that feature overlap between stimulus and action should facilitate performance. This seems to characterize the situation in standard compatibility experiments, where the action follows the stimulus immediately. However, as soon as feature-integration processes begin (i.e. after about 250–500 ms; see Hommel, submitted), feature codes are still activated but now bound to a particular event representation. This should make it more difficult to use these codes to create other representations; benefits turn into impairments. In short, effects of stimulus–response feature overlap should be positive with short, but negative with long, intervals between object presentation and response planning.

This prediction was tested in Experiment 3 by comparing two conditions (see Fig. 26.4). In a long-preview condition, Experiment 1 was replicated by presenting Stimulus A for a time long enough to allow the integration of the object features needed for later recall. In a short-preview condition, the basic task was the same, but there were two major modifications. First, Stimulus A preceded Stimulus B only briefly, so that the feature codes representing Stimulus A were likely to be activated but unlikely to be fully integrated before the planning of Response B started. Second, Stimulus A remained on the display until Response B was performed, so that memorizing Stimulus A before planning Response B was unnecessary. According to the distinction between activation and integration, short preview should produce positive effects of feature overlap between Stimulus A and Response B, whereas long preview should yield negative effects, similar to Experiments 1 and 2.
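The qualitative logic of this prediction can be captured in a toy model. The sketch below is ours alone; the time constants and the availability function are arbitrary illustrative assumptions, not parameters estimated from the data. A feature code is primed (helpful) during an early activation phase and occupied (harmful) once integration has begun, so the predicted overlap effect flips sign as preview time grows:

```python
import math

def predicted_overlap_effect(t_ms, integration_onset=300.0, tau=400.0):
    """Toy prediction for the effect of feature overlap on Response B,
    t_ms after Stimulus A onset. Positive = facilitation (the code is
    primed but not yet bound); negative = interference (the code is bound
    into Stimulus A's actively maintained representation and stays
    occupied until release). All constants are arbitrary choices."""
    if t_ms < integration_onset:
        return math.exp(-t_ms / tau)   # activation phase: decaying priming
    return -1.0                        # integration phase: code is occupied

for preview in (100, 500, 2000):
    print(f"preview {preview:4d} ms -> {predicted_overlap_effect(preview):+.3f}")
# 100 ms (short preview) gives a positive value (facilitation);
# 500 ms and beyond give negative values (interference), as in Experiment 3.
```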
26.4.1 Method

The method was similar to that of Experiment 1, except for the following modifications. Stimulus A was always followed by a 15° × 3° mask consisting of 76 × 12 randomly ordered red and green
rectangles. Stimulus B consisted of a change in the brightness of the screen background from gray to black or to white. There were two separate sessions, one with long preview of Stimulus A and one with short preview (Fig. 26.4). In the long-preview session, a yellow fixation asterisk was followed by a blank screen and Stimulus A. Then the stimulus was masked and, after a further interval, Stimulus B was presented by changing the background color. After completion of Response B, the whole screen turned gray and the memory test (i.e. Response A) began. It consisted of the presentation of eight randomly ordered rectangles, each containing one of the objects (i.e. combinations of location, shape, and color) used as Stimulus A. Participants indicated their decision by pressing the corresponding key (1–9, excluding the central 5) on the numeric keypad of the PC. Although RTs were measured for Response B, the memory test was not under time pressure. In the short-preview session, Stimulus A preceded Stimulus B by only 100 ms but stayed visible until 685 ms after Response B had been completed. Then it was masked and the memory test began. Twelve new adult volunteers participated for pay in both the short- and long-preview sessions, which took about 15 min each.

Fig. 26.4 Sequence of events in the experimental procedure of Experiment 3.
26.4.2 Results and discussion

For Response B, significant interactions of feature overlap and preview were obtained in both RTs, F(1, 11) = 78.56, p < 0.001, and PEs, F(1, 11) = 7.72, p < 0.05. Separate analyses showed that, as expected, RTs were negatively affected by feature overlap with long preview, F(1, 11) = 5.10, p < 0.05, but positively affected with short preview, F(1, 11) = 18.68, p < 0.001 (see Table 26.1). The PE effect reflects the fact that feature overlap had no effect with long preview (replicating the previous findings), but a positive effect with short preview, F(1, 11) = 8.21, p < 0.05. This result pattern supports the prediction that briefly after a stimulus is presented, its features are activated but not yet bound.1 After some time, the features become integrated and are more difficult to bind to other, feature-overlapping events.

Interestingly, in the memory test (Response A), RTs were faster with feature overlap than without, F(1, 11) = 9.12, p < 0.05, and accuracy was greater in the short- than in the long-preview condition, F(1, 11) = 6.44, p < 0.05. This is in accordance with the activation–integration model and with a similar observation of Stoet and Hommel (1999). At the moment that Response A is prepared and executed, none of its codes is integrated in another task, since the plan for the other action, Response B, is no longer maintained. That is, after Response B is executed, the bindings between the codes of its representation are disintegrated. Nevertheless, the codes still have residual activity that carries over to the preparation of Response A.
26.5 General discussion

In all three experiments we found evidence that action planning is affected by perceptual feature integration. In particular, we were able to demonstrate that responses are initiated more slowly if the response location corresponds to the location of a previously memorized (Exps. 1 and 3) or merely perceived (Exp. 2) object. Furthermore, Experiment 3 provided preliminary evidence that this effect depends on the time available to integrate the features of that object, suggesting that feature binding is a temporally extended, time-demanding process that can be distinguished from the mere activation of feature codes. Taken together, these findings are in agreement with the two-phase
activation-integration model proposed by Stoet and Hommel (1999) and extended here to include perceptual integration. Figure 26.5 summarizes how this model accounts for the processes taking place in the overlap (Fig. 26.5 (a) to (f)) and no-overlap (Fig. 26.5 (g) to (l)) conditions we investigated. Our findings have several implications, two of which we would like to emphasize.
Fig. 26.5 Explanation of the results in terms of the activation–integration model of feature integration. (a) and (g): Stimulus presentation causes activation of the feature codes that correspond to the features of the stimulus (activated features are illustrated as filled circles). (b) and (h): The memorization process (as in Exps. 1 and 3) or attention (as in Exp. 2) causes the temporal integration of the activated features. (c) and (i): The presentation of Stimulus B causes activation of the associated motor features. For simplicity, it is assumed that Stimulus B automatically activates the properties of the motor plan associated with it. Note that in (c) one of the features already belongs to the integrated set of features representing Stimulus A, whereas in (i) none of the features of A and B overlap. (d) and (j) represent the process of integration. In the feature-overlap trial, feature F2 must be integrated into two different representations, whereas in (j) none of the features is shared by different representations. It is exactly this phase of processing where the disadvantage of feature overlap comes into play: integration of features that are already in use for other representations is more difficult than integration of features that are free. (e) and (k) show that the execution of Response B is based on the representation of the action; the model assumes no differences between the two execution processes. (f) and (l) show that the recall of Stimulus A is based on the representation of Stimulus A. Note that the integration of B no longer exists: the model assumes that the temporal binding of Plan B was discarded after execution of B, and that after the disintegration of Plan B the activity of its codes dissipates gradually. This causes a positive effect on the recall of Stimulus A in case of overlap.
26.5.1 Binding and bindings

Negative effects of feature overlap between Stimulus A and Action B on initiating the latter were obtained only if Stimulus A appeared two or more seconds before Action B was signaled, but not if Stimuli A and B were presented in close succession. This suggests that, inasmuch as the negative overlap effect is related to feature binding, it is unlikely to reflect direct interference between ongoing binding processes. Rather, it seems to indicate an after-effect of one binding process (via the binding it produced) on another binding process, a kind of prospective interference. In other words, our results seem to be due to the impact of an already existing binding (a cognitive structure) on current binding (a cognitive process).

The main characteristics of our effect (its specificity and temporal range) distinguish it from another interference effect that stimulus processing can exert on action planning. As Jolicœur and colleagues (e.g. Jolicœur, Dell'Acqua, and Crebolder 2000; Jolicœur et al., this volume, Chapter 28) have repeatedly shown, storing a stimulus for later report interferes with selecting a response at the same time and up to some hundreds of milliseconds later. Jolicœur and Dell'Acqua (1998) have argued that later report of a stimulus requires a process that they call short-term consolidation, a process that they assume interferes with selecting a response. Elaborating on these ideas, Jolicœur et al. (this volume, Chapter 28) suggest that response selection may involve response-code consolidation, a process similar to the short-term consolidation of stimulus information. We are sympathetic to this view and think that it is very close to the perspective we propose here. Nevertheless, it is important to consider that Jolicœur et al. focus on the direct interference between two integration or consolidation processes, not on the products of these processes. Accordingly, the effects they deal with are most pronounced if stimulus and response processing overlap in time but disappear with delays of one or more seconds, the exact opposite of what we observed. Moreover, the interference demonstrated by Jolicœur and colleagues is nonspecific in the sense that stimulus processing interfered with response selection independent of any feature overlap, whereas feature overlap played a crucial role in our findings.

The picture that emerges from these result patterns might be sketched as follows: integrating the features of a perceived or planned event might draw upon a strictly resource-limited mechanism that allows integration of only one event at a time, a characteristic that may be responsible both for costs in the consolidation of stimulus information (Jolicœur and Dell'Acqua 1998) and for delays of action planning in multiple-task performance (Hommel 1998b). The outcome of such an integration or binding process is a coherent cognitive structure comprising codes of the features of the respective event. If one or more of these codes are shared with another, later integrated event, this integration process is prolonged and/or its use is complicated through crosstalk from the involuntarily connected structure.
26.5.2 Perception and action
Our findings add to an increasing number of phenomena in perception and action that indicate the existence of temporary feature bindings. The similarity between these phenomena and their characteristics suggests a general principle of how events are represented in perception and action, namely through cognitive structures formed by temporarily integrating codes representing the features of the to-be-represented event (Hommel et al., in press). But apart from mere similarity of processes, our findings also suggest at least some sharing of representational codes.
Minimally, the observation that feature overlap between a stimulus and a logically unrelated action affects performance on the latter seems to suggest that codes of this feature are shared, that is, accessed and used by both perceptual processing and action planning. However, even though converging evidence for this conclusion comes from Müsseler and Hommel (1997a,b) and related studies, there is a possible objection. Assume that what gets integrated is not low-level perceptual features of stimuli but more abstract, high-level semantic codes, which then interact with the semantic representation of the to-be-planned action. This would imply that our binding story may hold, but there would be no need to claim interactions between perceptual and action-related feature codes. Instead, what interacts may be codes of the same, abstract kind. If so, the observation of code sharing would be somewhat less surprising.
Although our present data do not allow us to rule out this idea, some recent data of Hommel and Müsseler (2001) make us doubt that it is applicable. Hommel and Müsseler employed the design developed by Müsseler and Hommel (1997a) but varied the ‘format’ of both the to-be-planned action and the to-be-identified stimulus. That is, they asked subjects to plan either a left- or right-hand keypress or the verbal utterance ‘left’ or ‘right’ (or, to be precise, the German equivalents) and then presented either left- or right-pointing arrows or the words ‘left’ or ‘right’. If subjects were presented with arrows while maintaining the plan to perform a keypressing action, action-compatible arrows were less accurately identified than incompatible arrows, which replicates the findings of Müsseler and Hommel (1997a). As a left-pointing arrow and the word ‘left’, or a right-pointing arrow and the word ‘right’, have the same meaning, their semantic representations should be equivalent or even identical, so that a semantic-coding view would predict comparable effects of action planning on arrows and words. However, word identification was not affected by planning keypresses at all. In contrast, planning verbal utterances impaired the identification of compatible words, while arrow identification remained unaffected. Obviously, a merely semantic relationship between a planned action and a processed stimulus is insufficient to produce interactions between their codes; what seems necessary is similarity between more low-level perceptual and action-related codes, just as our feature-binding approach suggests. If so, there is considerable reason to think that our findings reflect a true interaction between perception and action planning.
If these considerations are correct, we are left with the insight that the codes that seem to be shared are specific and abstract at the same time. They are specific inasmuch as they code real locations of stimulus events or actions, not just spatial meaning. But they are also abstract in being able to code both perceptual events and action plans. Although this sounds self-contradictory, it need not be. If we assume that actions are cognitively coded and planned in terms of their perceivable effects (Elsner and Hommel, in press; Hommel 1996; Hommel et al., in press; Müsseler and Hommel 1997a), the only difference between the codes involved in perceiving an event and in planning an action is that the former may or may not have resulted from one’s own movements, whereas the latter is still in the process of being produced. The quality of the codes themselves need not differ; in either case they make up internal structures whose activity is correlated with the intended or real presence or absence of a particular event characterized by particular features. In other words, feature codes may code features irrespective of whether these belong to registered input or intended output. If so, a feature code would always be specific with respect to the feature it codes but abstract with respect to the origin of the coded event.
Acknowledgments
We thank the Max Planck Society for supporting the research; Fiorello Banci and Karl-Heinz Honsberg for constructing the response devices and for technical support; and Lawrence Snyder for helpful comments on the manuscript.
Note
1. Although our findings suggest a critical role of timing, we should mention that some aspects of our design do not allow us to exclude possible contributions from another factor. In order to roughly equate the durations of Stimulus A in the two preview conditions, and to discourage subjects from memorizing that stimulus under short preview, we left Stimulus A on the screen while subjects were working on Task B. As this was not the case under long preview, it might be that the presence of Stimulus A somehow contributed to the different results under short and long preview. Although we find it difficult to imagine what such a contribution may look like, future research may provide us with a more differentiated picture.
References
Allport, D.A., Tipper, S.P., and Chmiel, N.R.J. (1985). Perceptual integration and postcategorical filtering. In M.I. Posner and O.S.M. Marin (Eds.), Attention and performance XI, pp. 107–132. Hillsdale, NJ: Erlbaum.
DeYoe, E.A. and Van Essen, D.C. (1988). Concurrent processing streams in monkey visual cortex. Trends in Neurosciences, 11, 219–226.
Elsner, B. and Hommel, B. (in press). Effect anticipation and action control. Journal of Experimental Psychology: Human Perception and Performance.
Engel, A.K., Roelfsema, P.R., Fries, P., Brecht, M., and Singer, W. (1997). Binding and response selection in the temporal domain: A new paradigm for neurobiological research. Theory in Biosciences, 116, 241–266.
Georgopoulos, A.P. (1990). Neural coding of the direction of reaching and a comparison with saccadic eye movements. Cold Spring Harbor Symposia on Quantitative Biology, 55, 849–859. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.
Hommel, B. (1993a). The relationship between stimulus processing and response selection in the Simon task: Evidence for a temporal overlap. Psychological Research, 55, 280–290.
Hommel, B. (1993b). The role of attention for the Simon effect. Psychological Research, 55, 208–222.
Hommel, B. (1994). Spontaneous decay of response code activation. Psychological Research, 56, 261–268.
Hommel, B. (1996). The cognitive representation of action: Automatic integration of perceived action effects. Psychological Research, 59, 176–186.
Hommel, B. (1997). Toward an action-concept model of stimulus–response compatibility. In B. Hommel and W. Prinz (Eds.), Theoretical issues in stimulus–response compatibility, pp. 281–320. Amsterdam: North-Holland.
Hommel, B. (1998a). Event files: Evidence for automatic integration of stimulus–response episodes. Visual Cognition, 5, 183–216.
Hommel, B. (1998b). Automatic stimulus–response translation in dual-task performance. Journal of Experimental Psychology: Human Perception and Performance, 24, 1368–1384.
Hommel, B. (2001). Time course of feature binding. Manuscript submitted for publication.
Hommel, B. and Prinz, W. (Eds.) (1997). Theoretical issues in stimulus–response compatibility. Amsterdam: North-Holland.
Hommel, B., Müsseler, J., Aschersleben, G., and Prinz, W. (in press). The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences.
Jersild, A.T. (1927). Mental set and shift. Archives of Psychology, Whole No. 89.
Jolicœur, P. and Dell’Acqua, R. (1998). The demonstration of short-term consolidation. Cognitive Psychology, 36, 138–202.
Jolicœur, P., Dell’Acqua, R., and Crebolder, J. (2000). Multitasking performance deficits: Forging links between the attentional blink and the psychological refractory period. In S. Monsell and J. Driver (Eds.), Attention and performance XVIII: Control of cognitive processes, pp. 309–330. Cambridge, MA: MIT Press.
Kahneman, D., Treisman, A., and Gibbs, B.J. (1992). The reviewing of object files: Object-specific integration of information. Cognitive Psychology, 24, 175–219.
Kornblum, S., Hasbroucq, T., and Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus–response compatibility: A model and taxonomy. Psychological Review, 97, 253–270.
Müsseler, J. and Hommel, B. (1997a). Blindness to response-compatible stimuli. Journal of Experimental Psychology: Human Perception and Performance, 23, 861–872.
Müsseler, J. and Hommel, B. (1997b). Detecting and identifying response-compatible stimuli. Psychonomic Bulletin & Review, 4, 125–129.
Müsseler, J. and Wühr, P. (2002). Response-evoked interference in visual encoding. This volume, Chapter 25.
Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9, 129–154.
Rosenblatt, F. (1961). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Washington, DC: Spartan Books.
Simon, J.R. (1990). The effects of an irrelevant directional cue on human information processing. In R.W. Proctor and T.G. Reeve (Eds.), Stimulus–response compatibility, pp. 31–86. Amsterdam: North-Holland.
Simon, J.R., Acosta, E., Mewaldt, S.P., and Speidel, C.R. (1976). The effect of an irrelevant directional cue on choice reaction time: Duration of the phenomenon and its relation to stages of processing. Perception & Psychophysics, 19, 16–22.
Singer, W. (1994). The organization of sensory motor representations in the neocortex: A hypothesis based on temporal coding. In C. Umiltà and M. Moscovitch (Eds.), Attention and performance XV: Conscious and nonconscious information processing, pp. 77–107. Cambridge, MA: MIT Press.
Stoet, G. and Hommel, B. (1999). Action planning and the temporal binding of response codes. Journal of Experimental Psychology: Human Perception and Performance, 25, 1626–1640.
Treisman, A. (1996). The binding problem. Current Opinion in Neurobiology, 6, 171–178.
Treisman, A. and Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Ungerleider, L.G. and Haxby, J.V. (1994). ‘What’ and ‘where’ in the human brain. Current Opinion in Neurobiology, 4, 157–165.
von der Malsburg, C. (1981). The correlation theory of brain function. Internal report. Göttingen, Germany: Max Planck Institute for Biophysical Chemistry.
von der Malsburg, C. (1995). Binding in models of perception and brain function. Current Opinion in Neurobiology, 5, 520–526.
Wühr, P. and Müsseler, J. (1997). Time course of the blindness to response-compatible stimuli. Munich, Germany: Max Planck Institute for Psychological Research, Paper no. 2/1997.
V Coordination and integration in perception and action
27 Coordination and integration in perception and action
Introduction to Section V
Robert Ward
Integration of perception and action systems is necessary for survival. A perceptual system which can’t guide action is literally useless, and an action system which isn’t guided by perception is probably dangerous. So how are perception and action systems linked to allow effective behaviour? The chapters in this section address many aspects of this question, but at least two important themes emerge. One is the increasing variety of approaches that develop and even replace traditional notions about the way perceptual analyses are dynamically linked to responses. A second crucial issue is the way that associations between stimuli and response are structured by experience and by task demands.
27.1 Architectures for converting perception into action
Traditional accounts of the processing stream leading from perception to action assume an initial stage of perceptual processing, a terminal stage of response execution, and, between these two, a stage of ‘response selection’:
Perceptual analysis → Response selection → Response execution
Processes operating at the stage of response selection are meant to provide correct integration of perception and action, since it is here that stimulus features, which have been previously identified and are relevant for response, are linked to programs specifying appropriate motor commands. However, despite the importance of response selection for enabling action, psychological theories of response selection have remained underdeveloped compared with theories of visual attention, motor control, and other forms of perceptual–motor processing.
The introductory tutorial by Jolicœur, Tombu, Oriet, and Stevanovski (Chapter 28) represents an important step in this development. The tutorial reviews a diverse literature on capacity limitations and their time course, looking throughout the processing stream leading from perception to action. From this review Jolicœur et al. develop a new model of response selection. Their model postulates multiple bottlenecks, which are used to move both perceptual codes and response codes into and out of short-term memory stores. The scope of the model allows it to reconcile a number of previously apparently unrelated results, such as studies of the psychological refractory period (PRP) and the attentional blink, within a single framework. But further, the model of Jolicœur et al. provides a richer notion of response selection and its functionally separable sub-components than traditional accounts do.
New work here also challenges basic assumptions of traditional accounts of the perception-to-action stream. As noted by Cohen and Feintuch (Chapter 29), traditional accounts have generally assumed that visual analyses of all types feed into a common response selection mechanism. Cohen and Feintuch present evidence that, instead, there may be a number of specialized perception-to-action streams. Based on their studies of flanker interference, Cohen and Feintuch suggest that in addition to visual streams used to compute object identity, there are multiple response activation processes, each suited for rapid detection of important visual features and execution of related responses. Their Dimensional-Action architecture represents a new way to think about the relationship of attention, object recognition, and action processing.
The issue of integrating functionally separable systems for object identity and motor control is considered explicitly in the VAM model of Schneider and Deubel (Chapter 30). VAM shows how the integration of perception and action can be reconciled with well-established neuropsychological evidence showing a dissociation between cortical streams for processing visual object identity and for processing visual attributes well suited to the control of action. The VAM model postulates that while processing of object identity and response control can be dissociated, they are nevertheless linked by a common attentional capacity. Schneider and Deubel demonstrate this shared capacity empirically in dual attention and motor tasks. The results and model suggest interacting processes which work together to jointly control the allocation of attention to object identification and responses.
Finally, by traditional approaches, stimulus and response codes are linked exclusively at a single level of response selection. This view is challenged by Logan and Zbrodoff (Chapter 31). They note that theories of perceptual processing frequently propose a number of interacting levels, as do theories of motor response, and they consider the theoretical and empirical implications of this interaction for response selection. They note that multiple levels of stimulus and response processing may allow for overlap of stimulus and response codes at many levels, not just one. Logan and Zbrodoff find that effects of stimulus–response overlap occur throughout the processing stream, and in particular they note that effects of stimulus features can depend crucially on the form of response, presumably as a result of processes downstream from response selection. One implication is that if response selection is defined by the stages in which stimulus and response codes come into contact, then it may encompass much of the perception-to-action stream.
27.2 Learning effective responses
Two of the chapters here address what is really the fundamental mystery of response selection: given the uncountable number of possible responses we can potentially make, how is it that appropriate responses are selected for particular stimuli? Zießler and Nattkemper (Chapter 32) and Hazeltine (Chapter 33) look at ways in which we learn to make effective responses that produce desired outcomes in the environment. An effect of every response is another stimulus—for example, a motor response of the left hand creates a new stimulus, a moving left hand. Views of action–effect learning suggest that by encoding responses in terms of the effects they produce on the environment, responses can then be selected according to the relevance of their effects to current goals. Zießler and Nattkemper test important aspects of this view. In their experiments, participants learned implicit contingencies between their responses (keypresses) and subsequent stimuli, or action-effects (visual letters). Responses to targets were then found to be faster when targets appeared with the action-effects of compatible responses, suggesting that knowledge of action-effects was used to prepare responses.
If we select actions based on our knowledge of their effects, then how do we encode these effects? As simple sequences of motor commands, or as something more? Hazeltine considers this issue in the context of implicit motor learning. Hazeltine’s studies show clear transfer of learning between responses which are executed very differently but which still achieve the same goal. We therefore encode actions not simply with respect to how they are achieved, but also with respect to their consequences for the environment.
As a whole, the chapters in this section point towards a much more dynamic view of the perception-to-action stream than the traditional view allows. They suggest an architecture with multiple paths from perception to action, with extensive interactions between stimulus and response processes throughout. These interactions influence both the dynamics of perception–action integration and the acquisition of effective perception–action associations. So where might these kinds of investigation continue? Studies of response selection have traditionally been so heavily constrained as to almost erase contributions of the response selection process itself. Typical, for example, are experiments in which subjects are asked to select one of two possible responses based on the colour of an object or the pitch of a tone. The possibilities for response selection in the real world are generally much more complex than this, and future studies may explore this complexity. In fact, the process of response selection is really at the very heart of what we call intelligence: how we flexibly link creative and adaptive responses to a variety of stimuli, taking into account the constraints of internal and external states, and current goals.
28 From perception to action: making the connection—a tutorial
Pierre Jolicœur, Michael Tombu, Chris Oriet, and Biljana Stevanovski
Abstract. We review results that bear on the relationship between perception and action, with an emphasis on capacity limitations in the information processing system. Over and above limitations in storage capacity of short-term memory (STM) stores, several converging sources of evidence suggest that limitations are also imposed by transformation operations into and out of the short-term memory system (STMS). In particular, entry into an STM store appears to require a capacity-limited operation we call short-term consolidation. The psychological refractory period (PRP) paradigm also provides evidence for central capacity limitations, which we attribute to the consolidation of response codes. Results from PRP experiments show that explicit retrieval of information from long-term memory or from STM requires central mechanisms. We present new evidence suggesting that the bottleneck observed in the PRP paradigm is not merely strategic, but rather reflects central processing limitations. We review evidence for the capacity-free activation of stored semantic information and present a model in which such activation occurs before the capacity-limited operations in the STMS are required. Evidence for parallel activation of response codes is evaluated within the context of several well-known paradigms including the flanker effect, the crosstalk effect, the Simon effect, and the Stroop effect. We highlight the importance of maintaining an appropriate task set in observing these effects and introduce a model that details the stages of processing that mediate the flow of information from perception to action. In outlining the components of this model, we attempt to account for the wide range of phenomena reviewed in the chapter. Pipelining, chunking, and chording allow the system to have high throughput despite structural limitations imposed by the STMS.
28.1 Introduction
In this chapter we perform a selective review of empirical results that bear on the issue of how perception and action are integrated to support behaviour that begins with the uptake of stimulus information and culminates in an overt response. We focus on experiments from the cognitive literature that shed light on the nature and structure of the information processing system. We are particularly interested in work that has revealed and sought to understand capacity limitations in the information processing flow from perception to action. One of our goals is to understand these capacity limitations and use them to construct a conceptual framework that will allow us to characterize how perception and action are integrated to support ongoing behaviour.
28.2 Early versus late vision
Although our review is not limited to studies in which stimuli have been presented visually, much of the literature (and thus much of our review) deals with visual input. There is a general consensus among vision researchers that early vision can be characterized as the processing of features by fast parallel mechanisms that have very high capacity. This early processing is mediated by a massively parallel neural architecture with specialized regions of cortex dedicated to the perception of different visual attributes, such as colour and motion (Zeki 1993). In contrast, late vision is characterized as serial, having low processing capacity, and specialized for the processing of objects rather than features (Luck and Vogel 1997; Pinker 1984; Treisman and Gelade 1980; Wolfe 1994). Visual search experiments suggest that simple features can sometimes be processed by the early, high-capacity system, while more complex objects (e.g. feature conjunctions) must usually be processed by the later, capacity-limited system (Treisman and Gelade 1980; but see Wolfe, Cave, and Franzel 1989). Somewhere, either in later vision itself, or in the transfer from early to late vision, or both, there must be a processing bottleneck (or perhaps several bottlenecks). In this chapter we review evidence that allows us to refine this coarse breakdown between early and late visual processes.
28.3 Input capacity limitations
One potential cause of the general processing limitations associated with later vision could reside in restrictions in the storage capacity of short-term memory (STM). From the point of view of the integration of perception and action, we will consider the transition between early vision and late vision, and capacity limits of STM, to belong to the perception side of the perception–action distinction.
28.3.1 Capacity of storage
Miller’s (1956) well-known paper highlighted the capacity limitation of storage in STM and suggested a general limit of seven, plus or minus two, on the number of objects that can be maintained in STM simultaneously. Not long after Miller’s paper, Sperling (1960) developed an experimental paradigm that allowed him to demonstrate both the capacity limitation of later processing and the large capacity of early visual processing. On each trial subjects viewed a briefly presented matrix of symbols (letters, or letters and digits). For one group of subjects, the task was to recall as many characters as possible (whole report condition) by filling in a grid corresponding to the dimensions of the presented matrix. A response was considered correct if the recorded item matched both the identity and location of the presented item. For a second group of subjects, a tone with a high, medium, or low frequency was presented at the offset of the display. The frequency of the tone was used to cue one of the rows in the display. In contrast to the first group, these subjects were only required to report the row cued by the tone (partial report condition). Subjects in the whole report condition recalled about four or five items from a twelve-item display. In the partial report condition, they recalled three or four of the four items in the cued row. Because there was no way to know which row would be cued, Sperling inferred that the entire display must have been encoded as a short-lived memory trace. This form of memory later came to be called iconic memory (Neisser 1967). The results suggested that about 90% of the display was available when the tone was presented. Results from the partial report condition support the notion that early visual processing is a high-capacity and spatially parallel processing system. Results from the whole report condition, in contrast, highlight the sharp capacity limitation of later systems.
More recently, Luck and Vogel (1997) used a paradigm that allowed them to estimate the capacity of visual short-term memory (VSTM). They briefly presented subjects with a sample array of 1–12 objects (for 100 ms in most experiments). The objects varied on one or more dimensions (e.g. orientation and colour). After an interval of 900 ms, a test array was presented for 2000 ms and subjects were required to indicate whether the arrays were the same or different. Subjects were very accurate with arrays of up to four objects, after which performance began to decline as more objects were added to the display. Adding a concurrent verbal memory load, cueing a specific object, and increasing or changing the feature dimensions of the objects had no effect on performance. Furthermore, increasing the duration of presentation of the sample array did not affect performance, suggesting that the observed capacity limitation was not due to perceptual limitations. The results suggest that VSTM can store up to about four objects. Interestingly, each object can be composed of multiple features, such that VSTM can hold many more than four features, as long as they are bundled into objects.
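The qualitative pattern in Luck and Vogel’s data follows directly from a simple ‘slot’ account. The Monte Carlo sketch below is our own illustration, not their analysis: it assumes a capacity of four object slots, a 50% change rate, and unbiased guessing whenever the changed item did not make it into a slot, all of which are hypothetical choices.

    import random

    def change_detection_trial(set_size, capacity=4):
        """One simulated same/different trial under a simple slot model."""
        stored = min(set_size, capacity)          # objects that reach VSTM
        if random.random() < 0.5:                 # change trial
            # The change is noticed only if the changed item holds a slot;
            # otherwise the observer guesses.
            if random.random() < stored / set_size:
                return True                       # correctly says 'different'
            return random.random() < 0.5          # lucky guess
        return True                               # no stored mismatch: says 'same'

    for n in (2, 4, 8, 12):
        acc = sum(change_detection_trial(n) for _ in range(20000)) / 20000
        print(f'set size {n:2d}: proportion correct {acc:.2f}')
    # Accuracy stays near ceiling up to four objects and declines beyond,
    # mirroring the pattern Luck and Vogel (1997) report.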
28.3.2 Capacity of transfer to storage
As we briefly reviewed in the previous section, it is clear that there are sharp capacity limitations in the amount of information that can be maintained in STM or in VSTM. Are these storage capacity limitations sufficient to explain the information processing bottleneck between early vision and later cognitive processing? There is good evidence that the answer to this question is no. We now review evidence showing that the processes that transfer information from perception to memory also exhibit capacity limitations.
Jolicœur and Dell’Acqua (1998) performed several experiments in which subjects were asked to perform two tasks. The first task required them either to encode a small number of visually presented characters or to ignore them. For example, in Experiment 7, one or three letters were presented on each trial. In some blocks of trials, the subject encoded these letters for recall at the end of the trial, a few seconds later. In other blocks of trials, the letters were presented but could be ignored. Following the letters, a tone was presented. The tone had either a low or a high pitch, and the task was to make a speeded pitch discrimination and to respond by quickly pressing one of two keys on a computer keyboard. The time between the onset of the letters and the onset of the tone, the stimulus onset asynchrony (SOA), was varied from trial to trial within each type of trial block.
When the letters could be ignored, shortening the SOA between the letters and the tone had minimal effects on the response time to the tone. In contrast, when the letters had to be encoded so they could be recalled at the end of the trial, response times to the tone increased as the SOA between the letter(s) and the tone decreased. Furthermore, this effect was larger when three letters had to be encoded than when one letter had to be encoded. However, the letter–tone SOA effect was still significant when only one letter was encoded. This pattern of results shows that one or more processes engaged by the visual-encoding task slow or delay a process required for the speeded tone task. Furthermore, the process or processes in the visual-encoding task causing this interference are under the control of the subject, because the very same stimulus presentation in ‘ignore’ trial blocks does not cause response slowing in the tone task.
As argued in a foregoing section, STM and VSTM have a capacity of at least four items. Jolicœur and Dell’Acqua (1998) found significant slowing effects in the concurrent tone task even when only one or two items were encoded in the visual encoding task. Thus, the effects observed by Jolicœur and Dell’Acqua are unlikely to have been caused by capacity limitations in the ability to store the encoded items.
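The logic of this interference can be made concrete with a toy model. In the sketch below, all timing parameters (a 400 ms baseline, 150 ms of consolidation per letter, and so on) are hypothetical values of our own choosing; the point is only the qualitative prediction, not the numbers.

    def tone_rt(soa, n_letters, encode=True, base_rt=400.0,
                pre_central=150.0, consolidation_per_item=150.0):
        """Predicted tone RT (ms) if consolidation occupies a central mechanism.

        If the letters are encoded, consolidation holds the central mechanism
        until pre_central + n_letters * consolidation_per_item after letter
        onset, and central processing of the tone must wait for its release.
        """
        if not encode:
            return base_rt
        release = pre_central + n_letters * consolidation_per_item
        waiting = max(0.0, release - (soa + pre_central))
        return base_rt + waiting

    for soa in (50, 250, 450, 650):
        print(f'SOA {soa:3d}: 1 letter {tone_rt(soa, 1):.0f} ms, '
              f'3 letters {tone_rt(soa, 3):.0f} ms, '
              f'ignore {tone_rt(soa, 1, encode=False):.0f} ms')
    # Tone RT rises at short SOAs, more steeply with three letters than with
    # one, and stays flat when the letters can be ignored -- the qualitative
    # pattern reported by Jolicoeur and Dell'Acqua (1998).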
Jolicœur and Dell’Acqua (1998) interpreted the results as evidence for a capacity limitation in the transfer of information from perception to STM, a process they called short-term consolidation. Slowing of the response to the tone in experiments such as those of Jolicœur and Dell’Acqua (1998) is still observed long after the presentation of the information to be encoded, suggesting that the process of short-term consolidation sometimes takes a long time. Jolicœur and Dell’Acqua used computer simulations to show that their results could be fit reasonably well by a model in which short-term consolidation times have an exponential distribution.
We note here, as an interesting aside, that the general ex-Gaussian form of reaction time distributions could be explained if we assume that a time-consuming process, such as short-term consolidation (or response code consolidation, see Fig. 28.1), is generally needed in tasks that require an overt response. We must also assume that the distribution of completion times of this process is well approximated by an exponential distribution. The ex-Gaussian distribution can be thought of as arising from the convolution of an exponential and a Gaussian distribution. The simulation results of Jolicœur and Dell’Acqua (1998) suggest that there is a slow central stage with exponential termination times, providing one key ingredient required to produce an overall ex-Gaussian response-time distribution. The combination of all the other processes required to complete such tasks could be represented by a single Gaussian distribution, which would provide the other ingredient needed to produce the observed ex-Gaussian response-time results in speeded tasks. Note that the individual non-central processes that combine to produce a global Gaussian distribution may well have processing times that are not distributed exponentially. All that is required is that their sum should approximate a Gaussian.
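This aside is easy to verify numerically. The following sketch, with arbitrary parameter values of our own choosing, sums a Gaussian component (standing for all non-central stages combined) and an exponential component (standing for the slow central stage); the sum is ex-Gaussian by construction.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    # All non-central stages combined: approximately Gaussian (ms).
    other_stages = rng.normal(loc=450.0, scale=50.0, size=n)
    # One slow central stage (e.g. consolidation): exponential completion times.
    central_stage = rng.exponential(scale=120.0, size=n)
    rt = other_stages + central_stage      # convolution -> ex-Gaussian

    print(f'mean {rt.mean():.0f} ms, median {np.median(rt):.0f} ms, '
          f'skewness {((rt - rt.mean())**3).mean() / rt.std()**3:.2f}')
    # The mean exceeds the median and the skewness is positive: the
    # exponential component produces the long right tail characteristic of
    # empirical response-time distributions.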
Using a paradigm similar to that used by Jolicœur and Dell’Acqua (1998), described above, Jolicœur and Dell’Acqua (1997) showed that dual-task slowing in the speeded tone task is also found when the information to be encoded consists of a random polygon seen for the first time. These random polygons had no pre-existing names in the subject’s memory, in contrast with the characters used in Jolicœur and Dell’Acqua (1998). Thus, the results obtained with the random polygons show that the observed dual-task slowing cannot be explained by appealing to processing related to the existence of names for the information to be encoded. The long time course of the observed dual-task interference is consistent with Potter’s earlier work on the encoding of pictures shown using rapid serial visual presentation (RSVP; Potter 1976, 1993). The name of an object was given to the subject either just before the onset of the RSVP sequence, or just after. When the name was given before, subjects could accurately determine whether the RSVP sequence contained a picture corresponding to the name. In contrast, if the name was given after, accuracy was much lower. The high accuracy for trials in which the name was given before the RSVP sequence suggests that the pictures could be identified with high probability, despite the rapid presentation rate. However, the low accuracy observed when the name was given after the RSVP sequence suggests that insufficient time was allowed for the perceptual representations of the pictures to be transferred to memory, consistent with the long time course of short-term consolidation revealed by the experiments of Jolicœur and Dell’Acqua (1997, 1998).
Another phenomenon, the attentional blink (AB; after Raymond, Shapiro, and Arnell 1992), can also be explained based on the notion that encoding information into VSTM is a slow and capacity-demanding operation. Many researchers have demonstrated the AB phenomenon by requiring subjects to report two targets embedded in an RSVP stream of distractors. A typical presentation rate is 10 items per second (100 ms per item). If we assume that the short-term consolidation of the first target takes a relatively long time and that it requires access to a limited-capacity process of short-term consolidation, then there would be a period of time during which the encoding of a second target would be impaired. This is precisely what is observed as the AB phenomenon. Chun and Potter (1995) proposed this model to account for the AB effect. The Chun and Potter model is meant to apply to within-modality paradigms in which the two critical targets are in the same sensory modality (see Potter, Chun, Banks, and Muckenhoupt 1998). The model proposed by Jolicœur and his colleagues (e.g. Arnell and Jolicœur 1999; Jolicœur 1998, 1999a,b,c; Jolicœur and Dell’Acqua 1997, 1998, 1999; Ross and Jolicœur 1999) differs from Chun and Potter’s in that the Chun and Potter model applies only to the encoding operations involved in the unspeeded version of the AB paradigm (short-term consolidation) and when the stimuli are in the same modality. In contrast, Jolicœur and his colleagues have argued for, and provided evidence supporting, the notion that short-term consolidation requires a mechanism with limited capacity that is also required in many other tasks (for example, the speeded tone task used by Jolicœur and Dell’Acqua 1998; see also Jolicœur and Dell’Acqua 1999). That is, the central interference model proposed by Jolicœur and colleagues postulates a central locus of interference, allowing for interactions across sensory modalities and across different tasks. Examples of more general dual-task interference effects predicted by the model of Jolicœur and his colleagues are provided in later sections of this chapter.
One aspect of the AB phenomenon that has sometimes been claimed to be inconsistent with the central interference theory of Jolicœur and his colleagues concerns what Visser, Bischof, and Di Lollo (2000) have termed lag-1 sparing. Lag-1 sparing is the frequently (but not always) observed finding that accuracy of report of the second target is better when the second target occurs immediately after the first target (that is, at lag 1) than at longer lags. All authors who have speculated about this interesting result have said what amounts to the same thing: somehow, the system can process more than one item at the same time, but only if the items occur in very rapid succession. Hence, the item at lag 1 can sometimes be processed concurrently with the first target, which produces superior performance at lag 1 than at longer lags. What has not been clear in these formulations, however, is why the subject cannot process two items concurrently when they are presented at longer lags. Why can the subject not select the item at lag 2 for concurrent processing with the item at lag 0?
In order to account for this aspect of the results, we make an additional assumption: we assume that short-term consolidation functions like a batch processor. A consolidation batch can consist of more than one item. Once under way, however, a batch cannot be interrupted, and it must run to completion before another batch can begin. We make an analogy to a chemical batch, or to baking a cake. At the outset, one can increase the yield of the batch by increasing all ingredients proportionally (e.g. doubling all the ingredients). Once the chemical reaction is under way, however, there is a very limited period of time during which additional ingredients can be added without spoiling the entire batch. For example, once a cake is half baked, it is too late to add more egg and flour.
There is a critical point beyond which one cannot add more raw ingredients without spoiling the entire batch. We hypothesize that short-term consolidation operates according to this principle. Once the short-term consolidation of one or more items is initiated, there is a period of time during which additional items cannot be consolidated. This notion explains both the AB phenomenon and lag-1 sparing. It is likely that there is an upper limit on the number of items that can be consolidated simultaneously, which may be set by the capacity of the store (e.g. four items for VSTM). The acceptance window for a consolidation batch is not known, but it is likely in the range of 50–150 ms for alphanumeric characters, based on results from typical AB experiments. The conceptualization of short-term consolidation as a batch processor also allows us to explain why observers can report several items in experiments such as Sperling’s (1960); many items would be available to the consolidation process despite a short acceptance window.
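A deterministic toy version of the batch-processor account shows how lag-1 sparing and the blink fall out of the same two parameters. The sketch below is our own illustration: the 150 ms acceptance window sits at the top of the range mentioned above, the 450 ms batch duration is a hypothetical value, and real AB curves are of course graded rather than all-or-none.

    def t2_consolidated(lag, rate_ms=100, window_ms=150, batch_ms=450):
        """Can T2 be consolidated, on the batch-processor account?

        T1 opens a consolidation batch at its own onset. An item arriving
        within the acceptance window joins that batch (lag-1 sparing). Once
        the batch is under way it cannot be interrupted, and a masked T2
        that arrives before it finishes is lost.
        """
        t2_onset = lag * rate_ms
        if t2_onset < window_ms:          # joins T1's batch
            return True
        return t2_onset >= batch_ms       # batch finished; a new one can start

    for lag in range(1, 8):
        status = 'reported' if t2_consolidated(lag) else 'missed'
        print(f'lag {lag}: T2 {status}')
    # Output: lag 1 is spared, lags 2-4 fall inside the blink, and report
    # recovers from lag 5 onward.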
28.4 Retrieval capacity limitation
In the foregoing section we argued that encoding information into STM requires a capacity-limited process of short-term consolidation. Now we briefly review two paradigms that have been used to study capacity limitations in the retrieval of information from memory.
28.4.1 Long-term memory
Carrier and Pashler (1995) focused on retrieval from long-term memory (LTM). Their experiments had two phases. In Phase 1, subjects encoded information into LTM. In Phase 2, their memory was tested in the context of a dual-task procedure sometimes called the psychological refractory period (PRP) paradigm. The PRP paradigm was used here as a tool to determine whether retrieval from LTM requires the same capacity as that needed to perform another simple speeded task.
In the PRP paradigm, two stimuli are presented sequentially, and the SOA between them is manipulated. A speeded response is required to each of the two stimuli, and usually two distinct tasks are performed, one based on the first stimulus and one based on the second. Typically, response times to the first stimulus are not strongly affected by SOA, whereas those to the second stimulus increase sharply as SOA is reduced. This slowing of Task 2 responses is the so-called PRP effect (see Pashler 1994a, for a review). Manipulations performed in Task 2 are of special interest. If the effects of such a manipulation are additive with SOA, then the stage of processing affected by the manipulation must be either in, or after, the stage that is responsible for the capacity limitation that produces the PRP effect (i.e. the bottleneck stage; for ease of exposition we speak of ‘a bottleneck’ or ‘the bottleneck’, but we do not mean to rule out the possibility of multiple bottlenecks). If the effects of the manipulation decrease as SOA is reduced, then the stage affected by the manipulation must precede the bottleneck stage. A decreasing effect as SOA is reduced is often referred to as underadditive with decreasing SOA (or with increasing task overlap). The analysis that supports this interpretation of results observed in PRP experiments can be found in several papers (e.g. Jolicœur, Dell’Acqua, and Crebolder 2000, 2001; McCann and Johnston 1992; Pashler 1994a; Pashler and Johnston 1989).
The logic of Carrier and Pashler’s (1995) experiments was to manipulate the difficulty of retrieval from LTM in the second task of a PRP experiment, and to examine whether this manipulation was additive or underadditive with decreasing SOA. In Phase 1 of Experiment 1, subjects first encoded a list of word pairs. Some pairs were presented once, while others were presented twice. In Phase 2, memory for the studied items was tested in the context of a PRP experiment. The first task was to make rapid discriminations between two tones based on their pitch. At varying SOAs subjects were presented with a word to which they had to produce the paired associate learned in Phase 1. Retrieval difficulty was manipulated by varying the number of times subjects had seen the word pairs in Phase 1 (once vs. twice). If retrieval from LTM can proceed in parallel with the capacity-demanding stages of Task 1 (and therefore the PRP bottleneck), an underadditive interaction with decreasing SOA should have been found. If retrieval from LTM cannot proceed in parallel with the PRP bottleneck, however, then retrieval difficulty should have effects that are constant across SOAs (additive effects). The results were clear-cut: the effect of retrieval difficulty was additive with SOA.
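The additive/underadditive logic can be made concrete with the standard single-bottleneck model (e.g. Pashler 1994a). The sketch below is our own worked example with hypothetical stage durations: a Task 2 manipulation that lengthens a pre-bottleneck stage is absorbed into slack at short SOA, whereas one that lengthens the bottleneck stage itself shows up in full at every SOA.

    def rt2(soa, pre, central, post, bottleneck_free_at=500.0):
        """Task 2 RT (ms) under a single central bottleneck.

        Task 2's central stage starts only when its own pre-bottleneck
        stage is done AND Task 1 has released the bottleneck (at
        bottleneck_free_at after Task 1 onset). All values are hypothetical.
        """
        central_start = max(soa + pre, bottleneck_free_at)
        return central_start + central + post - soa

    for soa in (50, 300, 900):
        pre_effect = (rt2(soa, pre=250, central=100, post=100)
                      - rt2(soa, pre=150, central=100, post=100))
        central_effect = (rt2(soa, pre=150, central=200, post=100)
                          - rt2(soa, pre=150, central=100, post=100))
        print(f'SOA {soa:3d}: +100 ms pre-bottleneck -> {pre_effect:.0f} ms; '
              f'+100 ms at bottleneck -> {central_effect:.0f} ms')
    # The pre-bottleneck effect shrinks to zero at short SOA (underadditive);
    # the bottleneck-stage effect is constant across SOA (additive), as
    # retrieval difficulty was in Carrier and Pashler (1995).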
Carrier and Pashler performed a second experiment in which a recognition task replaced cued recall. Retrieval difficulty was manipulated by presenting words once or five times at study. Studied items were interspersed with new items during the memory test. Once again, retrieval difficulty was additive with SOA. Based on this evidence, Carrier and Pashler argued that retrieval from LTM cannot proceed in parallel with a simple concurrent speeded task. In this view, retrieval from LTM is sharply capacity-limited and cannot proceed concurrently with certain other central operations (i.e. those required to perform a simple speeded discrimination task). In light of new evidence in Hommel (1998) and Logan and Schulkind (2000), which we discuss in a later section, we speculate that Carrier and Pashler’s conclusions may be correct for explicit retrieval from LTM, but not for implicit retrieval.
28.4.2 Short-term memory
Whereas Carrier and Pashler (1995) investigated whether retrieval from LTM is subject to the PRP bottleneck, Jolicœur (1999d) investigated whether retrieval from STM is subject to the same constraint. In Experiment 1, Sternberg’s (1966) STM scanning paradigm was embedded as Task 2 of a PRP experiment. At the beginning of each trial, a memory set of two or four consonants was presented. Once the set was encoded, the subject initiated the speeded portion of the trial by pressing the space bar. Shortly thereafter, a tone was presented and the subject made a speeded three-alternative discrimination response based on the frequency of the tone. At one of several possible SOAs from the tone, a probe letter was presented. The subject was to decide rapidly whether the probe was or was not in the memory set. As expected, responses to the probe character at long SOAs took longer when the memory set size was four than when it was two, which replicates Sternberg’s (1966) well-known finding. The most interesting result was that the memory-set-size effect was additive with SOA, suggesting that scanning STM cannot proceed in parallel with the central operations required to perform the concurrent tone task. Converging evidence for this conclusion was obtained in Experiment 2 by reversing the order of tasks in the paradigm: the Sternberg memory task was now Task 1 and the tone task was Task 2. At short SOAs, the effect of memory set size found in Task 1 was reflected in its entirety in response times in Task 2. This shows that the manipulation of memory set size affected a stage either in or before the bottleneck. Combining the results of the two experiments, Jolicœur (1999d) concluded that retrieval from STM is sharply capacity-limited and that it requires the PRP bottleneck.
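This carry-forward of the Task 1 effect also follows from the single-bottleneck sketch given earlier: at short SOA, Task 2’s central stage starts when Task 1 releases the bottleneck, so a manipulation that postpones the release (here, a larger memory set in Task 1) adds in full to Task 2 response times. In the rt2 function above (with our hypothetical parameters), raising bottleneck_free_at from 500 to 600 ms adds exactly 100 ms to rt2 at an SOA of 50 ms but nothing at an SOA of 900 ms.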
28.5 Output capacity limitations
Evidence for a relatively late capacity limitation in the information processing flow can be found with the PRP paradigm. Recall that, in this paradigm, two stimuli are presented sequentially and each stimulus requires a distinct speeded response. The SOA between the stimuli is varied and typically ranges from about 50 ms to 600–1200 ms, depending on the nature of the tasks. For example, the first stimulus could be a tone that varies in frequency, with an associated response that depends on the pitch of the tone; the second stimulus could be a letter with a response that depends on letter identity. The paradigmatic pattern of results is that response times to the first stimulus (e.g. the tone) are relatively constant across the set of SOAs, whereas response times to the second stimulus (e.g. the letter) increase as SOA decreases (see Pashler 1994a, for a review).
Several researchers have interpreted results from the PRP paradigm as evidence for a bottleneck in response selection. In this view, only one process of response selection can take place at any given time. When two such selections should ideally be made concurrently (i.e. at short SOAs), they are instead performed sequentially and the second one has to wait for the first one to finish. The period of waiting explains the observed PRP effect in the times to execute the second response (McCann and Johnston 1992; Pashler 1994a; Pashler and Johnston 1989; Van Selst and Jolicœur 1997).
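The same single-bottleneck model used in the sketches above can be stated algebraically (the notation is the standard one, e.g. Pashler 1994a, not specific to this chapter). Writing A, B, and C for the pre-bottleneck, bottleneck, and post-bottleneck stage durations of each task:

RT1 = A1 + B1 + C1
RT2 = max(A1 + B1, SOA + A2) + B2 + C2 − SOA

At short SOAs the first term of the max dominates, so RT2 = A1 + B1 + B2 + C2 − SOA and the PRP effect declines with slope −1 as SOA increases; once SOA exceeds A1 + B1 − A2, the waiting period vanishes and RT2 settles at its baseline of A2 + B2 + C2.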
28.5.1 Structural versus strategic bottleneck
Although several researchers have interpreted results from the PRP paradigm as evidence for a structural capacity limitation in the information processing stream, this view has not gone unchallenged. Meyer and Kieras (1997, 1999) propose instead that there are essentially no central capacity limitations, and that dual-task interference (e.g. the PRP effect) does not occur when five conditions are met. In their 1999 article (p. 54), they state these conditions as follows: ‘. . . virtually perfect time-sharing occurs when five prerequisite conditions prevail in combination: (1) participants are encouraged to give the tasks equal priority; (2) participants are expected to perform each task quickly; (3) there are no constraints on temporal relations or serial order among responses; (4) different tasks use different perceptual and motor processors; and, (5) participants receive enough practice to compile complete production-rule sets for performing each task.’ If condition 4 is violated, bottlenecks in perception or in response initiation or production could create dual-task slowing. If any of the first three conditions is violated, a strategic bottleneck could be imposed in order to comply with the instructions.
Tombu and Jolicœur (2000) performed three experiments to test these assertions. In Experiment 1, two stimuli were presented on every trial. One was a tone (363 Hz or 660 Hz) and the other was a letter (H or S). A speeded pitch discrimination was made to the tone and a speeded identity discrimination was made to the letter. Subjects responded to the tone by pressing pedals with their feet, and to the letter by pressing keys with their fingers. According to De Jong (1993), hand and foot responses require different motor processors, reducing the likelihood of dual-task interference at the level of motor processors. On half the trials the tone was presented first, and on the other half the letter was presented first. The second stimulus followed the first at varying SOAs (50, 150, 350, 750, or 1550 ms). Subjects were instructed to give the two tasks equal priority and to respond quickly and accurately to both stimuli.
The experiment was divided into two sessions. In the constrained-order session, subjects were required to respond to the stimuli in the order in which they occurred, thereby imposing a constraint on response order. In the unconstrained-order session, subjects were free to respond to the stimuli in whatever order they wished. This aspect of the instructions was stressed to ensure that subjects understood that response order was not constrained and that they felt free to respond in any order. The order of sessions (unconstrained vs. constrained response order) was counterbalanced across subjects. At the beginning of each session two practice blocks were performed to familiarize subjects with the tasks. In the unconstrained response-order session, all five conditions listed by Meyer and Kieras were satisfied. In contrast, in the constrained response-order session, the third condition was violated. Therefore, if Meyer and Kieras are correct, a PRP effect should be found when response order is constrained but not when it is unconstrained.
The results were clear-cut: substantial PRP effects were observed both when response order was constrained and when it was unconstrained, a pattern of results inconsistent with Meyer and Kieras’s (1997, 1999) claims concerning the PRP effect.
To provide converging evidence for the results of Experiment 1, Tombu and Jolicœur used different response modalities, to reduce further the likelihood of interference at the level of motor processors. Experiment 2 was identical to Experiment 1 except that tone responses were made by pressing keys with the index and middle fingers of the dominant hand, and letter responses were made orally and measured with a voice key. Once again, constraining response order violated one of Meyer and Kieras’s conditions for perfect time-sharing, whereas in the unconstrained response-order condition all five conditions were met. Once again, a substantial PRP effect was observed in both conditions. These results corroborate those of Experiment 1 in showing that the PRP effect is not a consequence of strategically controlling the order of responding in order to comply with the constraints imposed by experimental instructions.
Experiment 3 was carried out to discover whether increasing the amount of prior practice with the tasks would change the results. Subjects performed five sessions of practice trials (approximately 5000 trials). The goal of this phase of the experiment was to allay any doubt that subjects had compiled complete production-rule sets for performing each task. In the practice trials, subjects were presented with three types of trials, which occurred with equal frequency. In one type of trial, only a tone was presented. In another, only a letter was presented. In the remaining trial type, a letter and a tone were presented simultaneously. Subjects were instructed to respond to the stimuli that were presented, as quickly and accurately as possible. When two stimuli were presented, emphasis was placed on the fact that responses could be made in any order. The practice phase was followed by two sessions that were identical to those in Experiment 2. Again, as in Experiments 1 and 2, substantial PRP effects were observed both when response order was constrained and when it was unconstrained.
In all three experiments there was clear evidence that subjects were behaving differently in the constrained and the unconstrained conditions. In particular, while there were many out-of-order responses (responding to the second stimulus before the first) in both conditions, there were fewer of them in the constrained response-order condition. Furthermore, response times in Task 1 at short SOAs in the constrained condition were elevated relative to those in the unconstrained condition. Tombu and Jolicœur interpreted this difference as evidence that subjects were paying more attention to stimulus order in the constrained condition (as they should, in order to emit the responses in the same order), and that the perception of stimulus order became more difficult as SOA was reduced, leading to a rise in response times. The absence of a rise in Task 1 response times with decreasing SOA in the unconstrained condition suggested that subjects were not trying to perceive stimulus order, which in turn suggested that they were not constraining their response order to match stimulus order. In other words, there was good evidence that subjects were following the instructions that differentiated the constrained from the unconstrained condition.
In summary, substantial PRP effects were found even when the two tasks did not share stimulus input modalities or response output modalities, when the subjects had practiced the component tasks for thousands of trials, and when there were no instructional constraints on response order (Tombu and Jolicœur 2000). We conclude that the PRP effect is caused by capacity limitations in the information processing system rather than by strategically controlled optional bottlenecks (Meyer and Kieras 1997) imposed to ensure that responses are produced in a particular order.
28.6 Action–perception interactions
Several researchers have investigated whether perception requires central capacity-limited mechanisms. They have done so using a dual-task paradigm in which a first task, which requires a speeded response, is designed to engage central mechanisms. While central mechanisms are busy processing the first stimulus, a second stimulus is presented in the context of a perception task. Whereas the first task requires a speeded response, the second does not: subjects are free to respond to the second stimulus at their leisure, at the end of each trial. The dependent measure of greatest interest is the accuracy of perceptual reports for the second stimulus.
Blake and Fox (1969) used this approach to determine whether the perception of a letter requires central attention. They asked subjects to make a speeded detection response (simple response time) to an auditory signal. A single letter was presented briefly at varying SOAs following the auditory signal. The exposure duration of the letter was adjusted for each subject such that the letter could be reported correctly about 66% of the time. The key finding was that accuracy of letter report was completely unaffected by the SOA between the tone and the letter, suggesting that visual form recognition (visual encoding) does not require central mechanisms. Blake and Fox had not masked the second stimulus, however. It is possible, therefore, that visible or informational persistence (Coltheart 1980) could have allowed a representation of the letter to outlast the period of time during which potentially capacity-limited perceptual mechanisms might have been tied up by processing required for Task 1.
Pashler (1989) performed experiments that had the same logical structure as those of Blake and Fox (1969), but with three important changes. He used a pattern mask immediately following the second stimulus, more complex visual encoding tasks (e.g. report the highest digit from a display containing several digits), and a choice procedure in Task 1, rather than simple response time. Small and generally non-significant effects of SOA were found by Pashler, leading him to conclude that perceptual encoding is essentially capacity free.
De Jong and Sweet (1994) challenged Pashler’s (1989) position, noting that Task 1 response times in Pashler’s experiments tended to be somewhat long, and that the experiments used a relatively short range of SOAs. De Jong and Sweet suggested that Pashler’s subjects may have tended to concentrate their efforts on the difficult perceptual task, at the expense of the speeded task. De Jong and Sweet demonstrated that by increasing advance preparation for the speeded task, response times in that task were shorter, and an SOA effect emerged in the unspeeded perceptual encoding task. Accuracy of perceptual reports was lower at short SOAs than at long SOAs, as expected if the perceptual encoding task required central mechanisms that were unavailable because they were engaged by the speeded first task.
Jolicœur (1999b) provided converging evidence supporting the view that perceptual encoding is impeded by the performance of a concurrent speeded task. He employed a paradigm inspired by the one used by Blake and Fox (1969), but added several new elements, as outlined below. In the visual encoding task, one or two letters were presented, and the task was simply to report them (unspeeded).
There were four experiments that formed a two-by-two design in which the letters were either masked or unmasked, and in which the difficulty of the first (speeded) task was manipulated by using either two tones and two responses or four tones and four responses. Increasing the number of S–R alternatives in Task 1 should increase the duration of central processing, which should, if the encoding task also requires central processing, increase the size of SOA effects in the experiment. Given the likely importance of cutting short persistence of the activation produced by the presentation of
the letters, it was expected that strong effects of SOA would be found only when the letters were masked. This is exactly what happened: accuracy decreased as SOA was reduced, but only when the letters were masked (for converging evidence on the importance of masking, see Giesbrecht and Di Lollo 1998). Furthermore, this SOA effect was larger when the number of S–R alternatives in Task 1 was larger, consistent with the view that the visual encoding task required central processing and that it suffered greater interference from a longer period of central processing in Task 1. Jolicœur (1999b) also showed that encoding a new random polygon is subject to a form of interference similar to that exhibited by letter encoding, suggesting that the interference effect is not produced because stimuli have names in memory. This conclusion was extended by Dell'Acqua and Jolicœur (2000), who showed that encoding a random checkerboard pattern (a random half of the squares coloured one way and the remainder coloured another way; a type of display used by Phillips 1974) is also subject to the same general phenomenon: accuracy decreases with decreasing SOA, and this effect is modulated by the duration of central processing in Task 1. Although some previous results suggested that visual encoding does not require central resources (e.g. Blake and Fox 1969; Pashler 1989, 1993), we believe that more recent evidence makes a very strong case for central involvement in visual-encoding tasks (see also Jolicœur 1999a, 1999c; Jolicœur and Dell'Acqua 1999). The role of response selection difficulty in these experiments (Dell'Acqua and Jolicœur 2000; Jolicœur 1998, 1999b,c) highlights the interaction between mechanisms that mediate perceptual reports and those that mediate action. We note that S–R numerosity manipulations in the absence of immediate action do not modulate accuracy of perceptual reports (Jolicœur 1999c; Ward, Duncan, and Shapiro 1996). According to Jolicœur (1999c), immediate action ensures that response selection takes place concurrently with the perceptual encoding task; delayed action allows response selection to be delayed as well, which removes the interference associated with the integration of perception and action in these paradigms.
28.7 Capacity requirements of perceptual object code activation

In the foregoing section, we reviewed evidence showing that the accuracy of perceptual reports suffers when concurrent central processing is required to perform another task. We now review evidence that the activation of abstract perceptual codes, per se, does not require limited-capacity central mechanisms. Stolz, Jolicœur, and Li (2000) investigated whether semantic priming would be affected by SOA in experiments in which there is a speeded first task and an unspeeded perceptual encoding second task. The first task was a speeded two-alternative tone pitch discrimination task. At either a short (150 ms) or a long (700 ms) SOA following the tone, a prime word and a target word were presented in rapid succession. The prime was displayed for 50 ms, followed by a 50 ms blank, after which the target word was presented. On half of the trials, the prime and target were semantically related (the prime was a category name, and the target was a dominant exemplar of that category; e.g. fruit, pear). On the other half of the trials, the prime and target were not related (e.g. furniture, pear). The duration of the target word was adjusted, using a staircase procedure, to produce an accuracy of about 80% in the second task. The target word was followed by a pattern mask. Following the mask, two letters appeared below the location of the word displays. The task was to choose which of these two letters corresponded with the first letter of the target word. The foil letter was the first letter of a member with higher category dominance in the category of the target. Consequently, a strategy based on using the prime to predict what had been shown would tend to attenuate the priming effect.
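The chapter does not specify which adaptive rule Stolz et al. used; as an illustration, the following Python sketch implements one common variant, a weighted up/down staircase, which converges on approximately 80% correct. The simulated observer, starting value, and step sizes are all invented.

import random

# A minimal sketch of a weighted up/down staircase targeting ~80% accuracy.
# The rule converges where p(correct) = UP / (UP + DOWN).

def simulated_observer(duration_ms):
    """Toy psychometric function: longer exposures are reported better."""
    return random.random() < min(0.98, duration_ms / 100.0)

duration = 60.0          # starting target-word exposure duration (ms)
DOWN, UP = 2.0, 8.0      # converges where p = 8 / (8 + 2) = 0.80

for _ in range(400):
    if simulated_observer(duration):
        duration = max(10.0, duration - DOWN)   # correct -> make it harder
    else:
        duration += UP                          # error -> make it easier

print(f"exposure duration settles near {duration:.0f} ms")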
Accuracy was lower at short SOA (0.76) than at long SOA (0.84; p < 0.0001), showing once again that a perceptual encoding task (in this case seeing the first letter of the target word) is not immune from the additional capacity demands of the concurrent speeded tone task. Accuracy was higher for targets that followed a related prime (0.82) than for those that followed an unrelated prime (0.79), a result that was also highly statistically significant (p < 0.0009). The most interesting result, however, was that the magnitude of the priming effect was not affected by SOA. The priming effect was 3.13% at short SOA and 2.87% at long SOA (F < 1). Thus, although overall accuracy was clearly lower at short SOA than at long SOA, showing that some aspect of the word task was affected by task overlap, the priming effect in the same task was completely unaffected by SOA. These results suggest that the capacity limitation that affects accuracy of the word task occurs after the stage at which words activate sufficient meaning to support priming. In the context of the AB phenomenon, Vogel, Luck, and Shapiro (1998) provided electrophysiological evidence for the activation of the meaning of items that could not be reported because of the attentional blink. Vogel et al. used the AB paradigm in combination with event-related potential (ERP) recordings to investigate the nature of the impairment underlying the AB effect. ERP recordings typically generate waveforms that have characteristic components that have been claimed to reflect certain mental processes. Of specific interest in the current study were the N1, P1, N400, and P3 components of the ERP. The N1 and P1 components are thought to reflect sensory processes affected by physical characteristics of the stimulus. The N400 wave is generally elicited as a result of a semantic mismatch. For example, if the following sentence is read: 'I got in my car and turned on the pickle,' the word pickle would elicit a large N400 component. The presence of an N400 wave is strong evidence that a stimulus has been processed at the semantic level sufficiently to determine whether it fits into the current semantic context. Finally, the P3 wave is thought to reflect operations related to the updating of STM (Donchin 1981). In a series of experiments, Vogel et al. (1998) measured the ERP response to the second target in the AB paradigm. In all experiments they found the usual behavioural pattern of results: the second target was poorly reported at lag 3 relative to lag 7. Interestingly, they found that the N1, P1, and N400 ERP components did not vary with lag. The unchanged N1 and P1 components suggest that the early sensory response to the target was unaffected by lag, and the unchanged N400 component suggests that semantic activation was also unaffected by the AB procedure. In contrast, the P3 component was completely suppressed at lag 3, suggesting that the semantic activation generating the N400 component was unable to update STM. Interestingly, Maki, Frigen, and Paulson (1997) found priming from words that could not be reported because they had been presented during the attentional blink period. We believe that the results of Maki et al. (1997) and those of Stolz et al. (2000) likely reflect the same underlying mechanisms, and both results are consistent with the intact N400 response found by Vogel et al. (1998).
28.8 Parallel activation of response codes

28.8.1 The flanker effect

The selection of information from the visual environment has often been studied using the flanker paradigm. In their seminal work using this paradigm, Eriksen and Eriksen (1974) presented subjects with targets (i.e. H and K, or S and C) that had been assigned different response mappings
(e.g. left lever action to H and K targets, right lever action to S and C targets). Targets always appeared at the same location, but they were flanked on either side by distractor items that subjects had been instructed to ignore. The critical manipulation within this paradigm involved varying the relationship between the target stimulus and its flankers. On a compatible trial, the target and flankers mapped onto the same response (e.g. HHKHH); in contrast, on an incompatible trial, the target and flankers mapped onto different responses (e.g. SSKSS). Response times to the target were slower on incompatible trials than on compatible trials; this pattern of results is typically referred to as the flanker compatibility effect or, in short, the flanker effect. Interestingly, when neutral stimuli are included as possible flankers, incompatible flankers produce significant inhibition whereas compatible flankers produce little or no facilitation (Eriksen 1995). The flanker compatibility effect is interpreted by Eriksen and his colleagues as evidence for the concurrent activation of response codes, leading to response competition when the target and flankers are incompatible (Eriksen 1995). A particularly interesting experiment is summarized in Eriksen (1995). Names of animals were presented at screen center and subjects were instructed to classify the animals as large or small in size. On some trials, distractor items appeared in to-be-ignored locations above or below the target. These distractors could be irrelevant to the task (neutral), relevant to the classification response (the words 'large' or 'small'), or other animal names. Slowing of responses was observed on trials in which the distractor was incompatible with the response to the target (e.g. 'mouse' target with 'large' distractor), as compared to neutral trials. Additionally, when the distractor was an animal name whose size was incompatible with that of the target (e.g. 'mouse' target with 'lion' distractor), responses to the target were again slowed. Although interference between the target and the distractor occurred when the target and the distractor overlapped in either category or response name, this interference was not observed across categories. For example, presentation of a distractor that was incompatible with the target in terms of size, but belonged to a different category (e.g. 'mouse' target with 'mountain' distractor), did not result in a slowed response to the target.
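The response-competition interpretation can be illustrated with a toy accumulator model, sketched below in Python. The weights, threshold, and the assumption that flankers contribute a fixed fraction of the target's activation are invented for illustration; the point is only that parallel activation of response codes slows responding on incompatible trials.

# An illustrative response-competition sketch of the flanker effect. The
# target and flankers drive their assigned responses in parallel; a response
# is emitted when one accumulator leads the other by a threshold.

def flanker_rt(display, mapping, target_pos=2, flanker_weight=0.1,
               threshold=30.0, dt=1.0):
    evidence = {"left": 0.0, "right": 0.0}
    t = 0.0
    while abs(evidence["left"] - evidence["right"]) < threshold:
        for pos, letter in enumerate(display):
            weight = 1.0 if pos == target_pos else flanker_weight
            evidence[mapping[letter]] += weight * dt   # parallel activation
        t += dt
    winner = max(evidence, key=evidence.get)
    return t, winner

mapping = {"H": "left", "K": "left", "S": "right", "C": "right"}
for display in ("HHKHH", "SSKSS"):                     # compatible vs incompatible
    t, resp = flanker_rt(display, mapping)
    print(f"{display}: response {resp!r} after {t:.0f} time steps")

Run on the displays above, the compatible display reaches threshold sooner than the incompatible one, even though the correct response wins in both cases.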
28.8.2 The crosstalk effect

Results from the flanker paradigm are consistent with the view that multiple stimuli can activate their abstract perceptual representations and associated response codes in parallel. Interestingly, researchers studying the PRP phenomenon appear to claim the opposite. That is, they propose that response selection imposes a bottleneck in dual-task performance such that the selection of a response to the second of two stimuli cannot proceed while a response is selected for the first (e.g. McCann and Johnston 1992; Pashler 1984; Pashler 1994a; Pashler and Johnston 1989). Exactly which components of response selection cause the bottleneck remains in question. Fagot and Pashler (1992) proposed that the bottleneck results from a capacity limitation in which only one of the stimulus–response (S–R) mappings available in working memory can be executed at any one time. McCann and Johnston (1992) suggested three further possibilities. Their first hypothesis was that only the S–R mappings for a single task can be held in working memory at one time. Though similar to Fagot and Pashler's suggestion, McCann and Johnston's first hypothesis does not postulate an absolute limit on the number of rules that can be executed at one time, only on the type of rules; namely, all S–R mappings consistent with a single task can be executed concurrently. McCann and Johnston's second suggestion is that sets of rules for translating stimulus attributes into a response can be held in working memory for different tasks; however, only the S–R translation
rules for a single task can be fired at one time. The uniting theme in the preceding hypotheses is that there is a capacity limitation somewhere in the process of translating stimulus information into a response. McCann and Johnston's third hypothesis, though, places the bottleneck at the level of response initiation. Specifically, although S–R translations can be carried out in parallel, perhaps only a single motor program can be retrieved at one time. Hommel (1998) challenged the notion that S–R translation is responsible for the bottleneck in dual-task interference. The underlying logic of Hommel's approach was that if the response that is eventually selected for the second of two stimuli influences the response made to the first, information about the second response must have been available at the time the first was selected. Such a result is inconsistent with the view that only one stimulus can be translated into a response at any given time. Hommel (1998) tested the hypothesis that S–R translation can proceed in parallel for more than one task in a series of five experiments. In four of the five experiments, two tasks were performed. The first task was to make a key press in response to the colour of a red or green stimulus presented at screen center (e.g. if red, press left; if green, press right). The second task was to make a vocal response to the identity of a stimulus (e.g. if H, say 'left'; if R, say 'right'). In Experiments 1, 2, 4, and 5, a single stimulus was presented (e.g. a red H), requiring two responses; in Experiment 3, two stimuli were presented at varying SOAs, requiring a single response to each. Of interest in the experiments with two tasks per trial was whether responses in Task 1 (R1) would be affected by the response that would eventually be made in Task 2 (R2). R2–R1 compatible trials (e.g. pressing a left key followed by saying 'left') produced faster responses than incompatible trials (e.g. pressing a right key followed by saying 'left'). One of the determinants of the magnitude of the observed compatibility effects was the speed of the first response. According to Hommel, one would expect R2 to have a greater impact on R1 as the time taken to make R1 increases, because S2–R2 translation requires some amount of time to complete. Consistent with this expectation, quintile analyses of R1 response times revealed that the compatibility effect was small or absent in the fastest quintile. The effect became larger with increasing response time, however, reflecting the increased likelihood that R2 information was available at the time R1 was computed. In Experiment 3, in which two stimuli were presented at an SOA of 50, 150, or 650 ms, the effect was observed only at the shortest SOA. Again, this is consistent with the notion that with decreasing SOA, S2–R2 translation is more likely to have been carried out while R1 was being selected. It is unlikely that the compatibility effect is the result of subjects delaying R1 until selection of R2 was complete (a strategy known as grouping); in Experiment 4, a small but reliable compatibility effect was observed even when subjects were encouraged to withhold R2 by making Task 2 unspeeded. Additional evidence against the grouping explanation is provided by the results of Experiment 5. In this variation, only a single response was required to a single stimulus; critically, either the manual or the vocal response was cued 500 ms in advance of the stimulus.
Interestingly, responses were faster if the required response was compatible with the response that was not required; hence, the compatibility effect cannot be attributed to the use of a grouping strategy. Stimulus–response translation per se, then, is an unlikely cause of the PRP bottleneck. In Hommel's view, any particular response is potentially associated with several response features, and the PRP effect arises because of a binding problem created when the features for several response codes are activated simultaneously. When the response feature codes for more than one response are activated concurrently, the system has an integration or binding problem in the sense that it does not know which feature belongs to which response (or to which stimulus, which indicates the correct order).
This ambiguity must be resolved by somehow selectively enhancing the features of one response and/or suppressing the features of the other. Hommel proposes that resolving this binding problem takes time and that this is the cause of the PRP effect. One needs to make a distinction between the activation of response codes and the ultimate selection of the response to be executed. Hommel's results suggest that the former can take place for multiple objects in parallel. Selecting among the activated response codes, however, could require serial processing, which would produce the PRP effect. While S–R translation may take place in parallel for multiple objects, it is important to point out that crosstalk effects are consistent with a notion of competition among response codes. Such competition could be interpreted as a form of capacity limitation. Integrating this view with the evidence for parallel processing described in the foregoing paragraphs leads to the suggestion that S–R translation may be described as a limited-capacity parallel mechanism. We argue in subsequent sections, however, that the capacity limitation implied by crosstalk, per se, is not the main cause of the PRP effect, and that there is another form of capacity limitation at play in that case.
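The quintile analysis cited above can be sketched in a few lines. The synthetic RT distributions below are invented (the multiplicative slowing on incompatible trials is simply one way to make the effect grow with R1 latency), but the analysis mirrors the logic of binning R1 response times and computing the compatibility effect within each bin.

import numpy as np

# A sketch of a quintile (distributional) analysis on synthetic R1 RTs.
# Multiplicative slowing makes the compatibility effect larger in the
# slower quintiles, as in the pattern described above.

rng = np.random.default_rng(1)
compatible = rng.gamma(shape=20.0, scale=20.0, size=2000)       # R1 RTs (ms)
incompatible = rng.gamma(shape=20.0, scale=20.0, size=2000) * 1.15

def quintile_means(rts):
    edges = np.quantile(rts, [0.2, 0.4, 0.6, 0.8])
    bins = np.digitize(rts, edges)                 # 0..4 = quintile index
    return np.array([rts[bins == q].mean() for q in range(5)])

effects = quintile_means(incompatible) - quintile_means(compatible)
for q, e in enumerate(effects, start=1):
    print(f"quintile {q}: compatibility effect ~ {e:5.1f} ms")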
28.8.3 Task set and the crosstalk effect

Further evidence for parallel activation of response codes, under some conditions, can be found in the work of Logan and Schulkind (2000). Their experiments can be described as a flanker task with an SOA manipulation between the target and the flanker. As in the flanker paradigm, the response associated with the flanker influenced response time to the target. Unlike in the typical flanker paradigm, however, the flanker also required an overt response, which likely would serve to enhance the usual flanker effect. For example, in one experiment subjects were presented with two vertically arranged characters on every trial and were required to discriminate between letters and digits. The response to the top character was made with the right hand; the response to the bottom character was made with the left hand. The two characters were sometimes presented simultaneously and sometimes with an SOA of 100, 300, or 900 ms. The most interesting result was that the category of the second character influenced the time to make the response to the first: when the two stimuli belonged to the same category, response times were faster than when they belonged to different categories. Not surprisingly, this category-match effect, or flanker effect, was largest when the SOA was 0, and decreased monotonically as SOA was increased. Logan and Schulkind argued that the results are consistent with the view that the identification of the second stimulus and the activation of its associated response codes can proceed in parallel with the homologous operations applied to the first stimulus. Perhaps the most interesting result described by Logan and Schulkind was that the category-match effect (or delayed-flanker effect) was observed when the same task was to be performed on both characters, but was abolished when different tasks were to be performed. In Experiment 2, subjects participated in four sessions in which two digits were presented one above the other. They made either a parity judgment (odd/even) or a magnitude judgment (greater than 5/less than 5). In two of the four sessions the task performed on the first and the second stimulus was the same (either magnitude judgments or parity judgments); in the remaining two sessions the task performed on the first stimulus was different from the task performed on the second stimulus (one required a parity judgment and the other a magnitude judgment). The SOA between the two digits was manipulated as in the previous experiment. A category-match effect was found only when the tasks were the same. When different tasks were performed, the category-match effect was abolished. Logan and
Schulkind argued that parallel memory retrieval occurs only when the same task set can be applied to both stimuli; otherwise memory retrieval takes place serially. These results remind us of the category effects in the flanker task described above. When the category of the flanker could be used to filter out the flanker as irrelevant to the target task, there was no influence of the flanker. It seems unlikely to us that identity codes for the flanker had not been activated. Despite the activation of identity information, however, the activation of associated response codes was curtailed. In Logan and Schulkind's experiment, again it seems unlikely that identity information for the second digit would not be activated (or retrieved). However, the response code associated with that digit was not activated when the tasks were different. Logan and Schulkind argue that their results demonstrate parallel retrieval from LTM for two objects, when the same task set is used to process these two objects. Their results may appear inconsistent with those of Carrier and Pashler (1995), which we described in a foregoing section of this chapter. One possible resolution of this apparent discrepancy is to suppose that each study reveals properties of different aspects of retrieval from LTM: implicit versus explicit retrieval. One aspect, implicit retrieval, would mediate the crosstalk effects found by Hommel (1998) and Logan and Schulkind (2000), and would reflect activation of representations in LTM and activation of response codes. In order to make an overt response based on the retrieval of an item from LTM, however, explicit retrieval would be required, and parallel explicit retrieval may not be possible, explaining the results of Carrier and Pashler (1995). Clearly, more work will be needed to resolve this issue. In summary, the results of Logan and Schulkind are nicely consistent with previous work of Hommel (1998) and previous work on the flanker effect, and provide important new evidence concerning the role of task set in the activation of associated response codes.
28.8.4 The Simon effect

Simon and Rudell (1967) first observed that responses are made more quickly when the locations of stimulus and response correspond than when they do not, despite the fact that the location of the stimulus is not relevant to the task. For example, suppose that a left response is to be made to the letter H and a right response to the letter S. An H displayed to the left of fixation yields faster responses than the same stimulus displayed to the right of fixation. Wallace (1971) showed that the Simon effect is observed even if the right hand presses the left key and vice versa, a result that was corroborated by Riggio, Gawryszewski, and Umiltà (1986). Further evidence against an explanation of the effect in terms of the organization of the nervous system was provided by Umiltà and Nicoletti (1985), who demonstrated that corresponding responses were faster even when all stimuli and responses were localized to the same side of the body. Wallace (1971) argued that stimuli are always translated into responses and that the presence of irrelevant spatial information in a stimulus facilitates translation of the appropriate response. Hommel (1995a) provided evidence that the Simon effect does not result from facilitation of response translation by irrelevant spatial stimulus attributes. If this were the case, he reasoned, it should be possible to eliminate the Simon effect by having subjects complete S–R translation well in advance of actually making a compatible or incompatible response. However, when subjects were cued in advance with the required response and instructed to execute their prepared response only when a green 'go' signal appeared, responses were nevertheless faster when the go signal appeared on the side corresponding to the prepared response. Hommel (1996) showed further that, contrary to
prior belief, the Simon effect could be found in a simple reaction time task. To demonstrate this, he simply eliminated the 'no go' signal from the previously described paradigm, such that a 'go' signal appeared on the left or right side of the display on every trial. Once again, responses were faster to spatially compatible stimuli than to incompatible stimuli. In a series of studies, Hommel (1993b, 1994a,b, 1995b) showed that the Simon effect is transient and decays over time; as the time to begin processing the relevant stimulus information is delayed (by, for example, increasing the demands of early perceptual processing), the Simon effect diminishes. Hommel (1993a) demonstrated the importance of task set in observing the Simon effect. A tone that had a low or high pitch was presented on each trial, and subjects had to respond based on pitch. The tone was presented to the left or the right, via loudspeakers, and the side of the tone was not relevant to the main task. For one group of subjects, the task was to press a left (or right) key in response to the pitch of the tone. Pressing the key had the (incidental) effect of generating a flash on the side opposite to the side of the pressed key. For a second group of subjects, the task was to generate a flash of light on one side in response to the pitch of the tone. A flash was generated by (incidentally) pressing a key on the side opposite to the intended flash. The elegance of this paradigm resides in the fact that both groups of subjects were actually being asked to do exactly the same thing; only their orientation to the task differed. The finding of key interest was that whether or not a Simon effect was observed depended on the task set adopted by the subjects. The group required to press a key in response to the tone was faster when the tone was presented on the side corresponding to the required key press. Subjects instructed to generate a flash, in contrast, were faster to respond when the tone was presented on the side corresponding to the required flash. This elegant demonstration underscores the importance of task set in observing the Simon effect.
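The transience of the Simon effect described above lends itself to a very simple formalization: if activation of the irrelevant spatial code decays exponentially, the effect shrinks the later response selection begins. The amplitude and time constant below are invented; this is a caricature of the decay claim, not a model fitted to Hommel's data.

import math

# A toy sketch of a decaying irrelevant spatial code producing a
# transient Simon effect. Parameters are invented for illustration.

def simon_effect(selection_onset_ms, amplitude=80.0, tau=150.0):
    """Residual spatial-code activation when response selection starts."""
    return amplitude * math.exp(-selection_onset_ms / tau)

for onset in (50, 150, 300, 500):
    print(f"selection begins at {onset:3d} ms -> Simon effect ~ "
          f"{simon_effect(onset):4.1f} ms")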
28.8.5 The semantic priming effect

The semantic priming effect refers to the finding that processing of targets preceded by a semantically (or associatively) related context word is facilitated relative to processing of targets preceded by an unrelated context word (e.g. Neely 1991). Such effects have been found, for example, when a lexical decision task (deciding whether a letter string is a word or a nonword) is performed on the target. Although semantic priming was once argued to occur as a result of the automatic processing of the context word, this claim has been disputed based on the observation that the effect depends on task set. When subjects are instructed to perform a letter search on the context word (e.g. indicate if the letter j is in the word mouse), the benefit of a related context is eliminated (Smith, Theodor, and Franklin 1983). Thus, processing of the context word leading to priming does not always occur and depends on the task set used to process that word. Another interesting demonstration of the dependence of priming on task set was provided by Klinger, Burton, and Pitts (2000). They presented pairs of words (a heavily masked prime followed by an unmasked target) and required a speeded overt response to the second word. The words varied concurrently on two dimensions of meaning: affect (positive vs. negative) and animacy (living vs. nonliving). A pair like RAT–BUNNY, for example, was incongruent on the affective dimension and congruent on the animacy dimension. In one of their experiments (Experiment 4), one half of the subjects performed an animacy decision task on word targets; the other half performed an affect decision task. The results indicated that priming effects (i.e. the difference in performance between
incongruent and congruent pairs in a given dimension of meaning) depended on the type of task. Priming effects on target responses were evident only when prime–target pairs were congruent along the dimension subjects had to judge in a particular condition. Dell'Acqua and Grainger (1998) found no semantic priming from a heavily masked (subliminal) prime when the target task was word naming. In contrast, they found significant priming for picture naming and for word categorization. They argued that word naming, in some conditions, can rely on a task set optimized for shallow processing (i.e. grapheme-to-phoneme translation), leading to processing that is not affected by activation at the semantic level.
28.8.6 The Stroop effect

This dependence of semantic priming on task set was instrumental in leading to the observation that the Stroop (1935) effect also depends on task set. In the Stroop paradigm, subjects are presented with colour names displayed in different colours, and the experimental task is to name the colour in which the stimulus is displayed. Responses are slower and less accurate when naming the display colour of items that are incongruent (e.g. the word 'red' displayed in blue) than of those that are congruent (e.g. the word 'blue' displayed in blue). A standard interpretation of this finding is that response codes for the colour name are activated 'automatically' in addition to the response codes for the display colour; on incongruent trials, interference occurs between the two conflicting sets of response codes and additional time is required to resolve the conflict. Although it has often been assumed that the irrelevant aspects of the stimulus (i.e. the colour name) activate their response codes automatically, there is evidence that this is not the case. Using a variant of the Stroop task, Besner, Stolz, and Boutilier (1997) showed that the Stroop effect was sensitive to task set. Subjects were instructed to name the display colour of colour names, but on half of the trials the whole word was displayed in colour; on the remaining trials, only a single letter was displayed in colour. If the Stroop effect occurs as a result of automatic processing of the colour name to the level of meaning, then the magnitude of the Stroop effect should not depend on whether a single letter or the whole word was coloured. However, if the Stroop effect does not occur as the result of automatic processing, and identifying the ink colour of a single letter impairs processing of the colour name to the semantic level, then the magnitude of the Stroop effect should vary according to whether a single letter or all letters are displayed in colour. In their first experiment, which used both congruent and incongruent trials, Besner et al. (1997) found a smaller Stroop effect for the single-coloured-letter trials (72 ms) than for the whole-word-coloured trials (103 ms). By altering the task such that it was no longer necessary for subjects to attend to the whole word, as on single-coloured-letter trials, the task set of reading the colour name was altered and interference between the colour name and the correct response was diminished. In the second experiment, congruent trials were removed and a nonword baseline was used. The authors argued that including congruent trials encouraged subjects to read the colour name because performance would not be impaired by doing so: processing either the display colour or the colour name would produce a correct response. By including only incongruent and nonword trials, subjects would not benefit from attending to the irrelevant colour names. Under these conditions, the Stroop effect was eliminated on the single-coloured-letter trials (–1 ms). By altering the task set deployed on the stimuli, the Stroop effect could be completely eliminated. These results suggest that irrelevant aspects of stimuli do not activate their corresponding response codes automatically; it seems to us that task set must play an important role in determining whether or not this activation occurs.
When the Stroop effect does occur, there is good evidence that the conflict between the two sources of information is resolved at a relatively late stage of processing. Fagot and Pashler (1992) embedded a Stroop task as Task 2 of a PRP experiment and found that the Stroop effect was additive with SOA. This result is consistent with the view that Stroop interference takes place in or after the PRP bottleneck (Jolicœur et al. 2000; McCann and Johnston 1992; Pashler 1994a; Pashler and Johnston 1989), and the most plausible interpretation is that the interference takes place at the response-selection stage. Several similarities between the Stroop effect and the flanker effect suggest to us that flanker interference should also be additive with SOA, and this conjecture is presently being tested in our laboratory. Interestingly, McCann and Johnston (1992) found that the Simon effect was underadditive with decreasing SOA when induced in Task 2 of a PRP experiment. This result might suggest that the effect takes place before the PRP bottleneck. However, the transient time course of the Simon effect (Hommel 1993b, 1994a,b, 1995b) could produce the observed underadditivity despite a late locus, as hypothesized by McCann and Johnston (1992).
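The additivity logic in this paragraph (often called locus-of-slack reasoning) can be made explicit with the same kind of bottleneck arithmetic sketched earlier; the durations below are invented. Slowing a pre-bottleneck stage is absorbed into the Task 2 waiting time at short SOA (underadditivity), whereas slowing the bottleneck stage or a later stage adds fully at every SOA (additivity).

# A sketch of locus-of-slack reasoning with invented durations. An effect
# added BEFORE the bottleneck is absorbed into slack at short SOA; an
# effect added IN or AFTER the bottleneck adds fully at every SOA.

def rt2(soa, pre_extra=0, post_extra=0,
        a1=100, b1=200, a2=100, b2=150, c2=100):
    start_b2 = max(a1 + b1, soa + a2 + pre_extra)  # wait for bottleneck and input
    return start_b2 + b2 + post_extra + c2 - soa

for soa in (50, 600):
    base = rt2(soa)
    print(f"SOA {soa:3d}: +50 ms pre-bottleneck adds "
          f"{rt2(soa, pre_extra=50) - base:2.0f} ms; "
          f"+50 ms at/after bottleneck adds "
          f"{rt2(soa, post_extra=50) - base:2.0f} ms")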
28.8.7 The importance of task set

The foregoing sections highlighted the importance of task set in mediating the flanker effect, the semantic priming effect, the crosstalk effect, the Stroop effect, and the Simon effect. For all of these effects, there is good evidence that observing them depends on having an appropriate task set in place, without which the effect does not occur. This suggests that the stimulus–response translation process is not automatic, but requires an appropriate task set. In some cases, the task set may be a default set (such as reading words, for the Stroop effect).
28.9 General model

The survey of results considered so far leads us to propose the model illustrated in Fig. 28.1. The model has three types of representations: abstract perceptual codes, a set of STM codes, and abstract response codes. The most important assumptions incorporated into the model are as follows:

(1) at the core of the model is a short-term memory system (STMS) that is composed of several subsystems (e.g. verbal, visual, motor); the STMS imposes two major kinds of capacity limitations in the overall information processing architecture: first, the storage capacity of the various subsystems in the STMS is limited (e.g. 4 items for VSTM); and second, the transfer of information into, and out of, the short-term stores is capacity-limited;

(2) activation of abstract perceptual codes from perceptual input does not require central capacity;

(3) the STMS has an input buffer for items that are selected for processing from activated perceptual codes; selected items have a privileged channel through a stimulus–response translation subsystem, via which they can activate corresponding response codes; there are also direct associative links from perceptual codes to response codes;

(4) short-term consolidation (see note 1) is the process that transfers information from the input buffer into the short-term stores; the short-term consolidation operation functions like a batch processor;
Fig. 28.1 Schematic representation of the cognitive architecture. STC = short-term consolidation; RCC = response code consolidation; S–R translation = stimulus–response translation.

(5) response code consolidation (RCC) is a capacity-limited process that transfers information from activated response codes into the STMS response buffer;

(6) overt motor responses are initiated by commands that originate in the STMS.

We now briefly recapitulate the major findings reviewed in the foregoing sections and discuss how the model explains them. The distinction between early and late vision is captured in the model by the distinction between the system of abstract perceptual codes and the STMS. By hypothesis, the former does not require central capacity whereas the latter does. There are two ways in which the STMS is capacity-limited. First, the number of representations that can be maintained in the various subsystems is limited (Luck and Vogel 1997; Miller 1956). Second, the transfer of information into STM stores is capacity-limited, as is access to information already in STM. The capacity-limited entry into the STMS explains results such as those of Jolicœur and Dell'Acqua (1997), in which encoding a small number of items for delayed recall causes concurrent speeded responses to be slowed. The results of Sternberg (1966), and those of Jolicœur (1999d) showing additivity of the memory set size effect with SOA in the PRP paradigm, reflect capacity limitations in accessing information already in STM. Retrieval from LTM, which, for explicit memory, may require access to the STMS, also requires central capacity (Carrier and Pashler 1995). We assume that overt motor responses are initiated by commands that originate in the STMS, perhaps via a specialized response buffer or motor short-term memory subsystem. Thus, the STMS is required when making overt responses, and capacity limitations in transfer operations into and out of the system lead to the PRP effect. While one task engages the response code consolidation process, others must wait, forcing serial response selection. In our view, these limitations are not strategic (for the purpose of controlling response order), but instead reflect the functional properties of the subsystem that consolidates response code information.
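One way to see how these assumptions produce dual-task costs is to treat the STMS transfer operations as jobs queued on a single serial channel. The sketch below does exactly that; the labels, arrival times, and durations are invented, and the verbal model is of course not committed to these numbers.

# A toy serial-channel sketch of the model's core limitation: short-term
# consolidation (STC) and response code consolidation (RCC) both need the
# same central channel, so a job arriving while the channel is busy waits.

def run_central_channel(jobs):
    """jobs: (arrival_ms, label, duration_ms) tuples; one serial channel."""
    channel_free = 0.0
    for arrival, label, duration in sorted(jobs):
        start = max(arrival, channel_free)
        channel_free = start + duration
        print(f"{label}: arrives {arrival:4.0f} ms, starts {start:4.0f} ms, "
              f"waits {start - arrival:3.0f} ms")

# Task 1 (a speeded tone task) claims the channel for RCC; the Task 2
# letter's STC, arriving at a short SOA, must wait. If the letter is masked,
# its perceptual trace may expire during the wait, lowering report accuracy.
run_central_channel([(150.0, "RCC (Task 1)", 200.0),
                     (250.0, "STC (Task 2)", 150.0)])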
We assume that operations involving the STMS, such as short-term consolidation and response code consolidation, can mutually interfere with each other. Encoding information into one of the short-term stores interferes with response code consolidation (Jolicœur and Dell'Acqua 1998, 1999), and performing a speeded task, which requires response code consolidation, interferes with short-term consolidation (De Jong and Sweet 1994; Dell'Acqua and Jolicœur 2000; Jolicœur and Dell'Acqua 1997, 1998, 1999, 2000). These mutual interactions between short-term consolidation and response code consolidation constitute a form of action–perception interaction, if we assume that conscious perception requires entry into one of the short-term stores. The importance of masking in the attentional blink paradigm (Giesbrecht and Di Lollo 1998) and the speeded attentional blink paradigm (Jolicœur 1999b) is explained if we suppose that activation of perceptual codes can persist for some time after a stimulus is removed (Coltheart 1980). This persistence can bridge the period of time during which short-term consolidation might be hindered by concurrent processing requiring the STMS. Intact semantic priming despite action–perception dual-task interference that impedes perception can be explained if we assume that priming is mediated by the activation of perceptual codes that are outside of, and prior to, the STMS. The activation of these codes and associated representations does not need the capacity-limited mechanisms that are required for operations involving the STMS. However, delayed perceptual reports require entry into one of the short-term stores. Hence it is possible to observe dual-task interference, reflecting interference with memory encoding, in the presence of intact semantic priming (Maki et al. 1997; Stolz et al. 2000). If we assume that the N400 component of the event-related potential reflects activity in the population of perceptual codes (and semantic associations), the intact N400 response for items that cannot be reported because of the attentional blink is also explained by the model. The sometimes apparently automatic activation of response codes, leading to response competition in paradigms such as the flanker task, the compatibility tasks of Müsseler and Hommel (1997a,b), the Simon effect, and the Stroop effect, might be explained by postulating direct links from perceptual codes to response codes. These direct links would mediate S–R associations built up over time from repeated pairings of particular stimulus–response combinations. One difficulty for this account is the modulating influence of task set in many of these paradigms. The results of Logan and Schulkind (2000), for example, require a different explanation. In a PRP paradigm, they found that crosstalk between the second response and the first response occurred only when the same task set could be applied to the first and second stimuli. This suggests that the effects were not mediated by direct links between perceptual codes and response codes. When different task sets had to be applied, there was no crosstalk. The modulating influence of task set suggests mediation by S–R translation rules maintained in the STMS, and a limited ability to maintain more than one such set of S–R translation rules in an active state. This limitation on simultaneous translation rules may be restricted to cases in which the same stimuli could be subjected to both rules (Rogers and Monsell 1995).
Stimuli associated with unique responses (e.g. odd vs. even for a digit; H vs. S for a letter) would produce small or nil switch costs in task-switch paradigms, and perhaps more than one such S–R translation rule can be active at one time in the STMS. It is possible that practice can lead to the formation of direct links between perceptual codes and response codes. The potentiation of these links may be subject to contextual influences, however, such that sets of direct S–R links could be activated or deactivated as a function of context. Contextual modulation of S–R links could provide the basis for an account of results in which focusing
processing to the letter level eliminated semantic priming (Smith et al. 1983) or the Stroop effect (Besner et al. 1997).
28.10 System throughput

28.10.1 Pipelining

The architecture proposed in the model illustrated in Fig. 28.1, with long consolidation times for perceptual and response codes, suggests that the throughput of the system would be rather limited. Indeed, Ward et al. (1996) argued that visual attention has a very slow time course, requiring on the order of 400 ms to process a single item from initial stimulation to conscious perception and overt action. In their view, this slow time course is at the root of the attentional blink effect, which often exhibits measurable effects for 400 ms or longer. If Ward et al. (1996) are correct, how is it that musicians or typists can process visual stimuli to action at rates that far exceed the 2–3 items per second rate suggested by a 400 ms per item attentional dwell time? We suggest several possible solutions to this problem. First, our model assumes that the STMS has an input buffer that allows selected perceptual codes to be passed on to the S–R translation subsystem without having to undergo short-term consolidation. In this view, short-term consolidation is required for temporary storage of information in one of the short-term stores, but not for the control of immediate action. This arrangement allows the system to circumvent a major capacity limitation (i.e. short-term consolidation). Note that we assume that information in the input buffer can be subjected to short-term consolidation during response execution (after response code consolidation), explaining why we can generally remember the last action performed, despite the hypothesized interference between response code consolidation and short-term consolidation. Circumventing the short-term consolidation bottleneck in tasks that do not require delayed recall is only a partial answer to the throughput problem, however, because we can process visual stimuli to action at rates that far exceed the rate suggested by the total time taken to process a single item. Response times in choice tasks are rarely less than 300 ms, and more typically exceed 400 ms. Skilled sequential actions can be performed at much faster rates (Hershman and Hillix 1965). Thus, we need additional ways to increase throughput. A second major way to increase throughput is pipelining. Rather than waiting for the complete processing of one item before starting another, several items can be processed in parallel, even if a particular stage can only process information serially. Consider the analogy of a car assembly line. Even though it might take several hours to assemble a particular car from beginning to end, fully assembled cars can nonetheless come off the assembly line at a rate of one every few minutes. This rate is far greater than would be possible if individual cars were assembled serially, and it is achieved by assembling many cars in parallel, even though any particular station in the assembly line deals with a single car at a time. There is good evidence that skilled human performance makes use of pipelining to increase throughput beyond what would be achieved via strictly serial processing. The work of Salthouse (1984), following the lead of researchers like Hershman and Hillix (1965), highlights the perception–action pipeline in the context of skilled typing. Skilled typists were asked to type text under different viewing conditions. Free viewing of the text produced optimal performance and provided
a baseline for the other conditions. In the experimental conditions, the text to be typed was shown using a window that allowed only a restricted number of characters to be seen at any given time. For very small windows, performance (typing rate) was far below that observed under free viewing conditions. As the window size was increased, typing rate increased, and it eventually reached an asymptote equivalent to the performance observed under free viewing conditions. For meaningful text, the average size of the smallest window that produced performance equivalent to free viewing, which we interpret as the number of characters in the pipeline, was about seven. For random letters, a window of about five characters was required, suggesting that this number of characters was processed in parallel, at various stages of processing through the pipeline. Typists who were more skilled had larger look-ahead windows, suggesting a direct relation between the degree of pipelining and skill. The random text condition provides an estimate of the number of pieces of information that are simultaneously in the pipeline, with less influence from other factors such as chunking (which we discuss in a subsequent section). One limiting factor on the rate of output of a pipeline is the rate of the slowest stage. In human performance, central operations involving the STMS are likely to be the rate-limiting stages. In a very interesting series of experiments, Pashler (1994b) showed that, for unskilled performance, central operations usually associated with the PRP effect provided a good explanation for the rate-limiting stages in pipeline processing of two items. Pashler (1994b) reported a significant advantage in performance when subjects could preview one item in addition to the currently processed item. Adding a second preview item provided little benefit to performance, however. This latter result contrasts with the results of Salthouse (1985), who found as much improvement when the preview increased from two to three items as was found from one to two (see also Salthouse 1984). We find the difference between Pashler's (1994b) and Salthouse's (1985) results very interesting. The subjects tested by Salthouse were typists who had much greater practice at the task than Pashler's subjects. Consequently, we speculate that an important element of increasing skill is to increase the amount of information in the processing pipeline. The results reported by Salthouse (1985) suggest to us that there is a qualitative change in the effects of increasing the size of the look-ahead window between three and four items. That is, there are very large gains in performance from increasing preview from one to two, and from two to three items (even for only modestly skilled typists). Beyond three items, however, the slope of the improvement in performance as the size of the preview window increases is much shallower (even for highly skilled typists). We speculate that one aspect of typing skill is to increase the pipeline to include at least three items. Speculating further, we propose that the pipeline contains one item before the STMS (processing to activate perceptual codes), one in the input buffer of the STMS undergoing S–R translation and response code consolidation, and one item in the response buffer, following response code consolidation. Another way to characterize this pipeline is to say that there is one item undergoing processing before the central bottleneck, one in the bottleneck, and one after the bottleneck.
We speculate that the large gains in performance for pipelining three items reflect these structural properties of the information processing system. Further gains may reflect the influence of chunking and chording, which are discussed in a subsequent section. It is possible that other capacity limitations could become rate-limiting, however, when seven or more items are processed through the pipeline (for highly skilled typists, see Salthouse 1984). Nonetheless, in the absence of evidence to the contrary, we speculate that central operations involving
the STMS continue to be the rate-limiting steps in action–perception pipelines, even for highly skilled performance (Van Selst, Ruthruff, and Johnston 1999).
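The pipelining argument can be illustrated with a few lines of code. In the sketch below, three stages with invented durations each handle one item at a time but run concurrently; the first item takes the full path time, while later items emerge at the rate of the slowest stage.

# An assembly-line sketch of pipelining. Steady-state output intervals
# approach the duration of the slowest stage, not the total path time.
# Stage durations are invented for illustration.

STAGES = [("perceptual", 120.0), ("central", 90.0), ("motor", 140.0)]

def completion_times(n_items):
    free_at = [0.0] * len(STAGES)       # when each stage next becomes free
    finished = []
    for _ in range(n_items):
        t = 0.0
        for i, (_, duration) in enumerate(STAGES):
            t = max(t, free_at[i]) + duration
            free_at[i] = t              # stage busy until this item leaves it
        finished.append(t)
    return finished

times = completion_times(10)
total_path = sum(d for _, d in STAGES)
print(f"first item done at {times[0]:.0f} ms (total path {total_path:.0f} ms)")
print(f"steady-state inter-item interval: {times[-1] - times[-2]:.0f} ms "
      f"(the slowest stage)")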
28.10.2 Chunking

Another way to increase throughput and to circumvent consolidation and storage capacity limitations has been called chunking. In chunking, items are grouped and categorized, and a representation of the grouped category can then be processed through the system, rather than processing the individual items in the group. For example, rather skilled readers of English can treat the letters C, A, and T as a single chunk (CAT) rather than as three independent letters. Murdock (1961), for example, showed that the retention of three random consonants was equivalent to the retention of three words, in the context of the Peterson and Peterson (1959) paradigm. Chunking allows the system to process groups of items (e.g. letters) as single units. We hypothesize that the cost of consolidating a chunk (e.g. a word) is the same as the cost of consolidating one of the elements making up the chunk (e.g. an isolated letter). Thus, by chunking the information before subjecting it to short-term consolidation, the capacity demands of consolidation can be reduced considerably.
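The hypothesis that a chunk consolidates at the cost of a single element can be stated in a few lines; the chunk inventory and the per-unit cost below are invented for illustration.

# A toy illustration of the chunking hypothesis: consolidating a chunk is
# assumed to cost the same as consolidating a single element.

COST_PER_UNIT_MS = 50.0
KNOWN_CHUNKS = {"CAT", "DOG", "THE"}

def consolidation_cost(letters):
    units, i = [], 0
    while i < len(letters):
        if letters[i:i + 3] in KNOWN_CHUNKS:   # recode three letters as one unit
            units.append(letters[i:i + 3])
            i += 3
        else:
            units.append(letters[i])           # unchunkable letter = one unit
            i += 1
    return len(units) * COST_PER_UNIT_MS, units

for s in ("CAT", "XQZ"):
    cost, units = consolidation_cost(s)
    print(f"{s}: consolidated as {units} -> {cost:.0f} ms")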
28.10.3 Chords and arpeggios

Just as chunking groups items on the input side, complex actions can be grouped on the output side to form chords, or programs. We use the term 'chord' to refer to a complex set of simultaneous actions, or an action that requires the coordination of multiple large sets of muscles. In musical performance, say on the piano or guitar, a particular pattern or posture of the hand allows the performer to play several notes at the same time (called a chord). For skilled performers, we hypothesize that chords are processed as units, rather than as individual actions. This concept can be extended by adding a time dimension. The notes of a chord can be played in succession rather than simultaneously, producing what is called an arpeggio. Although the notes of an arpeggio are played in succession, it is unlikely that individual notes are processed in isolation. Rather, we suppose that these actions are controlled by a motor program that can be called up and executed as one unit (much like a chunk on the input side), thereby reducing the demands on the limited processing capacity of central mechanisms. In the experiments of Salthouse (1984, 1985), we speculate that skilled performance beyond that achieved by pipelining three items may have been achieved by capitalizing on chunking and arpeggiating (or executing a chord as an arpeggio). Given that most individuals learning to type have already learned to read to a reasonable level of skill, it is likely that what distinguishes highly skilled typists from less skilled ones resides in the ability to group on the motor side. In summary, chunking allows us to overcome capacity limitations on consolidation and storage. We can also learn to associate responses with chunks, thereby reducing any capacity limitations imposed by S–R translation and response code consolidation. Chording and arpeggiating reduce the processing demands on the output side. Finally, pipelining increases the throughput of the entire system by overlapping processing that can be carried out in parallel (such as perceptual processing of one object while central processing of another is under way). There is good evidence that practice also reduces the duration of component stages, producing additional efficiency gains. After considerable practice, highly skilled typists can produce long continuous streams of error-free keystrokes with average interkeystroke intervals of less than 100 ms.
28.11 Is the perception–action system strictly feedforward?

Much prior research has focused on the impact of perception on action. Some recent work has examined the reverse relationship, however: the impact of action on perception. Müsseler and Hommel (1997a,b) devised experiments in which stimuli could be presented just as subjects initiated a response. In one experiment, a left-pointing or a right-pointing arrow was presented at the beginning of each trial. The task was to prepare a left response (a keypress made with a finger) if the arrow pointed left or to prepare a right response if the arrow pointed right. Before making this prepared response, however, subjects were required to press simultaneously the two response keys used to indicate left and right responses. We call this action a double keypress. Immediately after the double keypress, subjects executed the prepared response. The initial double keypress served to indicate that response selection had been completed and that the response to the initial arrow was prepared. It also provided a signal that Müsseler and Hommel used to anticipate when the prepared response would be executed. After a short delay following the double keypress, just as the prepared response was initiated, a second stimulus was presented briefly and masked. The second stimulus was another arrow that pointed either left or right. The critical result was that accuracy of identification or detection (in different experiments) of the second stimulus was impaired when it was presented during the execution of a compatible response, as compared with an incompatible response. For example, a left-pointing arrow was less likely to be detected than a right-pointing arrow if it was presented during the execution of a left response. Müsseler and Hommel suggest that perception of the second stimulus was impaired due to a momentary insensitivity to stimuli that overlap in features or meaning with the action being executed. They call this temporary insensitivity 'action-effect blindness.' These authors argue that perception and action codes overlap, and that executing an action can temporarily 'tie up' a code that is required for perception. Müsseler and Wühr (this volume, Chapter 25) review several additional results in this area that support the notion that creating and maintaining an action plan can interact with perception. They suggest that action–perception interactions can be decomposed into content nonspecific and content specific components. The content nonspecific component would reflect structural aspects of the information processing system, such as the capacity limitations in visual encoding suggested by performance deficits caused by concurrent central processing (e.g. De Jong and Sweet 1994; Dell'Acqua and Jolicœur 2000a,b; Jolicœur 1998, 1999a,b,c; Jolicœur and Dell'Acqua 1998, 1999; Jolicœur, Dell'Acqua, and Crebolder 2001). The content specific component would reflect patterns of results that depend on interactions involving the specific stimuli and responses involved in a particular situation (e.g. a perception deficit for a left-pointing stimulus when a left-going response is about to be performed). Thus, the content specific component depends on the representations processed through the system and has the potential to reveal fundamental aspects of these representations, whereas the nonspecific component depends on structural aspects of the system and has the potential to reveal fundamental aspects of the cognitive architecture.
This distinction between content-specific and content-nonspecific aspects of perception–action interactions is also proposed and supported by the very interesting demonstrations of Stoet and Hommel (this volume, Chapter 26). Stoet and Hommel provide evidence suggesting that the creation of an action plan can be delayed (by roughly 10–15 ms) if a spatial feature needed for that plan (i.e. left or right) has already been integrated into a representation currently held in short-term memory.
Stoet and Hommel propose a two-stage model in which features are initially activated, but not integrated into representations of percepts or actions, and later become integrated, or bound, into coherent representations. Once bound, a particular feature is less available for binding into other representations, leading to content-specific performance deficits. Prior to binding, however, the activation of the features makes them more available, leading to performance benefits (or priming). Both effects were observed in the experiments of Stoet and Hommel (this volume, Chapter 26).

Interactions between response codes and perception might be mediated by connections linking response codes to perceptual codes. This is represented in the model shown in Fig. 28.1 by bidirectional direct links between response codes and perceptual codes. Researchers are in the process of working out exactly how these links might work to produce the observed interactions (see Müsseler and Wühr, this volume, Chapter 25; Stoet and Hommel, this volume, Chapter 26).
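The qualitative predictions of this two-stage account can be stated in a few lines of code. The sketch below is our own illustration under assumed values: the constants are arbitrary, chosen only to reproduce the direction of the effects (a priming benefit from unbound activation, and a cost on the order of the 10–15 ms planning delay reported by Stoet and Hommel).

BASELINE_RT = 400  # hypothetical baseline reaction time in ms

def predicted_rt(feature_state):
    # feature_state of the needed feature: 'free', 'activated', or 'bound'.
    if feature_state == 'activated':
        return BASELINE_RT - 20   # unbound activation primes reuse
    if feature_state == 'bound':
        return BASELINE_RT + 12   # bound features are less available
    return BASELINE_RT

for state in ('free', 'activated', 'bound'):
    print(state, predicted_rt(state))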
Acknowledgements
We thank Rob Ward and Gordon Logan for constructive criticism of an earlier draft of this chapter, Bernhard Hommel and Derek Besner for useful discussions, and Marg Ingleton for technical support. Work in this chapter was supported by a grant from the Natural Sciences and Engineering Research Council of Canada awarded to Pierre Jolicœur.
Note
1. Were it not for the fact that short-term consolidation has been used to name this process in several previous papers, we would have called it perceptual code consolidation (PCC), to be consistent with how we name the corresponding transfer process postulated to operate on response codes (response code consolidation).
References
Arnell, K.M. and Jolicœur, P. (1999). The attentional blink across stimulus modalities: Evidence for central processing limitations. Journal of Experimental Psychology: Human Perception and Performance, 25, 630–648.
Besner, D., Stolz, J.A., and Boutilier, C. (1997). The Stroop effect and the myth of automaticity. Psychonomic Bulletin and Review, 4, 221–225.
Blake, R.R. and Fox, R. (1969). Visual form recognition threshold and the psychological refractory period. Perception and Psychophysics, 5, 46–48.
Coltheart, M. (1980). Iconic memory and visible persistence. Perception and Psychophysics, 27, 183–228.
De Jong, R. (1993). Multiple bottlenecks in overlapping task performance. Journal of Experimental Psychology: Human Perception and Performance, 19, 965–989.
De Jong, R. and Sweet, J.B. (1994). Preparatory strategies in overlapping-task performance. Perception and Psychophysics, 55, 142–151.
Dell'Acqua, R. and Grainger, J. (1998). Unconscious semantic priming from pictures. Cognition, 73, 1–15.
Dell'Acqua, R. and Jolicœur, P. (2000). Visual encoding of patterns is subject to dual-task interference. Memory and Cognition, 28, 184–191.
Donchin, E. (1981). Surprise! . . . Surprise? Psychophysiology, 18, 493–513.
Eriksen, B.A. and Eriksen, C.W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception and Psychophysics, 16, 143–149.
Eriksen, C.W. (1995). The flankers task and response competition: A useful tool for investigating a variety of cognitive problems. Visual Cognition, 2, 101–118.
Fagot, C. and Pashler, H. (1992). Making two responses to a single object: Implications for the central attentional bottleneck. Journal of Experimental Psychology: Human Perception and Performance, 18, 1058–1079.
Giesbrecht, B.L. and Di Lollo, V. (1998). Beyond the attentional blink: Visual masking by object substitution. Journal of Experimental Psychology: Human Perception and Performance, 24, 1454–1466.
Hershman, R.L. and Hillix, W.A. (1965). Data processing in typing: Typing rate as a function of kind of material and amount exposed. Human Factors, 7, 483–492.
Hommel, B. (1993a). Inverting the Simon effect by intention: Determinants of direction and extent of effects of irrelevant spatial information. Psychological Research/Psychologische Forschung, 55, 270–279.
Hommel, B. (1993b). The relationship between stimulus processing and response selection in the Simon task: Evidence for a temporal overlap. Psychological Research/Psychologische Forschung, 55, 280–290.
Hommel, B. (1994a). Effects of irrelevant spatial S–R compatibility depend on stimulus complexity. Psychological Research/Psychologische Forschung, 56, 179–184.
Hommel, B. (1994b). Spontaneous decay of response code activation. Psychological Research/Psychologische Forschung, 56, 261–268.
Hommel, B. (1995a). S–R compatibility and the Simon effect: Toward an empirical clarification. Journal of Experimental Psychology: Human Perception and Performance, 21, 764–775.
Hommel, B. (1995b). Conflict versus misguided search as explanation of S–R correspondence effects. Acta Psychologica, 89, 37–51.
Hommel, B. (1996). S–R compatibility effects without response uncertainty. Quarterly Journal of Experimental Psychology, 49, 546–571.
Hommel, B. (1998). Automatic stimulus–response translation in dual-task performance. Journal of Experimental Psychology: Human Perception and Performance, 24, 1368–1384.
Jolicœur, P. (1998). Modulation of the attentional blink by on-line response selection: Evidence from speeded and unspeeded Task1 decisions. Memory and Cognition, 26, 1014–1032.
Jolicœur, P. (1999a). Restricted attentional capacity between sensory modalities. Psychonomic Bulletin and Review, 6, 87–92.
Jolicœur, P. (1999b). Dual-task interference and visual encoding. Journal of Experimental Psychology: Human Perception and Performance, 25, 596–616.
Jolicœur, P. (1999c). Concurrent response selection demands modulate the attentional blink. Journal of Experimental Psychology: Human Perception and Performance, 25, 1097–1113.
Jolicœur, P. (1999d). Capacity demands of accessing short-term memory. Paper presented at the Ninth Annual Meeting of the Canadian Society for Brain, Behaviour, and Cognitive Science, June 18–19, Edmonton, Alberta, Canada.
Jolicœur, P. and Dell'Acqua, R. (1997). Short-term consolidation of random polygons causes dual-task slowing. Paper presented at the Annual Meeting of the Psychonomic Society, Philadelphia, Pennsylvania, USA, November 20–23.
Jolicœur, P. and Dell'Acqua, R. (1998). The demonstration of short-term consolidation. Cognitive Psychology, 36, 138–202.
Jolicœur, P. and Dell'Acqua, R. (1999). Attentional and structural constraints on visual encoding. Psychological Research/Psychologische Forschung, 62, 154–164.
Jolicœur, P. and Dell'Acqua, R. (2000). Selective influence of second target exposure duration and Task1 load effects in the attentional blink phenomenon. Psychonomic Bulletin and Review, in press.
Jolicœur, P., Dell'Acqua, R., and Crebolder, J. (2000). Multitasking performance deficits: Forging some links between the attentional blink and the psychological refractory period. In S. Monsell and J. Driver (Eds.), Control of cognitive processes: Attention and performance XVIII, pp. 309–330. Cambridge, MA: MIT Press.
Jolicœur, P., Dell'Acqua, R., and Crebolder, J. (2001). The attentional blink bottleneck. In K. Shapiro (Ed.), The limits of attention, pp. 82–99. Oxford: Oxford University Press.
Klinger, M.R., Burton, P.C., and Pitts, G.S. (2000). Mechanisms of unconscious priming: I. Response competition, not spreading of activation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 441–455.
Logan, G.D. and Schulkind, M.D. (2000). Parallel memory retrieval in dual-task situations: I. Semantic memory. Journal of Experimental Psychology: Human Perception and Performance, 26, 1072–1090.
Luck, S.J. and Vogel, E.K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279–281.
Maki, W.S., Frigen, K., and Paulson, K. (1997). Associative priming by targets and distractors during rapid serial visual presentation: Does word meaning survive the attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 23, 1014–1034.
McCann, R.S. and Johnston, J.C. (1992). Locus of the single-channel bottleneck in dual-task interference. Journal of Experimental Psychology: Human Perception and Performance, 18, 471–484.
Meyer, D.E. and Kieras, D.E. (1997). A computational theory of executive cognitive processes and human multiple-task performance: Part 2. Accounts of psychological refractory-period phenomena. Psychological Review, 104, 749–791.
Meyer, D.E. and Kieras, D.E. (1999). Précis to a practical unified theory of cognition and action: Some lessons from EPIC computational models of human multiple-task performance. In D. Gopher and A. Koriat (Eds.), Attention and performance XVII. Cognitive regulation of performance: Interaction of theory and application, pp. 17–88. Cambridge, MA: MIT Press.
Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Murdock, B.B., Jr. (1961). The retention of individual items. Journal of Experimental Psychology, 62, 618–625.
Müsseler, J. and Hommel, B. (1997a). Blindness to response-compatible stimuli. Journal of Experimental Psychology: Human Perception and Performance, 23, 861–872.
Müsseler, J. and Hommel, B. (1997b). Detecting and identifying response-compatible stimuli. Psychonomic Bulletin and Review, 4, 125–129.
Müsseler, J. and Wühr, P. (2002). Response-evoked interference in visual encoding. This volume, Chapter 25.
Neely, J.H. (1991). Semantic priming effects in visual word recognition: A selective review of current findings and theories. In D. Besner and G. Humphreys (Eds.), Basic processes in reading: Visual word recognition, pp. 264–336. Hillsdale, NJ: Erlbaum.
Neisser, U. (1967). Cognitive psychology. New York: Appleton-Century-Crofts.
Pashler, H. (1989). Dissociations and dependencies between speed and accuracy: Evidence for a two-component theory of divided attention in simple tasks. Cognitive Psychology, 21, 469–514.
Pashler, H. (1993). Dual-task interference and elementary mental mechanisms. In D.E. Meyer and S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience, pp. 245–264. Cambridge, MA: MIT Press.
Pashler, H. (1994a). Dual-task interference in simple tasks: Data and theory. Psychological Bulletin, 116, 220–244.
Pashler, H. (1994b). Overlapping mental operations in serial performance with preview. Quarterly Journal of Experimental Psychology, 47A, 161–191.
Pashler, H. and Johnston, J.C. (1989). Chronometric evidence for central postponement in temporally overlapping tasks. Quarterly Journal of Experimental Psychology, 41A, 19–46.
Peterson, L.R. and Peterson, M.J. (1959). Short-term retention of individual verbal items. Journal of Experimental Psychology, 58, 193–198.
Phillips, W.A. (1974). On the distinction between sensory storage and short-term visual memory. Perception and Psychophysics, 16, 283–290.
Pinker, S. (1984). Visual cognition: An introduction. Cognition, 18, 1–63.
Potter, M.C. (1976). Short-term conceptual memory for pictures. Journal of Experimental Psychology: Human Learning and Memory, 2, 509–522.
Potter, M.C. (1993). Very short-term conceptual memory. Memory and Cognition, 21, 156–161.
Potter, M.C., Chun, M.M., Banks, B.S., and Muckenhoupt, M. (1998). Two attentional deficits in serial target search: The visual attentional blink and an amodal task-switch deficit. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 979–992.
Raymond, J.E., Shapiro, K.L., and Arnell, K.M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849–860.
Riggio, L., Gawryszewski, L., and Umiltà, C. (1986). What is crossed in crossed-hand effects? Acta Psychologica, 62, 89–100.
Rogers, R.D. and Monsell, S. (1995). Costs of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General, 124, 207–231.
Ross, N.E. and Jolicœur, P. (1999). Attentional blink for color. Journal of Experimental Psychology: Human Perception and Performance, 25, 1483–1494.
Salthouse, T.A. (1984). Effects of age and skill in typing. Journal of Experimental Psychology: General, 113, 345–371.
Salthouse, T.A. (1985). Anticipatory processing in transcription typing. Journal of Applied Psychology, 70, 264–271.
Simon, J.R. and Rudell, A.P. (1967). Auditory S–R compatibility: The effect of an irrelevant cue on information processing. Journal of Applied Psychology, 51, 300–304.
Smith, M.C., Theodor, L., and Franklin, P.E. (1983). The relationship between contextual facilitation and depth of processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 697–712.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74, 1–29.
Sternberg, S. (1966). High-speed scanning in human memory. Science, 153, 652–654.
Stoet, G. and Hommel, B. (2002). Interaction between feature binding in perception and action. This volume, Chapter 26.
Stolz, J., Jolicœur, P., and Li, E. (2000). Semantic priming does not require central capacity. Unpublished manuscript, Department of Psychology, University of Waterloo, Canada.
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–661.
Tombu, M. and Jolicœur, P. (2000). Dual-task slowing is not strategic: Evidence from constrained and unconstrained response order. Manuscript submitted for publication, Department of Psychology, University of Waterloo, Ontario, Canada.
Treisman, A.M. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136.
Umiltà, C. and Nicoletti, R. (1985). Attention and coding effects. In M.I. Posner and O.S. Marin (Eds.), Attention and performance XI, pp. 457–474. Hillsdale, NJ: Erlbaum.
Van Selst, M. and Jolicœur, P. (1997). Decision and response in dual-task interference. Cognitive Psychology, 33, 266–307.
Van Selst, M., Ruthruff, E., and Johnston, J.C. (1999). Can practice eliminate the psychological refractory period effect? Journal of Experimental Psychology: Human Perception and Performance, 25, 1268–1283.
Vogel, E.K., Luck, S.J., and Shapiro, K.L. (1998). Electrophysiological evidence for a postperceptual locus of suppression during the attentional blink. Journal of Experimental Psychology: Human Perception and Performance, 24, 1656–1674.
Wallace, R. (1971). S–R compatibility and the idea of a response code. Journal of Experimental Psychology, 88, 354–360.
Ward, R., Duncan, J., and Shapiro, K. (1996). The slow time-course of visual attention. Cognitive Psychology, 30, 79–109.
Wolfe, J.M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.
Wolfe, J.M., Cave, K.R., and Franzel, S.L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419–433.
Zeki, S.M. (1993). A vision of the brain. Oxford, England: Blackwell Scientific Publications.
29 The dimensional-action system: a distinct visual system
Asher Cohen and Uri Feintuch

Abstract. We propose that in the visual system there is a specialized and distinct subsystem dedicated to quick linkages between perception and action, which we term the dimensional-action (DA) system. This system consists of several visual dimensional modules (e.g. color, shape). Each dimensional module has its own perceptual and (some) response processes, which enable quick connections between perception and action within each module. Visual attention modulates which of the computed perception–action linkages will be executed. In the first part of the paper we review evidence that supports the two major components of the DA model. We review evidence for the existence of separable response selection mechanisms for the various visual dimensions. We then describe recent evidence supporting the role of spatial attention in the mediation of response execution. In the second part of the paper we claim that the DA system is different from the visual object recognition system. In particular, we suggest that integration of basic visual properties is fundamentally different in the two systems. Integration in the visual object recognition system is designed for identification of objects, which are permanently represented within the system. In contrast, integration in the DA system (often called feature integration) is an extension of the dimension–action linkage, is not represented permanently, and consequently has different properties. We describe recent findings that demonstrate this distinction between the two systems, and present a novel experiment that further supports it.
Considerable evidence (e.g. Bridgeman, this volume, Chapter 5; Rossetti and Pisella, this volume, Chapter 4; see also Goodale and Milner 1992) suggests that there may be two distinct visual systems, one concerned with action (the 'how' or the 'where' system), and one concerned with conscious identification (the 'what' system). Performance of tasks requiring arbitrary responses is naturally part of the 'what' system because conscious identification of the stimuli appears to be a prerequisite for the execution of arbitrary responses. The 'what' system, then, is concerned with identification of stimuli as well as execution of (some) actions. It is generally assumed, however, that this system is unitary, performing both identification and action processes. In this paper we focus on the 'what' system, and on the distinction within this system between identification processes and action processes. Our main hypothesis is that, contrary to the common assumption, the cognitive 'what' system consists of two separable subsystems. One subsystem is concerned with object identification. The second subsystem is designed for quick perception–action connections and is termed here 'the dimensional-action' (DA) system.
29.1 The traditional view
Essentially all information-processing models of human performance, from the classical (e.g. Sternberg 1969) to the contemporary (e.g. Kornblum and Stevens, this volume, Chapter 2), have assumed that human behavior is produced by distinct (although possibly overlapping and continuous) processing stages. These stages include perception, where distal stimuli are identified;
response selection, where decisions are made concerning the mapping of the identified stimuli onto responses; and response execution, where the selected responses are carried out. Two additional widely held assumptions are embedded in this view. First, the perception stage has two roles: stimuli are identified and recognized at this stage, and it is also the first stage in the series of stages that lead to action. All existing models assume, usually implicitly, that these two roles are performed by the same system (or stage). Second, there is wide agreement that perceptual processes are distributed and segregated into dimensional modules such as color, orientation, motion, and the like (e.g. Cohen 1993; Treisman and Gormican 1988; Wolfe 1994). In contrast, it is generally held that subsequent response selection processes are unitary (e.g. Kornblum, Hasbroucq, and Osman 1990; Pashler 1992). That is, all types of perceptual output are assumed to cascade to a single response selection mechanism, where they are mapped to the appropriate responses. Indeed, some researchers have claimed that dual-task interference is structural and is caused, at least in part, by the existence of a single response selection mechanism (e.g. Pashler 1992).

In this paper we challenge these two assumptions. We propose that there are two separable subsystems within the 'what' system. One subsystem is concerned with object recognition and operates more or less according to the traditional view. Our main claim is that there is a second subsystem, the dimensional-action (DA) system, which is designed for rapid responses to relatively simple visual stimuli. Specifically, we propose that visual dimensional modules such as color and shape are not just perceptual, as typically assumed. Rather, each dimensional module is also endowed with its own response selection mechanism. This system of visual dimensions enables people to execute actions quickly without elaborate stimulus processing.

The remainder of the paper is divided into two parts. In the first part we review evidence for the claim that visual dimensions have separate response selection mechanisms, and describe evidence for the role of spatial attention in mediating the output of the various dimensions. We present our DA model, originally suggested by Cohen and Shoup (1997), which incorporates these ideas. In the second part we claim that the DA system is fundamentally different from the object recognition system. The object recognition system is designed to cope with identification of as many objects as possible. One of its roles is to add representations of new objects to its network of representations, and to identify these objects at later encounters. In contrast, the DA system is oriented toward action. It has a limited and fixed number of representations, and responses are based on transient activations of these representations. We review recent evidence that supports this distinction (Cohen and Shoup 2000), and describe a new experiment that further supports it.
29.2 Components of the dimensional-action system

29.2.1 Dimensional modules: beyond perception
Anatomical (e.g. Livingstone and Hubel 1988), physiological (e.g. DeYoe and Van Essen 1988), and behavioral studies (e.g. Treisman 1986) have demonstrated that different visual areas are specialized for processing particular visual dimensions such as color, orientation, and motion. While there is wide agreement that perceptual processes are segregated into visual dimensions, it is commonly assumed that post-perceptual processes are not segregated into different modules (e.g. Treisman and Gormican 1988; Wolfe 1994).
Fig. 29.1 Stimuli used by Cohen and Shoup (1997).

Findings from our lab (e.g. Cohen and Shoup 1997; Cohen and Magen 1999) have challenged this view. Cohen and Shoup (1997) used an interference paradigm known as the flanker task. Subjects in this paradigm make a speeded response, identifying an object presented at the center of a display while ignoring peripheral distractors (e.g. Eriksen and Eriksen 1974; Miller 1991). Response times to the target are facilitated when the distractors are associated with the same response and hindered when the distractors are associated with the alternative response, a result that is generally attributed to post-perceptual stages of performance (Coles et al. 1985; Eriksen and Eriksen 1979).

Cohen and Shoup used a cross-dimensional version of the flanker task. Subjects made one response to the appearance of either a red vertical line or a blue right diagonal line, and a second response to the appearance of a green vertical line or a blue left diagonal line (see Fig. 29.1). Note that the response to the red and green vertical lines can only be made on the basis of color, and the response to the blue left and right diagonal lines can only be made on the basis of orientation. Thus, in effect, subjects were required to make one response on the basis of either one color or one orientation, and another response on the basis of either a second color or a second orientation. Cohen and Shoup found that the interaction between the target and distractors, typically found in this paradigm, is only observed when all stimuli are associated with responses on the basis of the same visual dimension (e.g. the response to all stimuli is based on their orientation). When the response to the target is based on one dimension (e.g. color) and the response associated with the distractors is based on a different dimension (e.g. orientation), no such interference is observed. This result held even after extensive practice in the task. Control experiments showed that the lack of a congruency effect was specifically due to the association of the target and flankers with responses based on different visual dimensions (see Cohen and Shoup 1997).

These results motivated the dimensional-action model (Fig. 29.2). Like other models (e.g. Wolfe 1994), the DA model assumes that there exist dimensional modules with separate perceptual processes. Unlike previous theories, however, the model assumes that these modules also segregate response selection processes. According to our model, there exist multiple response selection units, one per visual dimension. Given that there are multiple response selection processes, there is a need for a control process to ensure that just one of these decisions will reach the response execution system where movements are generated. The model assumes that this control is performed by a spatial attention mechanism that enhances activation of information at the attended location. Thus, multiple response decisions can be made in parallel¹ concerning stimuli at various locations in the scene, but only one of these decisions (the one made at the position where spatial attention is focused) is transferred to the central response execution system (cf. Shiffrin and Schneider 1977).
Fig. 29.2 Schematic diagram of the original dimensional-action (DA) model. The model is configured for a task in which the features green and left are associated with Response decision 1 (R1), and red and right with Response decision 2 (R2). Attentional and input activations are multiplied. (From Cohen and Shoup 1997, p. 167.)

This model readily accounts for the results obtained by Cohen and Shoup (1997). When a color target (e.g. a red vertical line) is flanked by color stimuli (e.g. either red or green vertical lines), both target and flankers activate a response decision unit (either R1 or R2) within the color module. As a consequence, no competition arises when the target and flankers activate the same response decision unit (e.g. both target and flankers are red vertical lines). When the target and flankers are associated with different responses (e.g. a red line flanked by green lines), they activate competing response selection units (R1 and R2) and a competition arises between the two units. The resolution of this competition takes time, hence the longer RT in this condition. A similar logic holds when an orientation target is flanked by orientation stimuli, except that the competition between the response selection units arises at the orientation module. When a color target is flanked by orientation stimuli or vice versa, however, the situation is different. The target (e.g. a red vertical line) activates the R1 unit within the color module, whereas the flankers (e.g. orientation stimuli) activate either the R1 or R2 units within the orientation module. As can be seen in the model (Fig. 29.2), the response selection units of the color and orientation modules do not interact. Moreover, due to the mediating role of attention mentioned in the previous paragraph (an issue to which we return shortly), activation by the flankers
cannot reach higher-level processes. Consequently, there is no flanker effect when a target associated with a response on the basis of one dimension is flanked by stimuli from the alternative dimension.

Priming studies (Cohen and Magen 1999; Found and Müller 1996) are compatible with the claim that the various visual dimensions have separable response selection mechanisms. These studies used a cross-dimensional task in which responses to some targets were defined by one dimension (e.g. color), and responses to other targets were defined by a different dimension (e.g. orientation). For example, Cohen and Magen (1999, Exp. 4) used a 4 × 2 design in which subjects made one response to two targets, and a second response to two other targets. One of the targets from each response set was defined by color and the other was defined by orientation. Cohen and Magen found that RT for a target on trial n was faster when the target on the preceding trial was defined by the same dimension. For example, RT for a color target was faster when the target on the preceding trial was also defined by its color. As argued by Cohen and Magen, these results are best explained by the assumption that responding on the basis of a particular visual dimension activates the response selection mechanism of that dimension, which facilitates subsequent responses to targets from the same dimension.

Visual search studies are also in accord with the DA model. In a typical visual search paradigm, participants search for a pre-defined target among a variable number of distractors. It is well established that reaction time (RT) for a target defined by a single feature (e.g. color, orientation) is not affected by the number of distractors (e.g. Cohen and Ivry 1991; Treisman and Gelade 1980). This finding holds when the actual target is selected on each trial out of two and even three potential targets (e.g. the target can be one of three different colors). Treisman (1988) and Müller, Heller, and Ziegler (1995) compared intra-dimensional search, where all targets are defined by a single dimension (e.g. all targets differ from the distractors in their color), to cross-dimensional search, where each target differs from the distractors along a different dimension. They found that, while search in all conditions was unaffected by the number of distractors, RT for the intra-dimensional condition was faster than that for the cross-dimensional condition.

Cohen and Magen (1999) have shown that this finding crucially depends on the stimulus-to-response mapping used in the task. In the task used by Treisman (1988) and Müller et al. (1995), subjects made one response when either one of the targets was present, and a second response when no target was present. Cohen and Magen designed a task in which each target required a different response. Specifically, in their intra-dimensional task, there could be two targets from the same dimension (e.g. both targets defined by their color), each requiring a different response. In the cross-dimensional task the two targets were defined by different dimensions (e.g. one defined by color, and the other by orientation), and again each required a different response. Cohen and Magen found that in their design the results were reversed, and RTs for the cross-dimensional task were shorter. These findings are best explained by the DA model's assumption that visual dimensions have separable response selection mechanisms.
Specifically, subjects in Cohen and Magen's task were faster in the cross-dimensional task because each target activated a separate response selection unit and subjects could make a response on the basis of the activated response selection unit. This possibility did not exist in the intra-dimensional task, where both targets activated the same response selection unit (see Cohen and Magen 1999, for details).
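The logic of this account is simple enough to state computationally. The sketch below is our own minimal rendering of the original DA model's flanker predictions, not code from Cohen and Shoup; the stimulus–response mapping follows the Fig. 29.2 caption, and the rule that response units in different modules never compete is precisely the model assumption at issue.

# S-R mapping from the Fig. 29.2 caption:
# R1: green (color) or left diagonal (orientation); R2: red or right diagonal.
RESPONSE_OF = {
    ('color', 'green'): 'R1', ('color', 'red'): 'R2',
    ('orientation', 'left'): 'R1', ('orientation', 'right'): 'R2',
}

def flanker_competition(target, flanker):
    # target, flanker: (dimension, feature) pairs. Returns True if the DA
    # model predicts response competition (and hence slower RT).
    if target[0] != flanker[0]:
        # Response selection units of different modules do not interact, and
        # unattended flanker activation cannot reach central processes.
        return False
    # Within a module, competition arises only when target and flanker
    # activate opposite response units.
    return RESPONSE_OF[target] != RESPONSE_OF[flanker]

print(flanker_competition(('color', 'red'), ('color', 'green')))       # True
print(flanker_competition(('color', 'red'), ('orientation', 'left')))  # False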
29.2.2 The role of spatial attention
As mentioned earlier, spatial attention plays a major role in the DA model. When targets defined by different dimensions are present simultaneously, the response for each target will be selected separately
and independently. Spatial attention serves as a gating mechanism and determines which of the selected responses from the various dimensions will be transferred to the executive functions. Specifically, the selected response for a given target will be transferred to the executive functions only when spatial attention is focused on it. The model makes clear predictions for a situation in which targets defined by different dimensions are simultaneously present: when the targets are positioned at two different locations, only one selected response (from the target on which spatial attention is focused) will be transferred to the executive functions; when the targets are positioned at a single location, both responses will be transferred to the executive functions. Feintuch and Cohen (in preparation), using the redundancy gain paradigm, have recently provided direct evidence for these predictions.

In most redundancy gain studies the target can appear in one of two possible locations, and appears in both on redundant trials. Many studies have demonstrated that subjects are faster when redundant targets are present (see Miller 1982, for a review), a phenomenon called redundancy gain. Two general classes of models have been suggested to account for this gain. One class, termed separate activation models, assumes that the presentation of each target leads to a buildup of activation in its representation. When two targets are present, the response is determined by the faster of the two activations to reach threshold. The redundancy gain is obtained because the probability that either one of the two activations will reach threshold by time t is larger than the probability that the activation of a single target will reach threshold by time t. Note that in this class of models, there is no cross-talk between the two representations. The other class of models, called coactivation models, assumes that the response can be determined by the summed activation of the two targets. Redundancy effects are obtained because both targets contribute activation toward a single threshold. In this class of models, then, the information from the two representations interacts to form a response.

Miller (1982) has proposed a test to distinguish between the two classes of models. He showed that all separate activation models must satisfy the following inequality:

P(RT < t | T1 and T2) ≤ P(RT < t | T1) + P(RT < t | T2)    (1)
where t is time, and T1 and T2 are the two targets. By contrast, coactivation models need not satisfy this inequality.

As stated above, the DA model predicts that when two targets whose responses are defined by different dimensions appear simultaneously, performance depends on the locations of the targets. In the different-locations condition, where the targets are positioned at two different locations (and, consequently, spatial attention may be focused on just one of them), the response to each target will be selected in its respective dimensional module. However, just one of these responses, associated with the target on which spatial attention is focused, will be transferred to the executive functions. By contrast, in the same-location condition, where both targets appear in the same location, the response to both targets will be selected, and both responses will be transferred to the executive functions. Put differently, in the different-locations condition, separate representations determine the response. In the same-location condition, however, the response will be coactivated by both targets. Following the analysis above, we can expect a violation of Inequality 1 in the same-location condition. No violation of Inequality 1 should be observed in the different-locations condition.

Feintuch and Cohen examined these predictions in a go/no-go task. Subjects made a response if they saw a blue vertical line, a red right diagonal line, or a blue right diagonal line. They withheld a response if they saw a green vertical line, a red left diagonal line, or a green left diagonal line.
Fig. 29.3 Upper part: comparison of the CDF for the different-locations redundant-target condition with the summed CDF of the individual targets. Lower part: comparison of the CDF for the same-location redundant-target condition with the summed CDF of the individual targets.

The blue vertical line can be considered a color target because the go-response can only be based on its color; its orientation is shared with the distractors. For similar reasons, the red right diagonal line can be considered an orientation target. The blue right diagonal line is a redundant target because the response to this stimulus can be based on either color or orientation. Each trial in this task
started with a central fixation point, followed by the appearance of two stimuli, one above and one below the fixation point. There were three basic go conditions. In the single-target condition, a single go target, either the color or the orientation target, appeared in one of the two locations, and a neutral stimulus appeared in the other location. In the different-locations redundant-target condition, a color target appeared in one of the locations and an orientation target appeared in the other location. In the same-location redundant-target condition, the redundant target appeared in one of the locations, and a neutral stimulus appeared in the other location.

Mean RT in the three conditions was 389, 377, and 359 ms for the single-target, different-locations redundant-target, and same-location redundant-target conditions, respectively. Statistical analyses showed that mean RT for the different-locations redundant-target condition was significantly faster than that for the single-target condition, t(14) = 2.45, p < 0.05. Likewise, mean RT for the same-location redundant-target condition was significantly faster than that for the single-target condition, t(14) = 9.67, p < 0.05. The proportion of errors (misses) in this task will not be considered here because it was very small and generally showed the same pattern as the RT results. These results, as in many other studies, showed that a redundancy gain is obtained when two targets are present simultaneously.

The more interesting analysis concerns possible violations of Inequality 1. To examine such violations, we followed the procedure suggested by Miller (1982) and used by many other researchers as well (e.g. Mordkoff and Yantis 1991, 1993; see Mordkoff and Yantis 1991, for details). First, we determined for each subject the cumulative distribution function (CDF) of RT for each of the two single-target conditions (i.e. for the color and for the orientation targets), summed these two CDFs, and averaged the resulting function across subjects. We then determined for each subject the CDF for each of the two redundant-target conditions, and averaged these functions across subjects as well. Finally, we compared the resulting CDFs for the redundant targets with the summed CDF for the single-target conditions. Shorter RTs in a redundant-target CDF would constitute a violation of Inequality 1.

The results of this analysis are shown in Fig. 29.3. The upper part of Fig. 29.3 presents the comparison of the different-locations redundant-target CDF with the summed CDF for the single-target conditions. As can be seen in the figure, there is no sign of a violation of Inequality 1. The lower part of Fig. 29.3 depicts the comparison of the same-location redundant-target CDF with the summed CDF for the single-target conditions. As predicted by the DA model, Inequality 1 was violated at quantiles 10, 15, 20, 25, 30, and 35 (p < 0.05 in all these comparisons).

It should be mentioned that the results of a similar study by Mordkoff and Yantis (1993) appear to contradict our findings. They used a go/no-go task with letter and color targets. Like us, they found a violation of Inequality 1 when the letter and color appeared in the same location. Unlike us, however, they also obtained a violation of Inequality 1 when the color and letter were presented at different locations. One possible explanation for this discrepancy is that Mordkoff and Yantis used letters as shape stimuli and we used lines.
However, we redid our experiments using letters instead of lines as the shape stimuli and replicated our results (Feintuch and Cohen, in preparation). At the moment we do not know the exact reason for the discrepant results.
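For readers who wish to apply this test, the following is a minimal sketch of the per-subject step of the Miller (1982) procedure described above. It is our own illustration: the averaging across subjects is omitted, and the reaction times, time grid, and all names are fabricated placeholders rather than data from this study.

import numpy as np

def empirical_cdf(rts, t_grid):
    # P(RT < t) for each t in t_grid, estimated from one condition's RTs.
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, t_grid, side='left') / len(rts)

def race_model_violated(rt_t1, rt_t2, rt_redundant, t_grid):
    # True wherever P(RT < t | T1 and T2) > P(RT < t | T1) + P(RT < t | T2),
    # i.e. wherever Inequality 1 is violated.
    bound = empirical_cdf(rt_t1, t_grid) + empirical_cdf(rt_t2, t_grid)
    return empirical_cdf(rt_redundant, t_grid) > bound

rng = np.random.default_rng(0)
t_grid = np.arange(250, 601, 25)
rt_color = rng.normal(390, 40, 200)     # single color-target trials (fake data)
rt_orient = rng.normal(390, 40, 200)    # single orientation-target trials (fake data)
rt_same_loc = rng.normal(350, 35, 200)  # same-location redundant trials (fake data)
print(race_model_violated(rt_color, rt_orient, rt_same_loc, t_grid))

A violation at any early quantile, as in the lower part of Fig. 29.3, rules out the entire class of separate activation models for that condition.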
29.2.3 The DA model: further issues
There are two types of issues that, due to space limitations, cannot be fully discussed. One concerns how the model explains several well-known effects that may appear to contradict some of its central assumptions. The other is that a number of important phenomena are not addressed at all by the model. We discuss these issues briefly.
Two well-known interference effects may appear to contradict the DA model. One is the Stroop effect (e.g. MacLeod 1991; Stroop 1935), in which words denoting colors interfere with naming the ink color in which they are printed. The very interference of the word content, presumably recovered by some sort of visual shape analysis, with the ink color appears to contradict the DA model, which states that interference between color and shape does not occur at the response selection stage, where the perceived stimuli are mapped to their appropriate responses. The typical explanation of the Stroop effect (see MacLeod 1991) is that the interference takes place at the response selection stage, and this appears to contradict the DA model. A series of unpublished studies in our lab, however, suggests that the Stroop effect is caused by several factors, and in most cases these factors do not include interference at the response selection stage. Instead, it appears that in the typical Stroop paradigm (in which the response is the utterance of the ink color), the interference is primarily due to competition between the details of the motoric response (e.g. having to say 'blue' aloud) and the printed word. This interference occurs at a stage subsequent to response selection (which is part of the Central Processes in Fig. 29.2). Thus, there is no actual contradiction between the Stroop effect and the DA model.

A second well-known interference phenomenon that may appear to contradict the DA model is the Simon effect (e.g. Craft and Simon 1970), which occurs when subjects perform non-spatial tasks (e.g. a color task) and are required to respond with spatially arranged response keys. Subjects respond faster when the location of the stimulus, which is irrelevant to the task, corresponds to the location of the required response. While the Simon effect indeed taps some sort of interference between perception and action (see e.g. Proctor and Vu, this volume, Chapter 22), it does not involve visual dimensional modules, and is thus not directly relevant to the DA model. The DA model is concerned with the degree of cross-talk between visual dimensions and is not focused on other types of perception–action interference.

Finally, there are important elements in the processes linking perception and action that are entirely outside the scope of the DA model. For example, subjects can configure immediate associations between stimuli (e.g. colors, shapes) and responses (e.g. computer keys) on the basis of instructions. The DA model claims that the associations between colors and responses, and the associations between shapes and responses, are configured in different modules. It does not specify, however, how those associations are configured. Presumably, some executive functions can determine the specific links between stimuli and responses within each module, but the details are beyond the scope of the model. Likewise, the DA model in its present form does not specify how activation of representations within the various modules persists over time. Thus, phenomena such as negative priming, which depend on activation over time, are presently beyond the scope of the model.
29.3 The DA and the object recognition systems
The evidence reviewed so far suggests that there exist multiple response selection mechanisms, one per dimension, and that spatial attention mediates which of the selected responses is transferred to the executive functions. We now further propose that this architecture, embodied by the DA system, is fundamentally different from the more commonly discussed visual system for object recognition. Objects can be considered the outcome of binding processes between various shape parts such as lines and curves.² We suggest that object recognition and DA should be considered distinct visual subsystems within the 'what' system.

Our claim that the DA system is not designed for object recognition departs from previous theories of feature binding (e.g. Treisman and Sato 1990; Wolfe 1994). The notion that visual dimensions are
the primitive blocks of objects has been a major motive for the endeavor to identify visual dimensions. Visual objects are typically complex, and a rich set of dimensions is needed to enable object recognition. From our theoretical perspective, this motive no longer exists: a priori, the number of dimensions that the DA system can use is of no importance. Indeed, we suspect that there may be fewer dimensions than typically claimed.

Our hypothesis has several steps. We claim that computational and functional analyses suggest that cross-dimensional conjunction is different from conjunction of shape elements. We then propose, following Cohen and Shoup (2000), that the DA model can be extended to the performance of tasks requiring conjunctions of features from different dimensions (e.g. color and shape; henceforth, cross-dimensional conjunction). We suggest that cross-dimensional conjunction is performed by the DA subsystem, whereas binding of shape elements is done by the object recognition system. Indeed, as elaborated below, the two systems have different processing styles. We then describe previous results with the flanker paradigm that support this distinction (Cohen and Shoup 2000). Finally, we present a new experiment, with the illusory conjunctions paradigm, that further supports the claim that the DA and object recognition systems are distinct.
29.3.1 Computational and functional considerations
Binding can occur between features from different dimensions, such as color and shape. It can also occur between shape elements, such as line orientations and curves. One major difference between these two forms of binding is that cross-dimensional binding is typically between features that appear at the very same location, whereas individual shape elements by their nature are positioned at different locations. As a result, there is a basic computational difference between cross-dimensional and within-shape binding. For cross-dimensional binding it is generally sufficient to identify the features that are present at the stimulus's location. For shape binding, however, identification of the relevant features is not sufficient; the spatial relations among the relevant features are crucial as well. This property makes the task of identifying conjunctions of shapes more difficult than that of identifying cross-dimensional conjunctions (see e.g. Hummel and Biederman 1992, for extensive discussion).

In addition, perhaps because there can be many combinations of spatial relations among shape elements, these conjunctions are instrumental in determining the identity of objects. Indeed, some objects (line drawings, letters) are composed entirely of conjunctions of shape elements. Moreover, an important part of our cognitive activity is to categorize and recognize familiar shapes, as well as to acquire and store representations of new objects. To this end, there must exist processes that 'unitize' specific combinations of line orientations, as in the case of many letters. Likewise, combinations of shape elements may lead to 'emergent properties'. In contrast, we do not generally store particular conjunctions of orientations and color, nor are there known emergent properties of cross-dimensional conjunctions.

These considerations suggest that there are both functional and computational differences between cross-dimensional and within-shape binding. Our analysis also indicates that the computations involved in cross-dimensional binding are much simpler than those involved in within-shape binding. Cross-dimensional binding does not require analysis of spatial relations; it does not need processes for unification of particular conjunctions; nor does it require mechanisms for storage of novel representations. We claim that a simple extension of the DA model may be sufficient to account for cross-dimensional binding. The DA model, however, is not designed for object recognition.
Interestingly, perhaps for similar reasons, contemporary theories of visual object recognition (e.g. Hummel and Biederman 1992; Tarr and Bülthoff 1995) do not make any effort to deal with cross-dimensional conjunctions.
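The asymmetry just described can be made concrete with a toy example. The formulation below is our own illustration, not a model from the chapter: a cross-dimensional conjunction reduces to a location-local set test, whereas a within-shape conjunction additionally depends on the relative positions of the parts.

def cross_dimensional_match(features_at_location, wanted):
    # True if all wanted features (e.g. {'red', 'vertical'}) co-occur at a
    # single location; no spatial relations are involved.
    return wanted <= features_at_location

def shape_match(parts, template):
    # parts and template map shape elements to (x, y) positions. Having the
    # right elements is not enough; their relative positions must match too.
    if set(parts) != set(template):
        return False
    def relative_offsets(d):
        ref = d[min(d)]  # express all positions relative to one element
        return {k: (x - ref[0], y - ref[1]) for k, (x, y) in d.items()}
    return relative_offsets(parts) == relative_offsets(template)

print(cross_dimensional_match({'red', 'vertical'}, {'red', 'vertical'}))  # True

# A 'T' and a '+' contain the same elements (a vertical and a horizontal bar)
# but differ in how the bars are arranged.
t_shape = {'vbar': (0, 0), 'hbar': (0, 1)}  # horizontal bar at the top
plus = {'vbar': (0, 0), 'hbar': (0, 0)}     # bars crossing at the center
print(shape_match(t_shape, plus))           # False: same parts, wrong relations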
29.3.2 The extended DA model
The DA system is designed primarily to enable people to perform simple tasks (which require responses to single features) quickly and efficiently. As suggested by Cohen and Shoup (2000), this architecture of multiple visual dimensions with separable response selection mechanisms can be modified for the performance of more complex tasks requiring responses to cross-dimensional conjunctions of features. The new DA model (Fig. 29.4) proposed by Cohen and Shoup (2000) is actually a simple extension of the earlier DA model (Fig. 29.2). The main additional assumption is that there exists a conjunction map whose function is to represent conjunctions of features from different dimensions. Each location in this map is connected to corresponding locations in the feature maps of each dimensional module. The conjunction map is specifically dedicated to representations of cross-dimensional conjunctive targets.³

Fig. 29.4 Schematic representation of the extended DA model. (From Cohen and Shoup 2000, p. 94.)

The model assumes that the nature of the conjunction representation at any point in time is determined by the features that are allowed access to the conjunction map at that time. For example, a representation of a red horizontal line is formed when the feature 'red' from the color module and the feature 'horizontal' from the orientation module both gain access to the conjunction map. Importantly, the model assumes that only one feature from each dimensional module can gain access to the conjunction map, and therefore only one conjunction representation can be active at a time. Moreover, the cross-dimensional conjunction representation is, by its nature, transient. That is, at any point in time, the conjunction representation is determined by the features projected to the conjunction map. When the projection changes (e.g. when a different color feature is projected to the map), the representation changes. The model further assumes that when subjects expect a single conjunction target they can 'program' or pre-configure the conjunction map in advance (e.g. for a red vertical target, allowing access only from the red and vertical feature detectors to the conjunction map). In this case, subjects may be able to determine the presence of the target by examining the amount of activity in the conjunction map unit. However, when more than one conjunctive target is possible, subjects in most situations will not pre-configure the conjunction map in advance, because the appearance of an alternative target would require clearing the conjunction map of its pre-configured representation and creating a new representation.

An important component of the model is that selecting the appropriate features for access to the conjunction map is similar to the process of response selection for single-feature targets (Fig. 29.2). In feature tasks, different features are mapped to different responses (e.g. green is mapped to Response 1 and red to Response 2). Response selection involves mapping the target's perceived color (e.g. red) to its correct response (e.g. Response 2). Thus, if both features (red and green) are present, there will be a competition between the two responses. The conjunction model views the formation of conjunction representations as essentially the same process, except that it occurs simultaneously in more than one dimension, and that instead of direct associations to response codes, the feature activations are routed to the conjunction map.

These assumptions have interesting implications. First, if subjects expect a single conjunctive target (as in most visual search studies with conjunctive targets), they can be highly efficient in identifying it even in the presence of other distractors. They pre-configure the conjunction map in advance (e.g. for a red vertical target, allowing access only from the red and vertical feature detectors to the conjunction map), and guide their search on the basis of activity in the conjunction map (cf. Wolfe 1994). Visual search findings over the last decade are in accord with this property (e.g. Wolfe 1994). Second, when more than one conjunctive target is possible, subjects will be relatively slow and inefficient in the task (cf. Treisman and Sato 1990). The reason is that within each of the relevant dimensions (e.g. color and orientation) there will be a separate competition among the features for access to the conjunction map.
The competition arises at the level of the individual dimensional modules because of the strong built-in constraint that only one cross-dimensional conjunction representation is possible at any one time. As reviewed by Cohen and Shoup (2000), the model is compatible with all the major findings in the literature.
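A minimal sketch may help fix the gating idea in mind. The class, method names, and activation rule below are our own illustrative assumptions, not part of the published model; only the constraint that at most one feature per dimension may project to the conjunction map is taken from the text.

class ConjunctionMap:
    def __init__(self):
        self.gates = {}  # dimension -> the single feature allowed access

    def preconfigure(self, **features):
        # Pre-open the gates for one expected conjunctive target, e.g.
        # preconfigure(color='red', orientation='vertical'). Because each
        # dimension has one gate, only one conjunction is represented at a time.
        self.gates = dict(features)

    def activation(self, stimulus):
        # Number of gated features the stimulus matches; the represented
        # conjunction is fully active only when every gated dimension matches.
        return sum(stimulus.get(dim) == feat for dim, feat in self.gates.items())

cmap = ConjunctionMap()
cmap.preconfigure(color='red', orientation='vertical')
print(cmap.activation({'color': 'red', 'orientation': 'vertical'}))    # 2: target
print(cmap.activation({'color': 'red', 'orientation': 'horizontal'}))  # 1: partial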
29.3.3 The study by Cohen and Shoup (2000)
Cohen and Shoup (2000) have also examined predictions specific to the model. Many studies have investigated the binding problem (see Treisman 1996, for review). Moreover, with some exceptions (Wolfe et al. 1990), cross-dimensional and within-shape bindings were considered a single problem,
resolved by a single mechanism. Cohen and Shoup (2000) tested the possibility that cross-dimensional and within-shape conjunctions are done by different mechanisms. Their 1rst study examined cross-dimensional binding. As described earlier, the DA model assumes that there is a single, transient cross-dimensional conjunction representation. As a result, when more than one conjunctive target is possible, observers will be relatively slow and inef1cient in this task. The reason is that within each of the relevant dimensions (e.g. color and orientation) there will be a separate competition among the features for access to the conjunction map. The competition arises at the level of the individual dimensional modules because of the strong built-in constraint mentioned earlier that only one cross-dimensional conjunction representation is possible at any one time. This study used a 2anker paradigm in which subjects are required to make one response to two conjunctive targets and a second response to two other conjunctive targets (see Fig. 29.5, upper part). Note that the two conjunctive targets that belong to the same response set always differ in both color and orientation. In contrast, the two targets that belong to two alternative responses always differ in only one feature. Compare a congruent condition situation where a target (e.g. a red right diagonal line) is 2anked by the other member from the same response set (e.g. a green left diagonal line) with an incongruent condition (e.g. a red right diagonal line 2anked by a red left diagonal line). The model predicts that competition arises exclusively at the level of the individual dimensions. As a result, more competition should arise in the congruent condition (because the target and 2ankers differ in both dimensions) than at the incongruent condition (because the target and 2ankers
Fig. 29.5 Stimuli used by Cohen and Shoup (2000) in their experiments for the cross-dimensional task (upper part) and for the within-shape task (lower part).
differ in just one dimension). Thus, the model predicts that RTs in the congruent condition will be slower than in the incongruent condition. This prediction was supported by the results, and this was true even after extensive practice in the task (five sessions). A recent study by Lavie (1997) with cross-dimensional conjunction targets converged on our findings as well.

The predictions for within-shape conjunctions are different. Obviously, we have multiple representations of line-orientation conjunctions (e.g. letters). Thus, competition for line-orientation conjunctions may arise at the conjunction (or object) level. This possibility was tested and supported in another experiment (Fig. 29.5, lower part). The task is essentially the same as in the cross-dimensional task. The two targets that belong to the same response set differ in both lines, whereas stimuli from different response sets differ in just one line (compare upper and lower panels in Fig. 29.5). Yet Cohen and Shoup predicted that in this case, as in typical flanker studies, RTs would be faster in the congruent condition. The results supported this prediction as well: subjects were faster in the congruent than in the incongruent condition after a modest amount of practice (one session). The practice was needed because the stimuli are novel, and some practice is required to establish new 'unitized' representations (see Driver and Baylis 1991, for converging evidence from line-orientation conjunction targets).
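The logic of this pair of predictions can be restated as a simple tally: in the cross-dimensional task, competition occurs separately within each dimension, and only features that differ between target and flanker compete there. The following toy count (our illustration; the stimulus encoding is hypothetical) shows why the usual flanker effect reverses:

# Toy illustration: count per-dimension competitions between a target
# and a flanker. More competing dimensions -> more interference -> slower RT.
# Stimulus dictionaries and dimension names are hypothetical.

def competing_dimensions(target, flanker):
    return [d for d in target if target[d] != flanker[d]]

target = {'color': 'red', 'orientation': 'right-diagonal'}

# Congruent flanker (same response set): differs in BOTH dimensions.
congruent = {'color': 'green', 'orientation': 'left-diagonal'}
# Incongruent flanker (other response set): differs in ONE dimension.
incongruent = {'color': 'red', 'orientation': 'left-diagonal'}

print(len(competing_dimensions(target, congruent)))    # 2 -> more interference
print(len(competing_dimensions(target, incongruent)))  # 1 -> less interference

For within-shape conjunctions, by contrast, competition is assumed to occur between unitized object-level representations, so the ordinary congruency advantage reappears.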
29.3.4 The present experiment

We now present a novel experiment that converges on the results obtained by Cohen and Shoup (2000). Our hypothesis is that conjunctions of shape elements may be unitized into a single representation. By contrast, cross-dimensional conjunctions are transient and are always built from the individual dimensions. The present experiment aims to provide further support for this hypothesis, and to add two novel findings.

First, we use the illusory conjunctions paradigm (e.g. Treisman and Schmidt 1982). Subjects in this paradigm view stimuli composed of conjunctions of features. The stimuli are typically presented for a brief period of time and are then masked. Previous studies (Ashby et al. 1996; Cohen and Ivry 1989; Cohen and Rafal 1991; Prinzmetal 1995; Treisman and Schmidt 1982) demonstrated that subjects make 'illusory conjunctions', erroneously combining features emanating from different objects. In a sense, then, this paradigm directly examines whether features from the same object are unitized, and thus is particularly useful for our purpose. Typically, illusory conjunctions involve color and orientation. However, illusory conjunctions have also been demonstrated among line orientations (e.g. Maddox et al. 1994; Prinzmetal 1981). The purpose of this study is to demonstrate that different mechanisms may underlie these two forms of illusory conjunctions.

Second, Cohen and Shoup (2000) compared within-orientation and cross-dimensional conjunctions. In the present experiment we compare conjunctions of orientation and curvature to cross-dimensional conjunctions. The within-orientation conjunction used by Cohen and Shoup (2000) may be unique because it involves conjunctions of features from the same dimension (see Wolfe et al. 1990). Orientation and curvature, however, are considered to be separate dimensions (e.g. Wolfe 1994), and at the same time both are required for object recognition. Thus, according to traditional feature integration theories, binding of orientation and curvature should be similar to binding of color and curvature. In contrast, we expect that a conjunction of curvature and orientation can be unitized as a single object by the object recognition system, whereas a conjunction of color and curvature, handled by the DA system, is not unitized. This implies that illusory conjunctions between curvature and orientation will not be observed, whereas illusory conjunctions between color and curvature
will be observed. Note that in our conception, curvature and orientation belong to a single dimensional module that we call shape. As mentioned earlier, from our point of view, there could be fewer dimensions than typically envisioned, because the DA system is not used for object recognition.

We used a full-report paradigm (e.g. Ashby et al. 1996) for this experiment. On each trial subjects viewed a display that was presented briefly and masked. The display consisted of two digits, one on the right and one on the left, and two additional stimuli positioned in between the digits. The nature of these additional stimuli was manipulated between subjects. In the 'within-shape' condition the stimuli consisted of conjunctions of curvature and orientation. In the 'cross-dimensional' condition the stimuli consisted of conjunctions of curvature and color. The primary task of the subjects, as in typical 'illusory conjunctions' paradigms (e.g. Treisman and Schmidt 1982), was to determine whether the two digits were identical. This task is designed to encourage subjects to spread their attention over the entire area of the display. The secondary task of the subjects was to report the two stimuli positioned between the digits. The main question of interest is whether illusory conjunctions will be formed between features of the two objects.
29.3.4.1 Method

Subjects. Twenty-four volunteers from The Hebrew University participated either for payment or for course credit. Twelve of these subjects participated in the 'cross-dimensional' condition and the remaining 12 subjects served in the 'within-shape' condition.

Apparatus and stimuli. Stimuli for the primary digit task consisted of the digits 3, 6, and 8, subtending 0.57 degrees of visual angle in height and 0.34 degrees in width. The digits were presented on the horizontal meridian, one to the right and one to the left of the center of the screen. The center-to-center distance between the two digits was approximately 2.8 degrees of visual angle. The two additional stimuli were positioned between the two digits. The center-to-center distance between any two adjacent stimuli was approximately 0.92 degrees. The stimuli for the within-shape condition are shown in the upper part of Fig. 29.6, and consisted of two sets, the target and distractor sets. The target set was constructed from one of two half-circles and one of four line orientations. The half-circles could be either U-shaped or an inverted U-shape. The four line orientations could be vertical, horizontal, right diagonal (oriented approximately 45 degrees), or left diagonal. Crossing the half-circles and line orientations gave rise to eight possible target conjunctions. The distractor set consisted of a circle with one of the four line orientations, leading to four possible distractors. The stimuli for the cross-dimensional condition are shown in the lower part of Fig. 29.6. This target set was composed of the same two half-circles crossed with four colors: red, green, blue, and yellow. Again, the combination of two half-circles and four colors gave rise to eight possible conjunctions. The distractor set consisted of a circle in one of the four colors, leading to four possible circles. The diameter of the half-circles was approximately 0.46 degrees, and the length of the line segments was approximately 0.68 degrees. The luminance of all stimuli was approximately 35 cd/m², against a dark background. The stimuli were presented on a NEC MultiSync 4E monitor controlled by a Pentium PC. Responses were made by pressing buttons on the computer's keyboard.

Tasks. The primary task was identical in the two conditions and consisted of a same/different task. On half of the trials the two digits were identical and on the remaining trials the two digits were different. The digits were selected randomly on each trial. Note that all three possible digits are curved. Subjects were told that the digit task was the most important task. The secondary task is similar in its principles to previous illusory conjunctions studies (e.g. Cohen and Ivry 1989). One of the stimuli was taken from the distractor set and the other stimulus
was taken from the target set, with a single constraint: in the within-shape condition, the two line orientations on each trial (one appearing with the circle and one with a half-circle) were always different, and in the cross-dimensional condition, the two colors on each trial were always different. Subjects in the within-shape condition were told to ignore the circle and to report the other shape. Similarly, subjects in the cross-dimensional condition were told to ignore the circle and report the other shape. Subjects first pressed one of two buttons to indicate whether the two digits were the same or different. Then they pressed one of eight buttons to indicate which of the eight possible targets appeared. Each of the eight buttons for the secondary task had a sticker with one of the targets drawn on it.

Of principal interest is the type of mistake that subjects made in the secondary task. According to the terminology introduced by Treisman and Schmidt (1982), there could be two types of mistakes. The first type is a feature error, in which subjects report a feature that was not present in the display at all. The second type is a conjunction error, in which subjects correctly report the features present in the display but miscombine features that belong to different objects. Consider the within-shape task with a target consisting of a U-shaped half-circle with a vertical line, and a distractor consisting of a circle with a horizontal line. Subjects could mistake the half-circle, reporting an object containing the other half-circle. This would be a curvature feature error, because the reported half-circle was
Fig. 29.6 Stimuli used in our experiment in the within-shape condition (upper part) and in the cross-dimensional condition (lower part).
not present in the display. Subjects could also misreport one of the two lines not present in the display (the right and left diagonal lines in this example). This would be an orientation feature error. Subjects could also report the line presented with the circle (the horizontal line in this example). This mistake would be a within-shape conjunction error. There are in fact five different categories of errors in this terminology. Reporting the U-shaped half-circle with a horizontal line is a within-shape conjunction error. Reporting a U-shaped half-circle with either a right or a left diagonal line is an orientation feature error. A report of the inverted half-circle with the vertical line is a curvature feature error; a report of the inverted half-circle with a horizontal line is a curvature feature error and a within-shape conjunction error. Finally, a report of the inverted half-circle with either a right or a left diagonal line is a curvature feature error and an orientation feature error. Our main interest is in two of these error categories, the orientation feature error and the within-shape conjunction error. In both cases, the curvature was correctly reported and the orientation was misreported. If the selection of the orientation were random, we should expect approximately twice as many orientation feature errors as within-shape conjunction errors, because there are two possible orientation feature errors and only one possible conjunction error. A higher ratio of conjunction-to-feature errors is an indication of illusory conjunctions.

A similar consideration holds for the cross-dimensional condition. Consider, for example, a display in which a blue U-shaped half-circle was presented with a red circle. A report of a yellow or a green U-shaped half-circle would be a color feature error. A report of a red U-shaped half-circle would be a color conjunction error. A report of a blue inverted U-shaped half-circle is a curvature feature error; a report of a red inverted U-shaped half-circle is a curvature feature error and a color conjunction error; and a report of a yellow or green inverted U-shaped half-circle is a curvature feature error and a color feature error. Once again, we focus on the comparison of color feature and color conjunction errors. A ratio of color conjunction to color feature errors higher than 1:2 is an indication of illusory conjunctions.

Design. For the secondary task there were eight possible targets, each of which could be presented with three possible distractors (each distractor shared its orientation, or its color in the cross-dimensional condition, with one of the targets and was never presented with it, eliminating one of the distractors for each of the targets). The relative position of the target and distractor (i.e. right of center and left of center) was also counterbalanced, leading to 48 (8 × 3 × 2) possible combinations. A block of trials consisted of 96 trials, two trials for each of the 48 possible combinations. Subjects performed a practice block followed by four experimental blocks.

The display on each trial was presented for a short time and then masked. The Stimulus Onset Asynchrony (SOA) between the stimulus display and the mask was determined for each subject as follows. We ran a practice block of 96 trials. For the first 12 trials the exposure duration was 243 ms (17 refresh cycles of 14.3 ms each). If primary digit-task performance on these trials was above 90% (11 or more correct answers), we reduced the exposure duration by one cycle. If performance was below 10 correct answers, we increased the exposure duration by one cycle.
Otherwise it was left unchanged. We then repeated the same procedure for every group of 12 trials. The exposure duration at the end of the practice block was used for the first experimental block. At the end of each experimental block we used the same procedure to increase, decrease, or maintain the exposure duration.

Procedure. At the beginning of each trial, two achromatic asterisks appeared at the locations in which the digits would appear. After 500 ms the asterisks were replaced by the two digits, and 28.6 ms later the secondary-task stimuli appeared as well. The head start of the digits over the secondary-task stimuli was designed to further ensure that subjects would primarily focus on the digit task.
Table 29.1 Mean proportion of correct responses and of errors made in the within-shape and cross-dimensional conditions

                                           within-shape   cross-dimensional
Correct responses                              43.4            66.9
Conjunction error (within-shape/color)         11.3             8.7
Feature error (orientation/color)              14.3             4.7
Feature curvature                               9.7            12.5
Feature curvature and conjunction error         9.3             5.0
Feature curvature and feature error            11.9             2.2
The display was then masked by four achromatic asterisks covering the entire display area and appearing for 300 ms. The SOA between the stimulus display and the mask was determined on the basis of the method described above. The screen then went blank until subjects finished their responses; 1000 ms later, the next trial began. Subjects were told that it was most important to respond correctly in the digit task. At the end of each block, subjects were told the number of mistakes they had made in the primary digit task.
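The exposure-duration procedure described above amounts to a block-wise up-down staircase on primary-task accuracy. A minimal sketch under the parameters given in the Design section (our reconstruction; all names are ours):

# Sketch of the block-wise exposure-duration staircase (our reconstruction).

REFRESH_MS = 14.3        # one monitor refresh cycle
cycles = 17              # initial exposure: 17 * 14.3 ms = ~243 ms

def adjust(cycles, n_correct):
    """Apply the up-down rule after each group of 12 digit-task trials."""
    if n_correct >= 11:          # above 90% correct: make the display briefer
        return cycles - 1
    if n_correct <= 9:           # below 10 correct: lengthen the display
        return cycles + 1
    return cycles                # exactly 10 correct: leave unchanged

# One hypothetical practice block: digit-task accuracy per 12-trial group.
for n_correct in [12, 11, 10, 9, 11, 10, 12, 8]:
    cycles = adjust(cycles, n_correct)
    print(cycles, 'cycles =', round(cycles * REFRESH_MS, 1), 'ms')

The rule nudges exposure duration so that digit-task accuracy hovers around the 90% criterion, consistent with the high primary-task accuracy reported below.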
29.3.4.2 Results and discussion

Mean exposure duration for the two groups of subjects was similar: 304 and 303 ms for the within-shape and cross-dimensional conditions, respectively. Performance on the primary task was similar as well: 94.1% and 96.2% for the within-shape and cross-dimensional conditions, respectively. The difference between the two conditions was not statistically significant. In general, the high performance on the digit task indicates that subjects followed the instructions to focus on this task.

The main interest is in the results of the secondary task, presented in Table 29.1. Only trials in which the digit task was performed correctly were analyzed. The table presents the proportion of correct responses, as well as the proportion of errors in each of the five possible categories of errors. The left column presents the results of the within-shape condition, and the right column shows the results of the cross-dimensional condition. Subjects performed the cross-dimensional task more accurately than the within-shape task, F(1, 22) = 7.11, p < 0.05. We shall return to this point shortly.

Our main focus was on the comparison between the conjunction and feature errors in each of the two conditions. The ratio of color conjunction to color feature errors in the cross-dimensional task was significantly higher than 0.5, t(11) = 2.39, p < 0.05. The ratio of within-shape conjunction errors to orientation feature errors was not significantly higher than 0.5, t(11) = 1.6, p > 0.05. We also compared the ratio of color conjunction to color feature errors (committed in the cross-dimensional condition) with the ratio of within-shape conjunction to orientation feature errors (committed in the within-shape condition). The ratio of color conjunction to color feature errors, as predicted, was higher, t(22) = 1.85, p < 0.05. The results are quite clear and as predicted. Statistically speaking, there is no indication of illusory conjunctions in the within-shape condition, and a clear indication of illusory conjunctions in the cross-dimensional condition. The failure to demonstrate illusory conjunctions in the within-shape condition should be viewed with some caution, because it can be ascribed to a lack of statistical power. It is clear, however, that the number of illusory conjunctions, if they do exist, is moderate.
Table 29.2 Mean proportion of correct responses and of errors made by the top half and the bottom half of the subjects in the cross-dimensional condition

                                           top half   bottom half
Correct responses                            79.0        54.9
Conjunction error (within-shape/color)        4.8        12.5
Feature error (orientation/color)             3.7         5.7
Feature curvature                             9.9        15.1
Feature curvature and conjunction error       1.9         8.0
Feature curvature and feature error           0.7         3.8
More importantly, the results clearly demonstrate that there are more illusory conjunctions in the cross-dimensional condition than in the within-shape condition.

There are two potential problems with our interpretation of the results. First, as reported above, overall performance was better in the cross-dimensional condition, and it is possible that illusory conjunctions are demonstrated only when performance is fairly high (as in the cross-dimensional condition). This interpretation is not tenable, for two reasons. First, illusory conjunctions have been demonstrated in paradigms similar to the one we use even when overall performance was similar to that in the current within-shape condition (see, e.g. Cohen and Ivry 1989, Exp. 3). In addition, we divided the 12 subjects of the cross-dimensional condition into two groups of six subjects each on the basis of their overall performance. The results of the two groups are presented in Table 29.2. It is clear that the relative number of conjunction to feature errors is not affected by overall performance.

A second potential problem concerns the claim that a ratio of conjunction to feature errors higher than 1:2 is an indication of illusory conjunctions. As pointed out by Ashby et al. (1996), a number of strategic reasons may lead to an excessive number of conjunction errors. Once again, however, this potential problem is not relevant to the present results, for at least two reasons. First, the within-shape and cross-dimensional conditions are identical except for the use of orientation versus color, so there is no apparent reason why subjects in the two conditions would use different strategies; and yet performance differed between the conditions. Second, the present paradigm enables us to examine possible strategic responses (see Cohen and Ivry 1989, Appendix). In particular, if subjects used a different strategy when they did not identify the half-circles, there should be a different ratio of conjunction to feature errors when the half-circle was misreported. As can be seen in Tables 29.1 and 29.2, the proportion of conjunction to feature errors was similar whether or not the half-circles were identified correctly.

The present results, then, nicely support our prediction. They indicate that binding of within-shape elements, if needed at all, is fundamentally different from binding of cross-dimensional features. Our interpretation is that within-shape elements may be unified into a single representation and thus may be less susceptible to illusory conjunctions. In contrast, we propose that cross-dimensional conjunctions are transient and are never unified. The results are compatible with this claim.
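The 1:2 chance baseline used in these tests follows from counting alternatives: a subject who reports the curvature correctly but picks the second feature at random chooses among three non-target values, of which exactly one happens to be the distractor's. A short check against the Table 29.1 proportions (our worked example):

# Why chance predicts a conjunction-to-feature error ratio of 1:2
# (our worked example; the proportions are taken from Table 29.1).

# Three non-target orientations (or colors) remain: one of them yields a
# conjunction error, the other two yield feature errors.
chance_ratio = 1 / 2
print(chance_ratio)                    # 0.5, the baseline for the t-tests

within_shape = 11.3 / 14.3             # conjunction / feature errors
cross_dimensional = 8.7 / 4.7
print(round(within_shape, 2))          # 0.79: above 0.5, but not significantly so
print(round(cross_dimensional, 2))     # 1.85: clearly above the chance baseline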
29.4 Conclusions

We propose that within the 'what' system there exist two visual subsystems. One system is designed for object recognition. This system is oriented toward identification of existing representations as well
as creation of new representations. It is a complex system that requires complex processes (and consequently a fair amount of time) for successful identification of objects. The second system, the DA system, is oriented toward action. It is composed of several visual dimensions (the exact number of which is still to be determined), each of which is endowed with separable perceptual and response-selection processes. Spatial attention mediates which of the computations, carried out in the various dimensions, will be executed. The DA system has also evolved to deal with cross-dimensional conjunctions, but, as in single-feature tasks, it is oriented toward action. The identification of cross-dimensional conjunctions in the DA system is very similar to response selection for single features; its representation of cross-dimensional conjunctions is transient; it is not designed to add new representations; nor is it designed for the recognition of objects. In this paper we reviewed the evidence for the existence of the DA system and for its distinction from the object recognition system, and we provided new evidence for this distinction.
Acknowledgement

This study was supported by a grant from the Israel Science Foundation.
Notes

1. The term 'response selection' is used differently by different investigators. Here we refer to an abstract response decision without specification of the response implementation. We call it a response decision because it is a task-dependent, on-line decision. Thus, R1 and R2 in our model are created for the task and are associated with the stimuli (e.g. the connection of R1 with red) on the basis of the task instructions.

2. The term 'object recognition' is somewhat ambiguous. Moreover, the visual properties essential for recognizing objects are controversial (e.g. Hummel and Biederman 1992; Tarr and Bülthoff 1995). In the present paper we refer to the properties that give rise to the object's form. These include line orientations, curvatures, and the spatial relations between them. We make no claims, however, about the processes that lead to object recognition. Note also that we make no claims concerning the nature of the representations of objects; these are controversial as well. We only give a rough description of what we mean by a 'visual object'.

3. One may wonder why a special conjunction map is needed. Cohen and Shoup (2000) discuss several reasons for this assumption: major findings in the cross-dimensional conjunction literature are explained by it; it is a relatively modest hardware investment (the addition of one spatiotopic map); and, as discussed in the main text, it adds strong constraints to the processes and representations of cross-dimensional conjunctions. Finally, other binding models also assume a similar structure (e.g. Wolfe 1994; Treisman and Sato 1990), albeit with different goals (see Cohen and Shoup 2000, for more discussion of this issue).
References

Ashby, F.G., Prinzmetal, W., Ivry, R., and Maddox, W.T. (1996). A formal theory of feature binding in object recognition. Psychological Review, 103, 165–192.
Bridgeman, B. (2002). Attention and visually guided behavior in distinct systems. This volume, Chapter 5.
Cohen, A. (1993). Asymmetries in visual search for conjunctive targets. Journal of Experimental Psychology: Human Perception and Performance, 19, 775–797.
Cohen, A. and Ivry, R. (1989). Illusory conjunctions inside and outside the focus of attention. Journal of Experimental Psychology: Human Perception and Performance, 15, 650–663.
Cohen, A. and Ivry, R. (1991). Density effects in conjunction search: Evidence for a coarse location mechanism of feature integration. Journal of Experimental Psychology: Human Perception and Performance, 17, 891–901.
Cohen, A. and Magen, H. (1999). Intra- and cross-dimensional visual search for single feature targets. Perception and Psychophysics, 61, 291–307.
Cohen, A. and Rafal, R.D. (1991). Attention and feature integration: Illusory conjunctions in a patient with a parietal lobe lesion. Psychological Science, 2, 106–109.
Cohen, A. and Shoup, R. (1997). Perceptual dimensional constraints on response-selection processes. Cognitive Psychology, 32, 128–181.
Cohen, A. and Shoup, R. (2000). Response selection processes for conjunctive targets. Journal of Experimental Psychology: Human Perception and Performance, 26, 391–411.
Coles, M.G.H., Gratton, G., Bashore, T.R., Eriksen, C.W., and Donchin, E. (1985). A psychophysiological investigation of the continuous flow model of human information processing. Journal of Experimental Psychology: Human Perception and Performance, 11, 529–553.
Craft, J.L. and Simon, J.R. (1970). Processing symbolic information from a visual display: Interference from an irrelevant directional cue. Journal of Experimental Psychology, 83, 415–420.
De Yoe, E.A. and Van Essen, D.C. (1988). Concurrent processing streams in monkey visual cortex. Trends in Neuroscience, 11, 219–226.
Driver, J. and Baylis, G.C. (1991). Target–distractor separation and feature integration in visual attention to letters. Acta Psychologica, 76, 101–119.
Eriksen, B.A. and Eriksen, C.W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception and Psychophysics, 16, 143–149.
Eriksen, C.W. and Eriksen, B.A. (1979). Target redundancy in visual search: Do repetitions of the target within the display impair processing? Perception and Psychophysics, 26, 195–205.
Feintuch, U. and Cohen, A. (in preparation). Attention and the interaction of features from different dimensions.
Found, A. and Müller, H.J. (1996). Searching for unknown feature targets on more than one dimension: Investigating a 'dimension-weighting' account. Perception and Psychophysics, 58, 88–101.
Goodale, M.A. and Milner, A.D. (1992). Separate visual pathways for perception and action. Trends in Neuroscience, 15, 20–25.
Hummel, J.E. and Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480–517.
Kornblum, S., Hasbroucq, T., and Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus–response compatibility – a model and taxonomy. Psychological Review, 97, 253–270.
Lavie, N. (1997). Visual feature integration and focused attention: Response competition from multiple distractor features. Perception and Psychophysics, 59, 543–556.
Livingstone, M.S. and Hubel, D.H. (1988). Segregation of form, color, movement, and depth: Anatomy, physiology, and perception. Science, 240, 740–749.
MacLeod, C.M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163–203.
Maddox, W.T., Prinzmetal, W., Ivry, R., and Ashby, F.G. (1994). A probabilistic multidimensional model of location discrimination. Psychological Research, 56, 66–77.
Miller, J. (1982). Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology, 14, 247–279.
Miller, J. (1991). The flanker compatibility effect as a function of visual angle, attentional focus, visual transients, and perceptual load: A search for boundary conditions. Perception and Psychophysics, 49, 270–288.
Mordkoff, J.T. and Yantis, S. (1991). An interactive race model of divided attention. Journal of Experimental Psychology: Human Perception and Performance, 17, 520–538.
Mordkoff, J.T. and Yantis, S. (1993). Dividing attention between color and shape: Evidence for coactivation. Perception and Psychophysics, 53, 357–366.
Müller, H.J., Heller, D., and Ziegler, J. (1995). Visual search for singleton targets within and across feature dimensions. Perception and Psychophysics, 57, 1–17.
Pashler, H. (1992). Attentional limitations in doing two tasks at the same time. Current Directions in Psychological Science, 1, 44–48.
Prinzmetal, W. (1981). Principles of feature integration in visual perception. Perception and Psychophysics, 30, 330–340.
Prinzmetal, W. (1995). Visual feature integration in a world of objects. Current Directions in Psychological Science, 4, 90–94.
Proctor, R.W. and Vu, K.-P.L. (2002). Eliminating, magnifying, and reversing spatial compatibility effects with mixed location-relevant and irrelevant trials. This volume, Chapter 22.
Rossetti, Y. and Pisella, L. (2002). Several 'vision for action' systems: A guide to dissociating and integrating dorsal and ventral functions. This volume, Chapter 4.
Shiffrin, R.M. and Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127–190.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. In W.G. Koster (Ed.), Attention and performance II, pp. 276–315. Amsterdam: North-Holland.
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–662.
Tarr, M.J. and Bülthoff, H.H. (1995). Is human object recognition better described by geon structural descriptions or by multiple views? Comment on Biederman and Gerhardstein (1993). Journal of Experimental Psychology: Human Perception and Performance, 21, 1494–1505.
Treisman, A.M. (1986). Properties, parts and objects. In K. Boff, L. Kaufman, and J. Thomas (Eds.), Handbook of perception and human performance, pp. 1–70. New York: Wiley.
Treisman, A.M. (1988). Features and objects: The fourteenth Bartlett memorial lecture. Quarterly Journal of Experimental Psychology, 40A, 201–237.
Treisman, A.M. (1996). The binding problem. Current Opinion in Neurobiology, 6, 171–178.
Treisman, A.M. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136.
Treisman, A.M. and Gormican, S. (1988). Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95, 15–48.
Treisman, A.M. and Sato, S. (1990). Conjunction search revisited. Journal of Experimental Psychology: Human Perception and Performance, 16, 459–478.
Treisman, A.M. and Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107–141.
Wolfe, J.M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.
Wolfe, J.M., Stewart, M.I., Friedman-Hill, S.R., Yu, K.P., Shorter, A.D., and Cave, K.R. (1990). Limitations on the parallel guidance of visual search: Color × color and orientation × orientation conjunctions. Journal of Experimental Psychology: Human Perception and Performance, 16, 879–892.
30 Selection-for-perception and selection-for-spatial-motor-action are coupled by visual attention: a review of recent findings and new evidence from stimulus-driven saccade control

Werner X. Schneider and Heiner Deubel
Abstract. The topic of this paper is how selection-for-visual-perception (usually identified with visual attention) and selection-for-spatial-motor-action are related. The latter process refers to the fact that simple actions such as grasping an object usually imply the need to select one movement target among other potential targets. In the first part of the paper, a theoretical framework for understanding the relationship between selective perception and motor target selection is introduced, namely the 'Visual Attention Model' (VAM, Schneider 1995). The main hypothesis of VAM is that of a tight coupling between selection-for-perception and selection-for-spatial-motor-action, which is assumed to be mediated by a common visual attention mechanism. Recent behavioral evidence supporting this claim is reviewed in the second part. The basic experimental paradigm (Deubel and Schneider 1996) required subjects to discriminate briefly presented target stimuli while they were preparing a saccadic eye movement or a manual pointing movement. The data revealed a spatially selective (possibly object-specific) coupling of motor programming and visual perception. In the third part of the paper, three new experiments are reported which investigated whether this coupling also holds when the motor action is directed in a stimulus-driven way. A discrimination judgment had to be made about a letter object that was briefly presented during the preparation of a saccade guided by a peripheral cue. All three experiments showed a tight, spatially specific coupling between the intentionally controlled perceptual discrimination and the stimulus-driven saccade programming. Additionally, the third experiment addressed the question of whether this result is the consequence of an obligatory attraction of attention by peripheral cues per se. The data show that a nonrelevant peripheral cue attracted attention automatically only when cue and discrimination target appeared in the same hemifield. We conclude that visual attention is not obligatorily coupled to peripheral cues; rather, the spatial relationship between the cue and the goal-driven attentional focusing has to be taken into account. Implications of the new findings for theories of attentional control in visual perception and motor action are discussed.
30.1 Selection-for-perception and selection-for-spatial-motor-action are coupled by a common attentional process: the Visual Attention Model (VAM)

The claim that attention processes play a prominent role in visual perception is supported by a large body of evidence from different experimental paradigms (for an overview, see, e.g. Pashler 1997). Only a limited amount of the information that is present at the retina can be processed up to the level of 'conscious' availability. For instance, studies using the change blindness paradigm (e.g. Rensink 2000; Simons and Levin 1997) have shown that only a very small number of objects from a natural scene can
be monitored for changes. Given this capacity limitation in conscious visual perception, selection processes are required. They have to determine which parts of the visual information taken up within a single eye fixation are processed up to the highest perceptual level, which allows information to be used for action, e.g. verbal report. There is still controversy as to where and how in the visual brain such selection processes take place (e.g. Allport 1993; Pashler 1997), but the existence of selection-for-visual-perception as a major attentional function is undebated.

A second function of attention refers to the motor action domain and was termed 'selection-for-action' by Allport (1987). The basic idea is that natural environments usually contain many potential targets for motor actions. However, motor actions such as grasping or pointing are usually directed to only one target at a time. Therefore, a selection process is required that delivers the spatial information of the intended target object (its location, size, shape, etc.) to the motor system (Neumann 1987) and that decouples information from other objects from motor control (Allport 1987). For instance, imagine you are sitting in a beer garden and you want to grasp your mug among the other mugs on the table. In this case a selection-for-spatial-motor-action process is needed that selects the spatial parameters of your mug (e.g. its location) in order to control the grasping movement.

How are these selection functions, that is, selection-for-perception and selection-for-spatial-motor-action, related? The 'Visual Attention Model' (VAM, Schneider 1995) postulates that both selection functions are performed by one common visual attention mechanism which selects one object at a time for processing with high priority. More precisely, the following assumptions were made (Schneider 1995):

1. Selection-for-visual-perception is carried out within the ventral pathway of the visual brain. The ventral pathway runs from the primary visual cortex (V1) to the inferior-temporal cortex and has been claimed to be the brain structure that computes visual information (color, shape, category, etc.) about what objects are present in the world (Mishkin, Ungerleider, and Macko 1983).

2. Selection-for-spatial-motor-action is assumed to be carried out in the dorsal pathway of the visual brain, originating also in V1 and ending in the posterior parietal cortex. The brain areas in this pathway compute the spatial information required for motor action, for instance, the location and size of the object that will be grasped (e.g. Milner and Goodale 1995). The consequence of selecting this spatial information-for-action is the set-up of motor programs towards the selected object. These motor programs can refer to a grasping, pointing, or eye movement. They do not imply overt execution; rather, a separate control (go-)signal is postulated for that purpose (e.g. Bullock and Grossberg 1988; Rizzolatti, Riggio, Dascola, and Umiltà 1987).

3. VAM postulates a common visual attention mechanism for both selection functions. This mechanism gives processing priority to low-level visual representations in brain area V1 that belong to a single visual object (see also Duncan 1996). As a consequence, the neural activation flow representing the selected object is processed with highest priority for perception and spatial-motor-action in the higher-level ventral and dorsal areas.
Within the ventral areas, this selected object is recognized fastest and made available to conscious visual perception. Simultaneously, within the dorsal pathway, motor programs for a grasping, pointing, or saccadic eye movement towards the selected object are set up with the highest priority.

4. The attentionally mediated coupling of selection-for-perception and selection-for-spatial-motor-action predicts, at the behavioral level, that during the programming phase the preparation of a spatial-motor action binds the perceptual processing system to the movement target and its location. In other words, the perceptual representation of the external world during movement preparation
should be best for the movement target. Vice versa, the intention to attend to a certain object for perceptual analysis should lead to the implementation of motor programs towards this object.
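Taken together, assumptions 1-4 describe a simple shared-selection architecture: one attention mechanism picks a single object, and both pathways then operate on that same object. The following sketch is our illustration of this claim only; all names, the scene encoding, and the priority function are hypothetical and not part of VAM:

# Minimal sketch of VAM's shared-selection claim (our illustration of the
# verbal model; names and data are hypothetical).

def attend(objects, priority):
    """One common attention mechanism selects a single object."""
    return max(objects, key=priority)

def ventral_percept(obj):
    # Selection-for-perception: the selected object is recognized first
    # and made available to conscious report.
    return f"percept of {obj['id']}"

def dorsal_motor_program(obj, action='saccade'):
    # Selection-for-spatial-motor-action: spatial parameters of the SAME
    # selected object parameterize the motor program (overt execution
    # still awaits a separate go-signal).
    return {'action': action, 'target_location': obj['location']}

scene = [{'id': 'mug A', 'location': (10, 4), 'relevance': 0.9},
         {'id': 'mug B', 'location': (14, 4), 'relevance': 0.2}]

selected = attend(scene, priority=lambda o: o['relevance'])
print(ventral_percept(selected))        # perception is best for ...
print(dorsal_motor_program(selected))   # ... the movement target, and vice versa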
30.2 A review of recent findings: goal-driven programming of saccades and pointing influences perceptual processing

The claims of VAM were motivated by a number of empirical findings at the behavioral and neural level. Specifically, a study by Deubel and Schneider (1996; Schneider and Deubel 1995) can be considered the key data source. In that study, we investigated the relationship between one class of goal-driven spatial-motor actions, namely saccadic eye movements, and perceptual discrimination in a dual-task paradigm. Basically, subjects had to perform a perceptual discrimination task while preparing a saccadic eye movement. The spatial relationship between the saccade target (ST) and the discrimination target (DT) was systematically varied. More precisely, as a primary task, subjects had to make a saccade as fast as possible to a location indicated by a symbolic cue. The potential saccade targets were three items of a horizontal linear letter string on each side of fixation (see Fig. 30.1, for a similar task). The secondary task measured perceptual performance. Subjects had to report a DT that was briefly presented within the item string and that disappeared before the actual eye movement started, so that perceptual performance was measured during the saccade preparation phase only. ST and DT varied independently within the three possible positions of the string on each side. If visual attention for perception and saccade target selection could be controlled independently, discrimination performance should not depend on the location of the ST. On the other hand, if both selection processes are coupled via a common selection mechanism, then discrimination performance should be best when ST and DT refer to the same object.

The results indeed revealed a high degree of spatially selective coupling. Discrimination performance was good when ST and DT referred to the same object. Discrimination performance for an object that appeared only one degree to the left or right of the ST location, however, was close to chance level. Furthermore, in a second experiment with the same paradigm, conditions for a decoupling of perception and spatial-motor programming were improved by keeping the DT position constant for a block of trials and by informing subjects in advance about this location. Again, perceptual performance was best when DT and ST referred to the same object. Moreover, we asked in this study whether the intended or the actual saccade landing location mattered for perceptual performance. The data clearly showed that perceptual processing priority lies on the intended rather than on the actual landing position of the saccade.

The claim that saccade programming and selective perception are related is not unique to VAM, and it has also been supported by other experimental studies (e.g. Hoffman and Subramaniam 1995; Kowler, Anderson, Dosher, and Blaser 1995; Shepherd, Findlay, and Hockey 1986). However, VAM postulates that any spatial-motor action towards an object, for instance a grasping or pointing action, should bind the attentional mechanism in visual perception. We tested this prediction with the same experimental paradigm as described before, but now subjects had to point rather than move their eyes (Deubel, Schneider, and Paprotta 1998). In contrast to a saccadic eye movement, it is less obvious why the preparation of a goal-directed hand movement should also influence perception.
The results showed that perceptual performance was again best when manual target and DT referred to the same location, and considerably worse in the case of spatial noncongruency. In a further series of experiments (Paprotta, Schneider, and Deubel, in preparation) based on a similar experimental paradigm with
a circular arrangement of stimuli, we asked whether the coupling of spatial-motor action and perception would still be found if we allowed movements to become 'automatized'. We provided the opportunity to 'automatize' by having the movements go to the same location in space for a whole block of trials. For repetitive pointing movements we found that perceptual performance no longer depended on the movement target location. However, for repetitive saccadic movements to the same location in space, the dependency of perceptual performance on the movement location persisted. At the mechanistic level, these results imply that the system responsible for manual movements is able to use a stored motor program for action execution, while movements in the saccadic system are always controlled 'on-line', that is, involving selective attention.

Further evidence for the relevance of visual selection processes in movement programming comes from studies by Tipper, Lortie, and Baylis (1992; see also Tipper, Howard, and Houghton 1998), Castiello (1996), and Craighero, Fadiga, Rizzolatti, and Umiltà (1998). Tipper et al. (1992) investigated the effect of a distractor on a reaching movement towards a target. Interestingly, an effect of the distractor on the movement latency was observed only when the distractor appeared between the starting position of the hand and the target location. Distractors beyond the reaching target did not influence the response latency. So competition between target and distractor for movement control depended on their spatial relationship. Castiello (1996) investigated interference effects of distractors on a grasping movement. In this study, the distractors were task-relevant for a secondary nonspatial task. Given these conditions, Castiello (1996) found an effect of the distractor on the kinematics of the grasp. These interference effects can be interpreted as behavioral evidence for competition between different objects for control of the movement. Craighero et al. (1998) investigated whether a nonrelevant prime picture influenced the latency of a subsequent grasping movement. They found a reduction of grasping latency when the prime picture depicted the to-be-grasped object, as compared with the condition in which the prime depicted a different object. So visual perception of an object, here the prime, influenced the programming of a movement that immediately followed the perception. The authors interpreted this finding in terms of the 'premotor theory of attention' (Rizzolatti, Riggio, and Sheliga 1994; Rizzolatti et al. 1987), which will be compared with VAM in the General Discussion section.
30.3 Stimulus-driven saccade control and its influence on visual perception: new experimental evidence

Up to now, all reported studies that found an influence of spatial-motor programming on visual perception concerned intentional, goal-driven movements based on a symbolic cue, which required a transformation of the meaning of the cue into a movement target position. The intention to move according to the instruction gave the cue its meaning and its power in controlling the movement. However, movement target selection and the underlying visual attention process can also be controlled in a stimulus-driven way (e.g. Jonides 1981; Müller and Rabbitt 1989; Yantis 1998). This means that the stimulus characteristics are able to control the allocation of the attentional mechanism and consequently (according to VAM) also movement target selection. The stimulus characteristics can be related to elementary physical features such as color, shape, and motion. For instance, if a single red circle appears among green circles, it seems to pop out and visual attention is directly allocated towards this singleton. Other ways to attract visual attention in a stimulus-driven way may involve, for instance, abruptly appearing objects (onsets) or moving
objects (see Yantis 1998, for an overview). In many experiments, stimulus-driven control of attention has been realized by peripheral cues (e.g. Jonides 1981; Müller and Rabbitt 1989), sometimes also called 'direct cues'. These peripheral cues have proven to be efficient in attracting attention in a stimulus-driven, exogenous, and involuntary way (see Müller and Rabbitt 1989; Yantis 1998). In the experiments that we report in the following, the peripheral cue consisted of an abruptly appearing bar marker that appeared directly at the location to which the movement (and therefore attention) had to be directed.

A number of empirical studies have shown that endogenous, goal-driven and exogenous, stimulus-driven control of visual attention and of saccadic eye movements differ in a number of functional characteristics. First, it has been demonstrated that shifts of visual attention have different time courses for peripheral cues and for symbolic cues (e.g. Müller and Rabbitt 1989; Nakayama and Mackeben 1989). Peripheral cues lead to a faster, more transient build-up of processing priority at the attended location compared with symbolic cues. Second, peripheral cues are much harder to ignore than symbolic cues (e.g. Jonides 1981; Yantis 1998). Third, exogenous and endogenous saccade control also differ in important respects. Based on lesion studies and other lines of evidence, it has been suggested that different pathways in the primate brain control different types of saccadic eye movements (e.g. Pierrot-Deseilligny, Rivaud, Gaymard, Müri, and Vermersch 1995). Stimulus-driven saccades are claimed to be controlled and triggered by a pathway from V1 via the parietal eye field (LIP in the monkey) to the superior colliculus (SC), while the pathway for intentional saccades involves the frontal eye field, which in turn projects to the SC and also directly to the saccade generator of the reticular formation. Given this pathway architecture, it is possible that stimulus-driven saccades are triggered independently of intentional saccades. Recent behavioral data by Theeuwes et al. (1998, 1999) indeed suggest that intentional and stimulus-driven saccades can be programmed in parallel, further supporting the claim of different pathways of saccade control.

Given these different characteristics of the two forms of control of visual attention and saccadic eye movements, it is not unreasonable to assume that selection-for-perception and selection-for-motor-action in saccades can be decoupled if one selection function relies on stimulus-driven control and the other on goal-driven control. VAM, however, assumes that decoupling should not be possible despite the different types of attentional control. In order to test which of the two hypotheses holds, we performed three experiments using an experimental paradigm similar to the one introduced above (Deubel and Schneider 1996). This time, however, subjects had to prepare and execute a stimulus-driven saccade directed by a peripheral cue, while the secondary task involved goal-driven selective discrimination. Experiment 1 relied on the same experimental parameters as Experiment 1 of Deubel and Schneider (1996), except that peripheral cues instead of symbolic cues were used for directing the saccade. Experiment 2 was designed to ask whether the coupling is obligatory. In Experiment 3, we introduced an additional fixation control condition in order to test the widely held assumption that abruptly appearing peripheral cues attract attention in an obligatory way.
30.3.1 General methods

30.3.1.1 Subjects
Six subjects aged 20–32 years participated in Experiments 1 and 3, and four of these in Experiment 2. All had normal vision and were experienced in a variety of experiments related to oculomotor research. All subjects were naïve with respect to the aim of the study.
30.3.1.2 Experimental set-up
The subject was seated in a dimly illuminated room. Visual stimuli were presented on a fast 21-inch color monitor providing a frame frequency of 100 Hz at a spatial resolution of 1024 × 768 pixels. The active screen size was 40 by 30 cm; the viewing distance was 80 cm. The video signals were generated by a freely programmable graphics board, controlled by a PC via the TIGA (Texas Instruments Graphics Adapter) interface. Stimuli appeared on a gray background that was adjusted to a mean luminance of 2.2 cd/m². The luminance of the stimuli was 25 cd/m². The relatively high background brightness is essential to avoid the effects of phosphor persistence. Eye movements were recorded with an SRI Generation 5.5 Dual-Purkinje-image eyetracker (Crane and Steele 1985) and sampled at 400 Hz. Head movements were restricted by a biteboard and a forehead rest. The experiment was completely controlled by a 486 personal computer. The PC also served for the automatic off-line analysis of the eye movement data, in which saccadic latencies and saccade start and end positions were determined.

30.3.1.3 Calibration and data analysis
Each session started with a calibration procedure in which the subject had to sequentially fixate ten positions arranged on a circular array of 6 deg radius. The tracker behaved linearly within 8 deg around the central fixation. Overall accuracy of the eyetracker for static fixation positions was better than 0.1 deg. Dynamically, however, the eyetracker records considerable artifactual overshoots of the eye at the end of each saccade, which we ascribe to the movement of the eye lens relative to the optical axis of the eye (Deubel and Bridgeman 1995). In order to determine the veridical direction of gaze, an off-line program searched the record for the end of the overshoot and then calculated eye position as a mean over a 40 ms time window.

30.3.2 Experiment 1: is there a coupling between stimulus-driven saccade control by peripheral cues and goal-driven selective visual perception?
Given that stimulus-driven and goal-driven control of visual attention and of saccadic eye movements differ in a number of respects (see the previous section), it is not implausible to assume two independent control structures. Experiment 1 was the first step towards testing this hypothesis, which would predict that DT discrimination should not depend on the ST location of a peripherally driven saccade. Subjects had to discriminate a briefly presented stimulus (DT) while preparing a stimulus-driven saccadic eye movement. The saccade was guided by a peripheral, abruptly appearing cue that directly indicated the ST within a string of letters.
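The off-line gaze estimate described under Calibration and data analysis can be sketched as follows. This is our reconstruction, and the overshoot-end criterion shown is a simplified placeholder rather than the authors' exact algorithm; at a 400 Hz sampling rate, the 40 ms averaging window spans 16 samples.

# Sketch of the off-line landing-position estimate (our reconstruction;
# the overshoot-end criterion below is a simplified placeholder).

SAMPLE_RATE = 400                  # Hz, one sample every 2.5 ms
WINDOW = int(0.040 * SAMPLE_RATE)  # 40 ms -> 16 samples

def landing_position(trace, saccade_end):
    """Estimate veridical gaze after a saccade from an eye-position trace.

    trace: horizontal eye positions in deg, one entry per sample.
    saccade_end: index where the saccade (with artifactual overshoot) ends.
    Searches forward for the end of the lens-induced overshoot, i.e. the
    first sample where position has stopped changing appreciably, then
    averages over the following 40 ms window.
    """
    i = saccade_end
    while i + 1 < len(trace) and abs(trace[i + 1] - trace[i]) > 0.02:
        i += 1                     # still inside the overshoot
    window = trace[i:i + WINDOW]
    return sum(window) / len(window)

# Synthetic trace: fixation at 0 deg, overshooting saccade, settling at 5 deg.
trace = [0.0] * 100 + [4.6, 4.9, 5.1, 5.05] + [5.0] * 40
print(round(landing_position(trace, saccade_end=100), 2))   # 5.0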
30.3.2.1 Procedure
Subjects performed four experimental blocks of dual-task trials. A block consisted of 216 experimental trials for which the experimental conditions were selected at random. Figure 30.1 shows an example of the sequence of stimuli in a single trial. Each trial started with the presentation of a small fixation cross at the center of the screen, with a size of 0.15 deg. Simultaneously, two strings of characters appeared left and right of the central fixation, each consisting of five '8'-like characters. The width of each item was 0.52 deg of visual angle; its height was 1.05 deg. The distance between the items was 1.09 deg, with the central item of the five letters presented at an eccentricity of 5 deg. After a variable delay ranging from 800 to 1200 ms, the ST was indicated by two vertical lines (bar marker) appearing directly above and below one of the items.
Fig. 30.1 Stimulus sequence in Experiment 1. The subject fixated the central cross for 800–1200 ms. Then a cue consisting of two vertical bars indicated the saccade target position. The cue appeared at one of the positions indicated by 1, 2, or 3 in the graph (the numbers are, of course, not shown on the screen), to the left or to the right of fixation. After a delay of 60 ms, the discrimination target and the distractor stimuli were presented for 100 ms. Both distractor and discrimination target disappeared before the onset of the saccade. After the saccade, the subject had to indicate the identity of the discrimination target.

Simultaneously with the cue, the fixation cross disappeared. The side (left or right) and the item position at which the cue appeared were varied randomly among the three innermost positions in the string (i.e. at position 1, 2, or 3, as indicated in Fig. 30.1). After a cue lead time of 60 ms, nine of the ten items in the two strings were replaced by distractors that were randomly selected to be an 'S' or a mirror-symmetric 'S'. One of the three inner items on the side indicated by the ST was replaced by the DT, which was either an 'E' or a mirror-symmetric 'E'. Thus, the ST cue provided a valid indication of the side on which the DT would appear, but did not specify the position of the DT within the string. All experimental conditions occurred with equal probability. The DT and the distractors disappeared after a presentation time of 100 ms. Consequently, the discrimination target was no longer available 160 ms after the onset of the saccade target cue. As a result of this stimulus timing, most saccades were initiated well after the disappearance of target and distractors. In order to eliminate occasional responses that occurred too early, the off-line data analysis discarded saccades with latencies shorter than 160 ms. Also, in this and the following experiments, trials with primary saccades smaller than 2 deg were not considered in the analysis. This occurred in less than 4% of the trials. After the saccade the subject had to indicate, without time pressure, the identity of the discrimination target by pressing one of two buttons. The two vertical
The two vertical lines indicating the ST stayed on the screen for 2 s, until the end of the trial. After that, the central fixation cross reappeared and the next trial was initiated by the computer. Each subject also ran two types of control blocks. The first type of control block ('No discrimination—saccade only', a single-task condition) was introduced to determine saccadic reaction times in a single-task situation. For this purpose, the subject was asked to saccade to the ST but was not required to discriminate. Each subject performed a single block of 216 trials. In the second type of control block ('No saccade—discrimination only', a single-task condition), the subject was required to keep fixation on the central cross. The purpose of this block was to measure perceptual performance at the different DT positions without the preparation of an overt saccade. Each subject performed a single block of 216 of these trials. The stimulus sequence was identical to that described before, except that the line cues appeared simultaneously above and below all three item positions on one side. Thus the cues indicated the hemifield in which the DT would appear but did not indicate one item specifically.
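As a compact restatement of the exclusion rules described in the procedure, the trial filter could be coded along the following lines (a sketch over a hypothetical trial-record layout, not the authors' original analysis code):

```python
def keep_trial(latency_ms, amplitude_deg):
    # Discard anticipatory responses (saccadic latency < 160 ms, i.e.
    # launched before the discrimination display was certainly gone)
    # and primary saccades smaller than 2 deg.
    return latency_ms >= 160.0 and amplitude_deg >= 2.0

# Hypothetical records: (saccadic latency in ms, saccade amplitude in deg)
trials = [(206.0, 4.7), (140.0, 5.1), (230.0, 1.2)]
valid = [t for t in trials if keep_trial(*t)]   # keeps only (206.0, 4.7)
```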
30.3.2.2 Results and discussion
Our subjects were experienced in various oculomotor tasks and produced fast and accurate saccades. Performance in the discrimination task improved considerably after some initial practice; the first block therefore served as training and was not included in the data analysis. One important prerequisite for the proper interpretation of the experimental results was to establish that saccadic performance was not affected by the congruency of ST and DT, that is, by the perceptual task. An analysis of saccadic latencies and amplitudes revealed that saccadic performance indeed did not depend on the DT–ST position relationship. Figure 30.2(a) shows saccadic latency (defined as the time between cue appearance and saccade onset) as a function of the position of the discrimination target, separately for the three saccade target positions and averaged over the subjects. Analysis of variance (ANOVA) with repeated measures confirmed that saccadic latency was independent of DT position (p > 0.20), with a slight but nonsignificant tendency to become longer with larger eccentricities of the ST positions (p > 0.05).
Fig. 30.2 (Experiment 1) (a) Saccadic latency as a function of discrimination target (DT) position (in degrees of visual angle), given separately for saccades directed to the three saccade target (ST) positions. The horizontal dashed line is saccadic latency in the ‘No discrimination’ control condition. (b) Mean saccadic landing positions as a function of DT position. (c) Distribution of saccadic landing positions for the three ST positions. The dashed lines in (b) and (c) indicate the respective ST positions.
Mean saccade latency was 206 ms. Saccadic latency in the 'No discrimination—saccade only' control task was 185 ms, indicating a general slowing effect of the dual-task situation on the speed of saccade initiation. Figure 30.2(b) displays mean saccadic amplitudes, again as a function of DT position, for the three ST positions. The actual ST positions are indicated by the three horizontal dashed lines. The graph reveals that the saccades hit the target with reasonable accuracy, leaving a saccadic undershoot in the range of 0.3–0.4 deg (i.e. less than 10% of the target eccentricity). Again, it is important to note that saccadic amplitude was independent of the position of the DT (p > 0.50), indicating that saccade accuracy was not affected by the perceptual task. Figure 30.2(c), finally, provides the distribution of landing positions of the primary saccades for the three ST positions. Standard deviations of the end positions were 0.76, 1.1, and 1.18 deg for ST1, ST2, and ST3, respectively, increasing with increasing amplitude. Secondary, corrective saccades followed in 58% of all trials. These follow-up saccades are indeed corrective in the sense that they bring the eye, on average, between the bar markers that indicate the required final fixation location. Secondary saccades were not directed to the location of the discrimination target when DT and ST positions differed. In our experiments, selective perceptual processing is measured by discrimination performance. The three diagrams in the upper row of Fig. 30.3 show discrimination performance for the six subjects who participated in this experiment, measured as percent-correct decisions and given as a function of DT position.
Fig. 30.3 (Experiment 1) Top row: Discrimination performance as a function of DT position, given for the saccade cued to ST positions 1, 2, and 3. The data are presented separately for the six subjects. The lower diagram summarizes the data for all six subjects. Dashed line: Discrimination performance in the ‘No saccade’ condition.
The graphs present the data for the saccade cued to positions 1, 2, and 3, respectively, averaged across the left and right sides of fixation. It is immediately obvious that performance consistently depends on the relation between the position of the discrimination stimulus and the location of the indicated (future) saccade target. For all subjects and all ST positions, performance was best when ST and DT positions coincided. When the saccade was not directed to the DT position, performance decreased steeply and approached chance level. Superimposed on this pattern, discrimination performance declines from the more foveal to the more peripheral DT locations, that is, from DT1 to DT3. The lower diagram of Fig. 30.3 summarizes the data across all subjects. For ST1, discrimination performance was close to perfect (88%) when the DT was presented at the ST location, but dropped to 58% at DT2 and finally to the 50% chance level at DT3. This astonishing difficulty in identifying the DT when it is spatially separate from the ST location is also obvious for ST2: discrimination accuracy dropped from 83% at DT2 (the congruent case) to 64% at DT1 and 59% at DT3. A similar data pattern is found for saccades directed by the peripheral cue to ST3. ANOVA (repeated measures) confirmed a highly significant interaction of ST and DT positions, F(4, 20) = 27.2, p < 0.001, and a significant effect of DT position, F(2, 10) = 5.0, p < 0.05. The data show that the ability to discriminate between objects in a multi-object scene during the preparation of a peripherally cued saccade is spatially limited to one common object, the saccade goal. This means that the predicted coupling between selection-for-spatial-motor-action and selection-for-perception also holds when the movement target is determined in a stimulus-driven way by a peripheral cue. In other words, Experiment 1 provides no evidence that goal-driven selection-for-perception can be decoupled from stimulus-driven saccade target selection. The dashed curve in Fig. 30.3 represents the results of the 'No saccade—discrimination only' fixation control condition, in which all three items on one side were cued simultaneously, so that subjects knew the hemifield of the DT but not its exact position. The data show low and position-unspecific discrimination performance. Interestingly, this performance is generally superior to the results from the saccade conditions in which ST and DT referred to different items. An important question was whether perceptual performance is linked to the actual landing position of the eye or rather to the intended saccade target position. The relatively broad distributions of the saccade amplitudes shown in Fig. 30.2(c) allowed a dissociation of these two aspects. Figure 30.4 provides discrimination performance as a function of the actual saccadic landing positions, given separately for the cases in which DT and ST positions coincided (filled circles) and in which they differed (open circles). The data are presented for the three DT positions in separate diagrams. If best performance were linked to the actual landing position of the eye, both curves (filled and open circles) should show equal performance when the eyes landed on the DT position, no matter whether this was intended or not. The data show that this is not the case.
First, the curves for the cases in which the ST position differed from the DT position are more or less flat at a low performance level, independent of the actual saccade endpoint. This means that, even when the eye actually landed on the DT position (although it was not intended to), performance was close to chance level. Second, for the cases in which ST and DT coincided, making a saccade to an item far from the DT did not impair performance: even when the eyes landed at position 1, a discrimination target at position 3 could be identified accurately, provided the saccade had been cued to position 3. A two-factorial repeated-measures ANOVA confirmed these conclusions. The first factor was whether the ST position was equal to the DT position or not, and the second factor was whether the eye landed in the amplitude bin before, at, or after the DT position.
Fig. 30.4 (Experiment 1) Discrimination performance as a function of the actual saccadic landing positions, given separately for the cases when DT and ST coincide (filled circles) and when ST and DT positions differ (open circles). The data are presented for the three DT positions in separate diagrams. The dashed lines in (b) and (c) indicate the respective ST positions.
The analysis indeed revealed a highly significant effect of the coincidence of intended ST and DT positions, F(1, 5) = 112, p < 0.001, but a nonsignificant effect of landing position (p > 0.70). The interaction was also nonsignificant (p > 0.05). These results emphasize the importance of the intended, as compared with the actual, landing position for the control of attention and perception.
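For readers who wish to reproduce this type of analysis, a two-factorial repeated-measures ANOVA of this form can be run as sketched below. The data here are synthetic stand-ins with an invented effect structure, and statsmodels' AnovaRM is our choice of tool, not necessarily the one used by the authors:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic data: one mean percent-correct value per subject (n = 6),
# coincidence level (ST = DT vs. ST != DT), and landing bin (before,
# at, or after the DT position) -- the balanced long format AnovaRM needs.
rng = np.random.default_rng(0)
rows = []
for subj in range(1, 7):
    for coincide in ("same", "different"):
        for landing in ("before", "at", "after"):
            base = 85.0 if coincide == "same" else 55.0
            rows.append((subj, coincide, landing, base + rng.normal(0.0, 3.0)))
df = pd.DataFrame(rows, columns=["subject", "coincide", "landing", "pct_correct"])

res = AnovaRM(df, depvar="pct_correct", subject="subject",
              within=["coincide", "landing"]).fit()
print(res)  # F and p values for both main effects and their interaction
```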
30.3.3 Experiment 2: stimulus-driven saccade control and goal-driven selective visual perception: is the coupling obligatory?
An obvious question concerning the generality of the results of Experiment 1 is to what extent the coupling between stimulus-driven saccade programming and selective visual perception is obligatory, that is, whether subjects are indeed unable to move their eyes to one location and attend to another. In Experiment 1, subjects did not know the position of the DT because it could appear with equal likelihood at any of the three positions inside the string on the side of the ST. Therefore, subjects had no incentive to shift their selective perceptual processing away from the intended ST position. This incentive should be stronger, however, if subjects knew where the DT would appear. So, in order to improve the conditions for decoupling, in this experiment we gave subjects knowledge about the position of the DT by always presenting it at the central position of the side where the ST appeared.
30.3.3.1 Procedure
The experiment was identical to Experiment 1 except that the DT always appeared at the central position of the string that contained the ST.

30.3.3.2 Results and discussion
The basic parameters of the saccadic responses were similar to those of Experiment 1. Mean saccadic latency was 240.5 ms. ANOVA showed saccadic latencies to depend on ST position, F(2, 8) = 5.91, p < 0.05. Saccadic accuracy was again high; mean saccade sizes were 3.61, 4.67, and 5.65 deg for ST 1, 2, and 3, respectively. ANOVA (repeated measures) confirmed a significant main effect, F(2, 8) = 206, p < 0.001.
Fig. 30.5 (Experiment 2) Discrimination performance of four subjects as a function of saccade target position. The discrimination target was always presented at the central position (DT 2).

The dependence of discrimination performance on the indicated saccade target position is shown in Fig. 30.5, separately for the four subjects. Although the subjects differed in their overall performance level, it is obvious that preknowledge of the test stimulus position did not improve performance at the uncued locations: discrimination rate was still superior when DT and ST coincided, and dropped drastically at the adjacent positions. The main effect of ST position on discrimination performance was significant, F(2, 8) = 16.5, p < 0.01. The data show that, despite improved conditions for decoupling saccades from selective perceptual processing, the two selection functions were again clearly coupled.
30.3.4 Experiment 3: optimal conditions for decoupling and the question of involuntary attention attraction by peripheral cues
One may object that the conditions for decoupling in Experiment 2 were still not optimal. The fact that the DT always appeared at the central position on the side of the ST might not be a sufficiently efficient way to provide usable knowledge of the DT position. The time from the appearance of the ST cue to the appearance of the DT was only 60 ms, possibly too short to allocate visual attention to the location of the DT. This problem was addressed in Experiment 3 by keeping the DT position constant for a block of trials. Thus, subjects knew that, within a block, the DT would always appear at the central position (position 2, see Fig. 30.1) of one side (e.g. the left side), so that visual attention could be allocated to the DT position prior to the appearance of the ST. This condition should be ideal for a decoupling of selection-for-perception from selection-for-motor-control. A second issue we wanted to address with Experiment 3 was the question of whether an abruptly appearing peripheral cue necessarily binds visual attention in an obligatory way (see Yantis 1998). This would imply that a peripheral cue should always attract attention, independent of whether it is irrelevant (i.e. should be ignored by the subject) or relevant as a cue for a saccade. A possible approach to this question is to create an experimental condition in which irrelevant peripheral cues per se have a low probability of attracting attention. We reasoned that this situation might arise when peripheral cues appear on the side contralateral to the discrimination target; these cues might be easier to ignore than cues on the same side and in close spatial proximity to the DT. To test these assumptions we introduced an additional fixation condition in which the subject was asked
to keep strict central fixation. Nevertheless, a peripheral cue was presented, which the subject was to ignore. This onset cue could appear with equal probability on the same side as the DT (ipsilateral) or on the other side (contralateral). On the one hand, we expected that an irrelevant cue contralateral to the DT would not attract visual attention and would therefore not influence perceptual performance. On the other hand, for the saccade condition, in which the cue was relevant and determined the saccade target location, we expected that the ST cue would strongly bind discrimination performance, independent of whether it appeared on the side ipsilateral or contralateral to the DT. If these predictions turned out to be correct, we could argue forcefully that the coupling of peripherally controlled saccades and perception was due not to the onset cue per se but to its functional meaning for saccade control.
30.3.4.1 Procedure
The experiment was identical to Experiment 2 except that the DT always appeared at the central position of the string (position 2, see Fig. 30.1), on one predetermined side, for a block of 48 trials. Within such a block, the ST appeared in half of the trials (selected at random) on the same side as the DT (ipsilateral) and in the other half on the side contralateral to the DT. Subjects performed a total of four sessions of a saccade condition and four sessions of a fixation condition. Each session consisted of two blocks with the DT on the left side and two blocks with the DT on the right side. In the saccade condition, subjects were asked to saccade to the ST. In the fixation condition, subjects were told to ignore the peripheral cue and to maintain fixation. Moreover, each subject performed a further control condition of one block of 216 trials in which no cue was given.

30.3.4.2 Results and discussion
Figure 30.6(a) shows mean saccade landing positions as a function of ST position. ST positions contralateral to the side where the DT appeared are shown as negative numbers, ipsilateral positions as positive numbers. Considerable undershooting is obvious for all target positions, most notably for the ipsilateral position peripheral to the DT (position 3). The saccade latencies are shown in Fig. 30.6(b). Saccades to the ipsilateral side, that is, to positions +1, +2, and +3, were faster than contralateral saccades, that is, to positions −1, −2, and −3. ANOVA (repeated measures) revealed a significant effect of the factor 'side' (contra- vs. ipsilateral) on latency, F(1, 5) = 15.19, p < 0.05, but no effect of the factor 'cue position within string' (1, 2, 3, that is, inner, middle, and outer position; p > 0.90) and no significant interaction (p > 0.40). This contralateral slowing reflects the only case in our experiments in which the DT position interfered with saccade programming. It suggests that the presence of a DT at a blockwise-constant position introduces an attentional bias that gives higher priority to objects and locations on the side of the DT than on the other side. Figure 30.7 presents discrimination performance. For the saccade condition (open circles) there was no significant effect of the factor 'side' (p > 0.05), but a significant effect of 'cue position within string', F(2, 10) = 33.45, p < 0.001, and of the interaction, F(2, 10) = 10.42, p < 0.01. A further ANOVA (repeated measures) confirmed a significant effect of the factor 'absolute cue position' (−3, −2, −1, +1, +2, +3), F(5, 25) = 8.64, p < 0.001. Newman–Keuls tests (significance level always 0.05) revealed that performance at the central, ipsilateral ST position (the DT position) was significantly different from that at all other positions. So, again, discrimination was best when ST and DT coincided, implying that preknowledge of the DT position does not allow subjects to withdraw perceptual processing priority from the ST.
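The block structure of this design can be made concrete with a small scheduling sketch (our reading of the procedure, not the original experiment-control code; the balancing of cue positions within a block is an assumption):

```python
import itertools
import random

def make_block(dt_side, reps=8, seed=None):
    # One Experiment-3 block: the DT is fixed at the central position (2)
    # of `dt_side` throughout, while the onset cue (ST) appears equally
    # often at string positions 1, 2, and 3 on the ipsi- and contralateral
    # side, giving 2 x 3 x reps = 48 randomly ordered trials.
    rng = random.Random(seed)
    cells = itertools.product(("ipsi", "contra"), (1, 2, 3))
    trials = [{"dt_side": dt_side, "dt_pos": 2, "st_side": side, "st_pos": pos}
              for side, pos in cells for _ in range(reps)]
    rng.shuffle(trials)
    return trials

block = make_block("left", seed=1)   # 48 trial dictionaries
```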
Fig. 30.6 (Experiment 3) (a) Saccade amplitude as a function of ST position. (b) Saccadic latency as a function of ST position.
An important question is whether this coupling of perception and action is due to the peripheral cue per se or to the function of the cue for saccade control. The fixation condition was introduced in order to answer this question. If the cue that has to be ignored in this condition attracts visual attention independently of its function for saccade control, the same effect of cueing on perceptual performance as in the saccade condition should be found. The data depicted in Fig. 30.7 show that this is clearly not the case: performance in the contralateral case of the fixation condition turns out to differ from the saccade condition. For the fixation condition (filled circles in Fig. 30.7) there was a significant effect of the factor 'side', F(1, 5) = 9.90, p < 0.05, of 'cue position within string', F(2, 10) = 20.21, p < 0.001, and of the interaction, F(2, 10) = 8.74, p < 0.01. A further ANOVA (repeated measures) confirmed a significant effect of the factor 'absolute cue position', F(5, 25) = 12.04, p < 0.001. For the ipsilateral side, performance at the central ipsilateral position (position +2) was significantly better than performance at positions +1 and +3 (Newman–Keuls). Most importantly, for the contralateral side, there was no reduction of discrimination performance due to the irrelevant cue as compared with the condition in which no cue at all appeared (Newman–Keuls), whereas there was a reduction at positions +1 and +3 compared with the no-cue condition (Newman–Keuls). Our results show that whether an irrelevant, abruptly appearing peripheral cue influences perception depends critically on the spatial relationship between the cue position and the intended attentional position. If an irrelevant cue appears on the same side as the to-be-attended target and in close spatial proximity, the cue exerts an unavoidable interfering effect on perceptual processing of the target—stimulus-driven allocation of attention to the irrelevant cue cannot be avoided. However, if the irrelevant cue appears on the side contralateral to the discrimination target, it no longer affects perceptual processing—stimulus-driven allocation of visual attention can now be suppressed. This has, to our knowledge, not yet been reported in the literature on stimulus-driven attentional control (e.g. Yantis 1998). While our results show that an onset cue per se does not necessarily attract attention, a peripheral cue serving as the target for a saccade always binds cue position and perceptual processing. This implies that the coupling found in the saccade condition is due to the function of the cue in directing the saccade. So, even for stimulus-driven saccades, the coupling between saccade target selection and selection of the discrimination target is obligatory and restricted to one common target location, which argues against the existence of two independent selection mechanisms for stimulus-driven saccade target selection and goal-driven perceptual selection.
Fig. 30.7 (Experiment 3) Discrimination performance as a function of cue position. The discrimination target (DT) always appeared at position +2. Open circles depict the saccade condition, in which the peripheral cue directed the saccade. Filled circles depict the fixation condition, in which the cue was irrelevant to the task and had to be ignored.
30.4 Attentional processes in visual perception and spatial motor programming: general discussion
In this section, we first summarize and discuss the data of the three experiments. The guiding question is how selection in perception and selection in spatial–motor action are related when goal-driven and stimulus-driven forms of control are involved. Next, we discuss the implications of Experiment 3 for the role of space in stimulus-driven attention. Finally, we refer to two theoretical frameworks for understanding the coupling of selection processes in perception and action, namely VAM and the premotor theory of attention. Previous studies (e.g. Deubel and Schneider 1996; Deubel, Schneider, and Paprotta 1998) have shown that selection in visual perception and selection in spatial–motor action are coupled to a common target object when both selection processes rely on intentional, goal-driven control. The new issue addressed here is whether this coupling still holds when spatial motor selection is under stimulus-driven control. The data from Experiments 1 and 2 clearly demonstrate that peripheral cues for saccade control generate a spatially selective coupling of discrimination performance and eye movement programming, even when subjects are provided with knowledge about the future DT position. In all conditions, discrimination performance was better when DT and ST referred to the same object than in the noncongruent cases. In Experiment 3, the DT position was kept constant for a block of trials so that subjects could allocate their perceptual attention directly to the DT in advance (prior to ST appearance). Again, performance was by far the best when ST and DT referred to the same location and object. Furthermore, Experiment 3 addressed a central objection to the use of peripheral cues in our experiments, namely the possibility that these cues attract visual attention in an obligatory manner, irrespective of the fact that they are used for saccade control.
The data show that irrelevant contralateral cues can be ignored in visual processing, but that contralateral cues that are relevant for saccade control cannot. We conclude that it is not the abrupt appearance of the cue per se that generates the coupling of perception and action but its function in directing the saccade. Stimulus-driven control of a saccade and intentional control of selective visual perception are always spatially coupled to a common target object. The findings from Experiment 3 require a supplement to the current view of the effects of irrelevant onsets on the allocation of visual attention (e.g. Theeuwes 1995; Yantis 1998) with respect to the role of space. In a recent review, Yantis (1998, p. 252) wrote: 'When an observer directs attention to a spatial location in advance of a display, then visual events that would otherwise capture attention will generally fail to do so.' The fixation condition of Experiment 3, however, reveals that attentional attraction by abrupt onsets under prefocused attention depended on the spatial relationship between the attended object and the irrelevant peripheral cue—only if both were in the same hemifield was perceptual performance strongly affected by the onset cue. It is an open question for further research whether hemifield crossing or the absolute distance between the attended object and the irrelevant onset cue is the decisive parameter modulating the interference effect of the irrelevant cue on perceptual analysis. How general are these conclusions? It remains to be investigated whether stimulus-driven perceptual selection and goal-driven motor selection would also be obligatorily coupled (as VAM would predict). Moreover, it should be noted that our stimulus-driven saccades were voluntary, in the sense that the subject's intention was to use the cue. This is emphasized by the results of the fixation condition of Experiment 3: peripheral cues that had to be ignored on the contralateral side had no effect on perceptual processing. The intention to use the cue for saccade control is therefore decisive for generating the coupling of perception and action. So, 'stimulus-driven' can be defined in the sense that the cue itself (the abrupt onset) allows a direct specification of the motor response, without any further symbolic, instruction-based transformation of the cue content. Finally, the saccades in our experiments were 'conscious' in the sense that subjects were aware of their motor action. It is an open question whether the coupling will still be found when stimulus-driven saccades are involuntary, reflexive reactions that are not noticed by the subject. We have preliminary evidence that this type of reflexive saccade can be programmed without the involvement of visual attention (Mokler, Deubel, and Fischer 2000). What are the implications of our findings for models of selective perception and motor target selection? Both VAM and the premotor theory of attention postulate an obligatory link between motor programming and attention control. The basic suggestion of the premotor theory is that spatial attention is controlled by motor programs. In its original form (Rizzolatti et al. 1987), only saccadic eye movement control structures were considered to direct spatial attention. In its more recent form (Rizzolatti et al. 1994), other premotor structures, called 'pragmatic maps' (e.g. for arm movement control), have been claimed to be in charge of attentional control as well.
Therefore, the effects of spatial–motor programming on perception reported above are also compatible with the premotor theory. Given these common features, the question arises as to what the main differences between the premotor theory and VAM are. First of all, VAM is in one respect more specific than the premotor theory, in that it predicts an object-specific coupling between perception and spatial motor programming. No investigations are yet available that have directly studied this aspect—the high spatial selectivity of the discrimination performance we found is a hint of object-based selection, but not proof. Movement programming to overlapping objects may allow us to test whether the object-specificity claim is valid.
Second, the theories differ in their assumptions about the origin and the flow of attentional control (see e.g. Chelazzi and Corbetta 2000 for the concept of an attentional control signal). In short, VAM assumes that motor programming is a consequence of visual attention processes, while the premotor theory claims just the opposite, namely that visual attention follows motor programming. To be more precise, VAM implies that the control signal for attentional modulations of stimulus processing originates in those brain areas that code the task-defined or stimulus-driven selection attributes. The control signal then propagates via V1 to the other, higher-level brain areas of the ventral and dorsal streams. In these areas, motor programming as well as conscious visual perception should occur simultaneously as a consequence of the prioritized activation flow from V1. For instance, if a saccade to a red object is to be made, an attentional control signal will originate from the cortical area that codes the color 'red', will flow to area V1, and will spread from there, simultaneously, both to other ventral areas (allowing conscious perceptual report of the red object) and to dorsal motor areas (leading to motor programming). The premotor theory, on the other hand, claims that the brain structures responsible for motor programming are the exclusive origin of the attentional control signals—a motor program is always established first, and only then does the attentional signal flow from premotor areas to other parts of the brain, implementing spatial attention effects. This control-signal flow is not explicitly specified within the premotor theory, but it is an evident implication. In order to distinguish these two theoretical options, single-cell recordings of the attentional control-signal flow might be helpful. If the premotor theory is correct, the control-signal flow should always start from the premotor areas that program movements, and attentional effects in the ventral areas should always occur later in time. If VAM is correct, the control-signal flow should start in those areas that code the task attributes, and attentional effects should occur later, simultaneously in other ventral areas responsible for conscious perception and in dorsal areas responsible for motor programming. A major drawback of the premotor theory of attention and, in part, also of VAM is that neither theory makes very specific assumptions about the attentional mechanisms. They can be considered frameworks for the relationship of motor programming and perception rather than detailed theories specifying attentional processes at the mechanistic level. VAM is more specific than the premotor theory (e.g. by specifying parts of the control-signal flow in the dorsal and ventral areas of the brain), but it also leaves open some important issues in conceptualizing visual attention processes. One of these is the question of how several task-dependent control signals (e.g. signals related to color and size in the task 'Search for the red and large square') are combined in order to generate attentional effects—see, for example, Bundesen (1990, 1998) and Wolfe (1994) for theories that make specific assumptions on these central issues.
Therefore, to obtain a theoretically more satisfying picture of the relationship between movement target selection on the one hand and selective perceptual capabilities on the other, these frameworks should be combined with a mechanistically specific theory of visual attention, such as Bundesen's (1990, 1998) Theory of Visual Attention.
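To make the single-cell test proposed above concrete, the two predicted orderings of the attentional control-signal flow can be restated as a toy decision rule over hypothetical modulation-onset latencies (purely illustrative; neither theory is specified this crisply, and all names and numbers below are invented):

```python
def classify_flow(t_feature, t_v1, t_ventral, t_dorsal, t_premotor, tol=10.0):
    # Inputs: hypothetical onset latencies (ms) of attentional modulation
    # in the named areas; `tol` is an arbitrary tolerance for judging the
    # ventral and dorsal effects "simultaneous".
    if t_premotor < min(t_feature, t_v1, t_ventral, t_dorsal):
        return "premotor theory: the motor program leads, attention effects follow"
    if t_feature < t_v1 < min(t_ventral, t_dorsal) and abs(t_ventral - t_dorsal) <= tol:
        return "VAM: the feature-coding area leads; ventral and dorsal follow together"
    return "neither prediction fits cleanly"

print(classify_flow(t_feature=80, t_v1=95, t_ventral=120, t_dorsal=125, t_premotor=140))
```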
Acknowledgments
We thank Silvia Bauer for running the subjects, as well as Bernhard Hommel and two anonymous reviewers for their constructive and helpful comments. The study was supported by the Deutsche Forschungsgemeinschaft (SFB 462 'Sensomotorik', and Forschergruppe 'Wahrnehmungsplastizität', PR 118/19-1).
References
Allport, D.A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer and A.F. Sanders (Eds.), Perspectives on perception and action, pp. 395–419. Hillsdale, NJ: Erlbaum.
Allport, D.A. (1993). Attention and control: Have we been asking the wrong questions? A critical review of twenty-five years. In D.E. Meyer and S. Kornblum (Eds.), Attention and Performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience, pp. 183–218. Cambridge, MA: MIT Press.
Bullock, D. and Grossberg, S. (1988). Neural dynamics of planned arm movements: Emergent invariants and speed–accuracy properties during trajectory formation. Psychological Review, 95, 49–90.
Bundesen, C. (1990). A theory of visual attention. Psychological Review, 97, 523–547.
Bundesen, C. (1998). Visual selective attention: Outlines of a choice model, a race model and a computational theory. Visual Cognition, 5, 287–309.
Castiello, U. (1996). Grasping a fruit: Selection for action. Journal of Experimental Psychology: Human Perception and Performance, 22, 582–603.
Chelazzi, L. and Corbetta, M. (2000). Cortical mechanisms of visuospatial attention in the primate brain. In M.S. Gazzaniga (Ed.), The new cognitive neurosciences, pp. 667–686. Cambridge, MA: MIT Press.
Craighero, L., Fadiga, L., Rizzolatti, G., and Umiltà, C. (1998). Visuomotor priming. Visual Cognition, 5, 109–125.
Crane, H.D. and Steele, C.M. (1985). Generation-V dual-Purkinje-image eyetracker. Applied Optics, 24, 527–537.
Deubel, H. and Bridgeman, B. (1995). Fourth Purkinje image signals reveal eye lens deviations and retinal image distortions during saccades. Vision Research, 35, 529–538.
Deubel, H. and Schneider, W.X. (1996). Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36, 1827–1837.
Deubel, H., Schneider, W.X., and Paprotta, I. (1998). Selective dorsal and ventral processing: Evidence for a common attentional mechanism in reaching and perception. Visual Cognition, 5, 81–107.
Duncan, J. (1996). Cooperating brain systems in selective perception and action. In T. Inui and J.L. McClelland (Eds.), Attention and Performance XVI: Information integration in perception and communication, pp. 549–578. Cambridge, MA: MIT Press.
Hoffman, J.E. and Subramaniam, B. (1995). The role of visual attention in saccadic eye movements. Perception and Psychophysics, 57, 787–795.
Jonides, J. (1981). Voluntary vs. automatic control over the mind's eye's movement. In J. Long and A. Baddeley (Eds.), Attention and performance IX. Hillsdale, NJ: Erlbaum.
Kowler, E., Anderson, E., Dosher, B., and Blaser, E. (1995). The role of attention in the programming of saccades. Vision Research, 35, 1897–1916.
LaBerge, D. and Brown, V. (1989). Theory of attentional operations in shape identification. Psychological Review, 96, 101–124.
Milner, A.D. and Goodale, M.A. (1995). The visual brain in action. New York: Oxford University Press.
Mishkin, M., Ungerleider, L.G., and Macko, K.A. (1983). Object vision and spatial vision: Two cortical pathways. Trends in Neurosciences, 6, 414–417.
Mokler, A., Deubel, H., and Fischer, B. (2000). Unintended saccades can be executed without presaccadic attention shift. Perception, 29 (Suppl.), 54.
Müller, H.J. and Rabbitt, P.M. (1989). Reflexive and voluntary orienting of visual attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315–330.
Nakayama, K. and Mackeben, M. (1989). Sustained and transient components of focal visual attention. Vision Research, 29, 1631–1647.
Neumann, O. (1987). Beyond capacity: A functional view of attention. In H. Heuer and A.F. Sanders (Eds.), Perspectives on perception and action, pp. 361–394. Hillsdale, NJ: Erlbaum.
Paprotta, I., Schneider, W.X., and Deubel, H. (in preparation). Visual attention mediates the coupling of perception and spatial motor programming.
Pashler, H. (1997). The psychology of attention. Cambridge, MA: MIT Press.
Pierrot-Deseilligny, C., Rivaud, S., Gaymard, B., Müri, R., and Vermersch, A.I. (1995). Cortical control of saccades. Annals of Neurology, 37, 557–567.
Posner, M.I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25.
Rensink, R. (2000). Seeing, sensing, and scrutinizing. Vision Research, 40, 1469–1487.
Rizzolatti, G., Riggio, L., Dascola, I., and Umiltà, C. (1987). Reorienting attention across the horizontal and vertical meridians: Evidence in favor of a premotor theory of attention. Neuropsychologia, 25, 31–40.
Rizzolatti, G., Riggio, L., and Sheliga, B.M. (1994). Space and selective attention. In C. Umiltà and M. Moscovitch (Eds.), Attention and Performance XV: Conscious and nonconscious information processing, pp. 231–265. Cambridge, MA: MIT Press.
Schneider, W.X. (1995). VAM: A neuro-cognitive model for visual attention control of segmentation, object recognition, and space-based motor action. Visual Cognition, 2, 331–375.
Schneider, W.X. (1999). Visual–spatial working memory, attention, and scene representation: A neuro-cognitive theory. Psychological Research, 62, 220–236.
Schneider, W.X. and Deubel, H. (1995). Visual attention and saccadic eye movements: Evidence for obligatory and selective spatial coupling. In J.M. Findlay, R. Walker, and R.W. Kentridge (Eds.), Eye movement research, pp. 317–324. Amsterdam: Elsevier.
Shepherd, M., Findlay, J.M., and Hockey, R.J. (1986). The relationship between eye movements and spatial attention. Quarterly Journal of Experimental Psychology, 38A, 475–491.
Simons, D.J. and Levin, D.T. (1997). Change blindness. Trends in Cognitive Sciences, 1, 261–267.
Theeuwes, J. (1995). Temporal and spatial characteristics of preattentive and attentive processing. Visual Cognition, 2, 221–233.
Theeuwes, J., Kramer, A.F., Hahn, S., and Irwin, D.E. (1998). Our eyes do not always go where we want them to go: Capture of the eyes by new objects. Psychological Science, 9, 379–385.
Theeuwes, J., Kramer, A.F., Hahn, S., Irwin, D.E., and Zelinsky, G.J. (1999). Influence of attentional capture on oculomotor control. Journal of Experimental Psychology: Human Perception and Performance, 25, 1595–1608.
Tipper, S.P., Lortie, C., and Baylis, G.C. (1992). Selective reaching: Evidence for action-centered attention. Journal of Experimental Psychology: Human Perception and Performance, 18, 891–905.
Tipper, S.P., Howard, L.A., and Houghton, G. (1998). Action-based mechanisms of attention. Philosophical Transactions of the Royal Society of London, B, 353, 1385–1393.
Treisman, A. (1988). Features and objects: The fourteenth Bartlett memorial lecture. Quarterly Journal of Experimental Psychology, 40, 201–237.
Wolfe, J.M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.
Yantis, S. (1998). Control of visual attention. In H. Pashler (Ed.), Attention, pp. 223–256. Hove, UK: Psychology Press.
31 Response features in the coordination of perception and action
Gordon D. Logan and N. Jane Zbrodoff

Abstract. Theories of congruency effects in Stroop, Simon, and S–R compatibility tasks often start with a node that represents a stimulus category and end with a node that represents a response category. We ask how a stimulus can activate a node and how a node can activate a response, and suggest that congruency effects may stem from these prior and subsequent processes. Theories of word recognition propose several interacting stages between stimulus presentation and activation of a word node. Theories of speaking and typing propose several interacting stages between activation of a word node and the execution of a response. We focus on the response end, and show that the Stroop effect depends on individual features of the response 'downstream' from the response node. We show that the time for a vocal response depends on the specific phonemes that are activated by the Stroop distractor, whereas the time for a typewritten response depends on the specific finger movements that are evoked by the Stroop distractor. These results call for a theory of congruency effects that links current theories of word recognition with current theories of speaking and typing. Congruency effects may reflect stimulus-driven interactions among response features.
31.1 Introduction
If William James were alive today, he would probably say, 'Everyone knows what the Stroop (1935) effect is.' It is the interference observed in the time to name colors that is produced by distracting words that name incongruent colors (e.g. RED printed in green). It is defined relative to neutral (e.g. XXX in green) or congruent (e.g. GREEN in green) control conditions. It has been replicated hundreds of times for hundreds of reasons (for a review, see MacLeod 1991), but many important aspects of the effect remain unexplained. The purpose of this chapter is to add constraints to the explanation provided by dominant theories of the Stroop effect (Cohen, Dunbar, and McClelland 1990; Logan 1980; Logan and Zbrodoff 1979), suggesting ways in which the theories can be extended. We argue that the Stroop literature is relatively insular and could benefit by incorporating ideas and theories from other parts of cognitive psychology. Specifically, we show that the Stroop effect depends on detailed features of the motor response, which indicates a locus of the effect that is much farther 'downstream' than the dominant theories suggest. We use theories of speaking (Dell 1986) and typing (Rumelhart and Norman 1982) to define the parts further downstream and suggest ways to incorporate those theories into theories of the Stroop effect. We are not the first to make these points. Indeed, several studies in the Stroop literature show evidence of effects downstream (Bakan and Alperson 1967; Besner, Stolz, and Boutilier 1997; Cutting and Ferreira 1999; Dalrymple-Alford 1972; Dennis and Newstead 1981; Guttentag and Haith 1978; Klein 1964; Logan and Zbrodoff 1998; Majeres 1974; McClain 1983; Melara and Mounts 1993; Pritchatt 1968; Proctor 1978; Redding and Gerjets 1977; Regan 1978; Singer, Lappin, and Moore 1975; Tannenhaus, Flanigan, and Seidenberg 1980; Virzi and Egeth 1985). We discuss these studies in more detail later in the chapter.
Several studies outside the Stroop literature show Stroop-like effects downstream as well. Greenwald (1970, 1971, 1972) manipulated the match between stimulus and response modalities and found strong effects in single- and dual-task conditions. Meyer and Gordon (1985; Gordon and Meyer 1987) found syllable-specific priming effects in speech production. Posnansky and Rayner (1977) and Rayner and Springer (1986) found Stroop-like interference between phonemes in a picture–word interference task. Indeed, theories of picture–word interference (e.g. Glaser and Glaser 1989) and speech production (e.g. Levelt et al. 1991) exploit effects upstream and downstream from the current Stroop theories to great advantage. However, the points made by these researchers have not been incorporated into the dominant theories of the Stroop effect, so it is worth repeating them and elaborating on them in the context of the Stroop effect.
31.2 Reading, speaking, and theories of the Stroop task
Two prominent theories of the Stroop task are depicted in Fig. 31.1. Panel A presents our model from 1979 and 1980, and Panel B presents the Cohen et al. (1990) model. In the figure and throughout the chapter, colors are represented as lower-case words like red and green, words are represented as upper-case words like RED and GREEN, and responses are represented as words in quotes like 'red' and 'green'. The models in Fig. 31.1 are not the only models in the Stroop literature (see e.g. Phaf, Van der Heijden, and Hudson 1990; Sugg and McDonald 1994; Virzi and Egeth 1985; Zhang, Zhang, and Kornblum 1999), but they provide the most complete account of the data (MacLeod 1991). More importantly, they share a critical assumption with most of the other models of Stroop, Simon, and compatibility effects: stimulus and response categories can be represented as single nodes in a localist network. The input to each of the models is the activation of a node representing the attributes of the current stimulus. In the single-trial version of the Stroop task, each stimulus activates two input nodes, one for the word and one for the color. The output from each of the models is a response node that is activated above some threshold. On congruent trials (e.g. RED in red), the two attributes activate the same response node, and that speeds reaction time (RT) relative to neutral control conditions. On incongruent trials (e.g. GREEN in red), the two attributes activate different response nodes, and that creates competition between the response nodes that increases the time it takes for the winning node to reach the threshold.
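To make this shared localist assumption concrete, here is a toy accumulator in the spirit of the models in Fig. 31.1. It is not the Logan–Zbrodoff or Cohen et al. implementation, and all weights are invented; it only illustrates the node-and-threshold logic just described:

```python
import random

def stroop_rt(word, color, threshold=30.0, max_steps=2000, seed=None):
    # Toy localist race: two response nodes accumulate evidence from a
    # word route and a color route and inhibit each other; the first
    # node to cross threshold gives the response and the RT (in steps).
    rng = random.Random(seed)
    act = {"red": 0.0, "green": 0.0}
    # Invented weights: the color route is attentionally weighted more
    # strongly than the word route, as the models' attentional
    # connections require for correct color naming.
    w_word, w_color, inhibit = 0.06, 0.10, 0.02
    for t in range(1, max_steps + 1):
        prev = dict(act)
        for resp in act:
            other = "green" if resp == "red" else "red"
            drive = w_word * (resp == word) + w_color * (resp == color)
            act[resp] += drive - inhibit * max(prev[other], 0.0) + rng.gauss(0.0, 0.05)
        winner = max(act, key=act.get)
        if act[winner] >= threshold:
            return t, winner
    return max_steps, None

print(stroop_rt("red", "red", seed=1))    # congruent: both routes push 'red'
print(stroop_rt("green", "red", seed=1))  # incongruent: competition slows 'red'
```

On these (arbitrary) settings the correct color node still wins on incongruent trials, but only after the word-driven competitor has slowed its rise, reproducing the congruency effect qualitatively.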
31.2.1 How can a node read?
The Stroop models presented in Fig. 31.1 beg an important question: how does a stimulus turn on a node? Many careers have been built on studying the processes by which colors are categorized and words are read. The processes by which the stimulus turns on nodes that represent categorizations of its attributes are multiple and complex. Considerable progress was made in the last century, resulting in a series of models of increasing complexity that account for an increasingly broad range of empirical phenomena (see e.g. Coltheart, Curtis, Atkins, and Haller 1993; Harm and Seidenberg 1999; McClelland and Rumelhart 1981). We present some examples of word-reading models in Fig. 31.2 to illustrate the complexity of the processes 'upstream' from the input nodes in the Stroop models in Fig. 31.1. They show that reading generates outputs at several levels that may interact with the process of naming a color. For example, phonological or semantic codes or both may cause interference. Thus, some of the Stroop effect may be produced by processes prior to those depicted in the Stroop models in Fig. 31.1, and the theories may have to be extended to account for them.
Fig. 31.1 The currently dominant models of the Stroop effect according to MacLeod (1991). Panel A presents the information accumulation model of Logan and Zbrodoff (1979) and Logan (1980). The circles represent nodes and the lines represent connections between them. Solid lines represent long-term automatic connections. Broken lines represent short-term attentional connections. The thickness of the line represents the strength of the connection. Automatic connections are stronger between words and responses than between colors and responses, representing the greater automaticity of word reading. Attentional connections are stronger between colors and responses than between words and responses to allow the system to respond to color. Attentional connections between words and responses are reversed so the system is set to expect incongruent trials. Panel B presents the parallel-distributed-processing model of Cohen, Dunbar, and McClelland (1990). The circles represent nodes and the lines represent connections between them. Connection strength is not represented in this picture of the model but varies depending on the model’s experience in the simulations. The Word and Color nodes in the bottom right are ‘task demand units’ that set the system to report words or colors, depending on which node is activated.
Fig. 31.2 Two models of word recognition. Panel A presents the interactive activation model of McClelland and Rumelhart (1981). The ovals represent sets of nodes of particular types. Nodes within ovals are mutually inhibitory. The lines represent connections between nodes of different types. The number of lines in this picture does not represent the number of connections in the simulation model. Arrowheads represent the direction in which activation flows. Solid lines represent bottom-up activation. Broken lines represent top-down activation. Panel B presents the dual route model of Coltheart et al. (1993). The ovals represent collections of nodes and the lines represent connections between them. The arrowheads and dots represent the direction along which activation flows. Arrowheads represent excitatory connections and dots represent inhibitory connections. The input consists of visual features and the output is a phonetic description of the word, ready to be input to a speaking model. The branch on the right side represents the phonological route, in which pronunciations are derived by applying grapheme-to-phoneme correspondence rules. The branch on the left side represents the visual route, in which pronunciations are derived through a sequence of steps that begins with recognizing words.
Fig. 31.3 Models of speaking and typing. Panel A presents the spreading activation retrieval model of speaking by Dell (1986). The left side shows activation flow through three main stages: Syntax orders the words, Morphology chooses the word forms, and Phonology translates the words into vocal gestures. The right side shows a more detailed model of the activation flow from morphology to phonology—from word to utterance. Panel B presents the schema activation model of typing by Rumelhart and Norman (1982). The left side shows activation flow through three main stages: Word takes input from ideational or perceptual processes, Letter translates words into constituent letters, and Movement translates letters into movements of the hands and fingers. The right side shows the word 'red' cascading through the model, first broken into a sequence of letters, then into a sequence of movements.
31.2.2 How can a node speak?
The Stroop models in Fig. 31.1 beg another important question: how can an active response node cause the series of events that unfolds when people speak or type a color name? Figure 31.3 contains a prominent model of speech production by Dell (1986) and a prominent model of typewriting by Rumelhart and Norman (1982). The input to each model is a word, which may be an activated response node in a Stroop model. The words are transformed into constituents (syllables in speaking;
letters in typing) and those constituents are transformed into smaller constituents—vocal gestures in speaking and finger movements in typing. Moreover, each model must solve the emergent problem of serially ordering the constituents at each stage of processing. The models in Fig. 31.3 illustrate the complexity of the processes downstream from the output nodes in the Stroop models (see also Levelt et al. 1991). As with the upstream processes, the downstream processes present several loci at which word reading may interfere with color naming. With vocal responses, there may be competition between words, between syllables, between phonemes, or between some combination of them. With typewritten responses, there may be competition between words, letters, and movements. Each of these loci could produce interference that contributes to the Stroop effect. As with reading, these downstream processes are outside the dominant Stroop models depicted in Fig. 31.1. Should the effects exist, the Stroop models would have to be extended to account for them.
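The typing cascade in Panel B of Fig. 31.3 can be illustrated with a toy decomposition, which also makes concrete how many downstream loci an incongruent distractor could compete at (the lookup table and function are ours, purely for illustration, not Rumelhart and Norman's simulation):

```python
# Touch-typing finger assignments for the letters needed by 'red'/'green'
# (an invented, minimal lookup table).
FINGER = {"r": ("left", "index"), "e": ("left", "middle"),
          "d": ("left", "middle"), "g": ("left", "index"),
          "n": ("right", "index")}

def plan_typing(word):
    letters = list(word)                        # word node -> letter nodes
    movements = [FINGER[ch] for ch in letters]  # letter nodes -> movements
    return letters, movements

print(plan_typing("red"))
# (['r', 'e', 'd'],
#  [('left', 'index'), ('left', 'middle'), ('left', 'middle')])
```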
31.2.3 What happens when nodes read aloud?
It would be very interesting to put the models of reading in Fig. 31.2 together with the models of speaking and typing in Fig. 31.3 (for a similar suggestion, see Coltheart et al. 1993). The resulting model would provide a complete account of reading aloud, from print to sound. It would be interesting as well to apply the resulting model to the Stroop effect. It may turn out that the combined model would account for the Stroop effect without having to add the processes that intervene between stimulus nodes and response nodes in the Stroop models. Alternatively, the combined model may need features of the Stroop models to account for the Stroop effect. These intriguing questions await future research. The goal of the research reported in this chapter was to take a step toward this ultimate goal by demonstrating the contribution of response features downstream from the Stroop models to the Stroop effect. If we can show that response features modulate the Stroop effect, we will have shown that the dominant Stroop models are insufficient and must be extended to provide a more detailed account of the downstream processes.
31.3 Response features and the Stroop task
31.3.1 Response type effects
One piece of evidence for a downstream locus of the Stroop effect comes from studies of response type effects. The dominant Stroop theories represent responses as abstract categories; nothing in the nodes distinguishes between vocal and manual responses, for example. Several researchers have shown that the magnitude of the Stroop effect depends on the nature of the response. Stroop effects are smaller with manual (arbitrary keypress) responses than with the standard vocal responses (e.g. Logan and Zbrodoff 1998; Majeres 1974; McClain 1983; Melara and Mounts 1993; Pritchatt 1968; Redding and Gerjets 1977; Virzi and Egeth 1985). This is not a response modality effect (cf. Virzi and Egeth 1985), because skilled manual responses (typewriting) can produce larger Stroop effects than vocal responses (Logan and Zbrodoff 1998). Instead, response type effects seem to reflect the match or compatibility of the stimulus categories and the response categories: words map more naturally onto spoken and typewritten responses than onto arbitrary keypresses.
Response type interacts strongly with the judgment required for the Stroop task (Majeres 1974; Virzi and Egeth 1985). One of our unpublished studies illustrates the interaction. We showed subjects the words ABOVE, BELOW, LEFT, and RIGHT, presented above, below, left of, or right of the fixation point, and had them report the word or the location. We used all possible combinations of words and locations, and three different response types: vocal responses, arbitrary keypresses, and compatibly mapped responses on the numeric keypad (i.e. 8 for above, 2 for below, 4 for left, and 6 for right). When subjects reported the word, the congruency effects were 7, 31, and 109 ms for vocal, arbitrary keypress, and keypad responses, respectively. When subjects reported the location, the pattern of effects was reversed: the congruency effects were 61, 43, and 29 ms for vocal, arbitrary keypress, and keypad responses, respectively. These effects show that the nature of the response makes a difference in the Stroop task. The data are consistent with the idea that the effect occurs downstream from the dominant Stroop theories, but they are also consistent with other ideas. It may be possible to account for the response type effects in terms of the strength of the connections between stimulus nodes and response nodes in the dominant theories (Cohen et al. 1990; Logan 1980; Logan and Zbrodoff 1979).
31.3.2 Response similarity effects
Another piece of evidence that suggests a downstream locus of the Stroop effect comes from studies that manipulate the similarity between distractors and the responses required for the task. Several investigators showed that the magnitude of the Stroop effect depends on the similarity of the distractors to words in the set of response categories. Distractors that are outside of the response set produce smaller Stroop effects. Words that name colors produce less interference if they are not in the response set. For example, if subjects see red and green words but not blue and yellow ones, and the task is to name the color in which the word is printed, then ‘red’ and ‘green’ are in the response set and ‘blue’ and ‘yellow’ are not. RED in green will produce more interference than BLUE in green (Klein 1964; Proctor 1978). One of our unpublished experiments illustrates response set effects in a different way. We presented words in four colors (red, blue, green, and yellow) in four locations (above, below, left of, or right of fixation) and had subjects report either the color or the location. There were eight distractor words: four color names (RED, BLUE, GREEN, and YELLOW) and four location names (ABOVE, BELOW, LEFT, and RIGHT). All combinations of colors, locations, and words occurred equally often. This design provided two kinds of congruency effects: color congruency between the color and the color words, and location congruency between the location and the location words. Location words were neutral with respect to color naming; they were outside the response set. Color words were neutral with respect to location naming, likewise outside the response set. The results showed strong congruency effects that depended on subjects’ response set: when subjects reported color, there was a strong color congruency effect (132 ms) and virtually no location congruency effect (−5 ms). When subjects reported location, the pattern reversed. The location congruency effect was strong (54 ms) and the color congruency effect was weak (−8 ms). Stimulus conditions were the same in the two report conditions. All that changed was the nature of the response, and that determined the pattern of the congruency effects. Put differently, these results show that congruency between stimulus properties is not in itself sufficient to produce a Stroop effect. The congruency between stimulus properties and response properties seems necessary.1 Stimuli outside the response set produce smaller Stroop effects, but the magnitude of the effects they produce depends on similarity to the words in the response set. Words semantically associated
with color (e.g. SKY, GRASS) produce less interference than color names but more interference than words that are not associated with color (Dalrymple-Alford 1972; Klein 1964; Majeres 1974; Regan 1978). Letters from color words produce a smaller Stroop effect than color words (Redding and Gerjets 1977; Regan 1978; Singer et al. 1975; also see Besner et al. 1997). These effects could be due to processes downstream from the dominant Stroop theories. They could reflect the activation of nodes after the words (e.g. syllables, phonemes, and features in speaking; letters and movements in typing; see Fig. 31.3). However, the effects could also be due to processes that are already part of the dominant Stroop theories. Color words outside the response set may activate responses that do not compete with correct responses. Semantically related words may partially activate the color words in the response set. Some researchers manipulated the phonological similarity of distractors. Pronounceable nonwords produce a stronger Stroop effect than unpronounceable controls (Bakan and Alperson 1967; Guttentag and Haith 1978). Distractors that sound like color words produce a stronger Stroop effect as well (Dennis and Newstead 1981; also see Cutting and Ferreira 1999; Tannenhaus et al. 1980). These effects are more plausible candidates for a locus downstream from the dominant Stroop theories because the similarity is between the constituents of words rather than the words themselves. Still, an advocate of a dominant theory could argue that the effects reflect partial activation at word-level nodes rather than interference in downstream nodes (syllables, phonemes, or features in speech; letters or movements in typing).
31.4 The present experiments
The present experiments were designed to find more direct evidence for Stroop effects downstream from the dominant theories. We manipulated distractor similarity by varying the type of distractor, presenting words that name colors on some trials (e.g. RED, GREEN) and repetitions of the first letters of words that name colors on others (e.g. RRR, GGGGG). To show that this manipulation affected processes downstream, we manipulated response type and response set. In Experiment 1, subjects responded vocally. Half of the subjects responded by saying the color name (e.g. ‘red’ for red) and half responded by saying the first letter of the color name (e.g. ‘r’ for red). In Experiment 2, subjects responded by typing the color name. Majeres (1974) performed similar experiments using the original list-format Stroop task (presenting 40 colored stimuli and measuring the time to name all of them). We used the single-trial version of the task. We expected that word distractors would activate color name responses and letter distractors would activate letter name responses. Color names and letter names should interact with response set (saying words or saying letters), producing stronger congruency effects when they correspond than when they do not. Color names and letter names should interact differently with spoken and typed responses because they differ in different ways at the level of constituents (syllables, phonemes, and features in speech; letters and movements in typing). Across experiments, we expected an interaction between distractor type, response set, response type, and congruity such that the magnitude of the Stroop effect depends on the downstream properties of the response evoked by the distractor. The Stroop effect should be larger with word distractors than with letter distractors with color words as responses (Experiment 1), but not with single letters (Experiment 1) or typewritten words (Experiment 2) as responses. Our argument rests on this higher order interaction. Lower order interactions may be ambiguous, interpretable in terms of the dominant theories as well as interpretable as downstream effects. The higher order interaction
is unambiguous. If it comes out as predicted, it may resolve some of the ambiguity in the lower order interactions.
31.5 Experiment 1: word and letter distractors with vocal responses
The first experiment looked for an interaction between distractor type, response set, and congruity with vocal responses. We expected the Stroop effect to be smaller with letter distractors than with words when subjects named the whole word because the letters are less similar to the required words phonetically. The difference in similarity can be seen in the phonetic transcriptions for color names and first letters of color names that appear in Table 31.1. The letters are not pronounced like the words and so should produce less interference at the phonetic and feature levels. The difference in interference should reverse or at least be less prominent when subjects respond by saying the first letter of the color name. Letter distractors should activate responses relevant to the task at the phonetic and feature levels and so should produce a stronger Stroop effect. Word distractors should activate irrelevant word responses and so should produce a weaker Stroop effect. However, subjects may perform the task by first retrieving the color name and then retrieving the first letter from the color name. In that case, word distractors may interfere with the first step and letter distractors may interfere with the second step. Word and letter distractors may produce the same amount of interference. Regardless of this, the pattern should be different from that observed when subjects respond by saying the whole color word; there should be a significant interaction between distractor type, response set, and congruency.
31.5.1.1 Method
Subjects. The subjects were 32 undergraduate students. Some served for credit in an Introductory Psychology course. Others were paid $6.00 US.
Apparatus and stimuli. The stimuli were the words RED, GREEN, BLUE, and YELLOW and the letter strings RRR, GGGGG, BBBB, and YYYYYY. They were presented in red, green, blue, and yellow (IBM colors 12, 10, 9, and 14, respectively) on a black background (IBM color 0) on Gateway2000 monitors controlled by Gateway2000 486 computers. The stimuli were presented in the center of the screen. Viewed at a distance of 60 cm, they subtended 0.48 deg of visual angle vertically. The horizontal visual angles were 0.95 deg for RED and RRR, 1.24 deg for BLUE and BBBB, 1.53 deg for GREEN and GGGGG, and 1.81 deg for YELLOW and YYYYYY. Each trial involved a series of three displays. The first was a fixation display containing a + sign centered in the screen (row 12, column 40 in the standard 24 row × 80 column IBM text screen) that was exposed for 500 ms.
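These visual angles follow from standard viewing geometry. A minimal sketch of the computation, assuming a letter height of about 0.5 cm (our back-calculation from the reported 0.48 deg, not a value given in the chapter):

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by a stimulus, via 2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# A letter about 0.5 cm tall viewed from 60 cm subtends roughly 0.48 deg,
# matching the vertical extent reported above.
print(round(visual_angle_deg(0.5, 60.0), 2))  # 0.48
```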
Table 31.1 Phonetic transcriptions of color names and first letters of color names

Color word    Phonetic transcription    Letter    Phonetic transcription
‘red’         [rεd]                     ‘r’       [⊃r]
‘blue’        [blue]                    ‘b’       [bi]
‘green’       [grin]                    ‘g’       [ji]
‘yellow’      [yεlow]                   ‘y’       [way]
The fixation display was extinguished and replaced immediately by the imperative stimulus for that trial (a colored word or a colored letter string), which was left-justified two spaces to the left of the fixation point (i.e. it began on row 12, column 38 of the text screen). The imperative stimulus remained exposed until the subject responded, whereupon the screen went blank. The experimenter, who was present in the room, then typed in the subject’s response. When the experimenter’s response was registered, a 1000 ms intertrial interval began, during which the screen remained blank.
Procedure. The basic design of the experiment involved 32 trials formed by the factorial combination of two distractor types (word or letter), four distractors (the four color words, RED, BLUE, GREEN, and YELLOW, or the four strings of first letters, RRR, BBBB, GGGGG, and YYYYYY), and four colors (red, blue, green, and yellow). The basic design was replicated 16 times for a total of 512 trials. The different combinations of conditions were presented in a different random order for each subject. Subjects were allowed short breaks every 64 trials. Subjects were told to respond vocally to the color and ignore the distractor. They were told to respond as quickly as possible without making errors. Half of the subjects were told to say the whole color name (e.g. ‘red’ for red) and half were told to say just the first letter of the color name (e.g. ‘r’ for red). Subjects were tested individually. The experimenter was present throughout the session. She typed the subject’s response into the computer so we could check accuracy.
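The trial structure is compact enough to reconstruct in a few lines. The following sketch (our reconstruction for illustration, not the original experiment code) builds the 32-cell factorial design, replicates it 16 times, and shuffles it independently for each subject:

```python
import itertools
import random

COLORS = ["red", "blue", "green", "yellow"]
DISTRACTORS = {
    "word":   ["RED", "BLUE", "GREEN", "YELLOW"],
    "letter": ["RRR", "BBBB", "GGGGG", "YYYYYY"],
}

def make_trials(replications=16, seed=None):
    # 2 distractor types x 4 distractor identities x 4 colors = 32 cells.
    cells = [
        {"type": dtype, "distractor": d, "color": c,
         # Congruent when the distractor names (or repeats the first letter
         # of) the display color; the four colors have distinct initials.
         "congruent": d.lower().startswith(c[0])}
        for dtype, ds in DISTRACTORS.items()
        for d, c in itertools.product(ds, COLORS)
    ]
    trials = cells * replications          # 32 x 16 = 512 trials
    random.Random(seed).shuffle(trials)    # fresh random order per subject
    return trials

trials = make_trials(seed=1)
print(len(trials))                                # 512
print(sum(t["congruent"] for t in trials) / 512)  # 0.25 congruent trials
```

The 25% congruent / 75% incongruent composition that this design implies becomes relevant again in the discussion of Experiment 2.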
31.5.1.2 Results
We analyzed the RT and accuracy data in 2 (distractor type: word vs. letter) × 2 (response set: word vs. first letter) × 2 (congruency) analyses of variance (ANOVAs). The mean RTs and accuracy scores for each cell of the design are presented in Table 31.2. The mean RTs for the critical interaction between distractor type, response set, and congruency are plotted in Fig. 31.4. The RTs are converted to congruency scores (incongruent RT − congruent RT) to illustrate the interaction more clearly. RT was 22 ms faster with letter distractors than with word distractors, F(1, 30) = 33.43, p < 0.01, MSE = 457.20, 106 ms faster with word responses than with first-letter responses, F(1, 30) = 9.81, p < 0.01, MSE = 37029.96, and 78 ms faster with congruent distractors than with incongruent distractors, F(1, 30) = 77.04, p < 0.01, MSE = 2468.63. These effects were qualified by several interactions. The most important was the highest order interaction between distractor type, response set, and congruency, F(1, 30) = 4.43, p < 0.05, MSE = 697.49. It is plotted in Fig. 31.4.
Table 31.2 Mean reaction times in ms and accuracy scores (percent correct, in parentheses) for vocal responses as a function of distractor type (words vs. repetitions of first letter), response set (say color name vs. first letter of color name), and congruency in Experiment 1

                 Say word                    Say first letter
Distractor       Word        First letter   Word        First letter
Incongruent      718 (95)    663 (97)       801 (95)    724 (94)
Congruent        612 (98)    605 (99)       724 (96)    660 (97)
Stroop effect    106 (3)     58 (2)         77 (1)      64 (3)
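The congruency scores and the contrast carrying the three-way interaction can be recomputed directly from the cell means in Table 31.2. A minimal sketch (RT values re-entered by hand from the table):

```python
# Mean RTs (ms) from Table 31.2, keyed by (response set, distractor type).
means = {
    ("say word",   "word"):         {"incongruent": 718, "congruent": 612},
    ("say word",   "first letter"): {"incongruent": 663, "congruent": 605},
    ("say letter", "word"):         {"incongruent": 801, "congruent": 724},
    ("say letter", "first letter"): {"incongruent": 724, "congruent": 660},
}

# Congruency (Stroop) score = incongruent RT - congruent RT.
stroop = {cell: m["incongruent"] - m["congruent"] for cell, m in means.items()}
print(stroop)  # 106, 58, 77, 64 ms, as in the bottom row of the table

# The distractor type x response set x congruency interaction is the
# difference of differences between the congruency scores.
word_resp   = stroop[("say word", "word")]   - stroop[("say word", "first letter")]
letter_resp = stroop[("say letter", "word")] - stroop[("say letter", "first letter")]
print(word_resp, letter_resp)  # 48 vs. 13 ms
```

The 48 ms modulation with whole-word responses against the 13 ms modulation with first-letter responses is the pattern tested by the contrasts reported below.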
Fig. 31.4 Congruency effects as a function of distractor type and response set in Experiment 1, in which responses were vocal.
The distractor type × response set × congruency interaction indicates that the congruency effect was modulated by distractor type with whole-word responses but not with first-letter responses. The modulation of the congruency effect with whole-word responses was predicted by our analysis: whole-word distractors should activate morphological, phonological, and featural levels in common with the response set and so should produce a large Stroop effect. Repeated-letter distractors should not activate word responses at any level, though they may cause partial activation. The Stroop effect should therefore be smaller, as we observed. A contrast comparing the difference in congruency effects was highly significant, F(1, 30) = 26.43, p < 0.01, MSE = 697.49. The constancy of the congruency effect with first-letter responses was also predicted. Subjects may have generated color names to determine which letter to say. RT was 106 ms longer to say letters than to say words, consistent with the idea that subjects generated color names. Word distractors may interfere with the generation of color names while letter distractors interfere with the generation of letter names, resulting in equivalent congruency effects. A contrast comparing the difference in congruency effects was not significant, F(1, 30) < 1.0. In the RT analysis, there were significant interactions between distractor type and response set, F(1, 30) = 6.11, p < 0.05, MSE = 457.20, and distractor type and congruency, F(1, 30) = 8.93, p < 0.01, MSE = 697.49. The meaning of these interactions is qualified by the higher order interaction between distractor type, response set, and congruency. The accuracy data corroborated the RT data. Accuracy was lower when RT was longer, suggesting no trade-off between speed and accuracy. The accuracy ANOVA revealed significant main effects of distractor type, F(1, 30) = 5.53, p < 0.05, MSE = 2.65, response set, F(1, 30) = 6.15, p < 0.05, MSE = 14.47, and congruency, F(1, 30) = 17.03, p < 0.01, MSE = 7.95, and a significant three-way interaction between distractor type, response set, and congruency, F(1, 30) = 6.70, p < 0.05, MSE = 3.19. The interaction between distractor type and response set approached significance, F(1, 30) = 4.00, p < 0.06, MSE = 2.65.
31.5.1.3 Discussion
The experiment showed a Stroop effect that was modulated by distractor type and response set. When subjects responded by saying the whole word, the Stroop effect was almost twice as strong
with word distractors as with letter distractors. When subjects responded by saying the first letter of the word, the Stroop effect was about the same for word and letter distractors. These results are consistent with the idea that the Stroop effect occurs downstream from the dominant theories, in phonological and featural levels of processing. Word distractors activate the phonemes and features of words, whereas letter distractors activate the phonemes and features of letters. The activated phonemes and features have an impact on performance that depends on their similarity to the set of intended responses. Word distractors are more similar to word responses than letter distractors are (see Table 31.1) and so produce a stronger Stroop effect with word responses, just as we observed. Letter distractors are more similar to letter responses than word distractors are, and so tend to produce a stronger Stroop effect with letter responses. Subjects appeared to generate letter responses by first generating the words that name the colors, and word distractors may interfere with that process more than letter distractors do. Putting the two effects together results in equivalent Stroop effects for word and letter distractors, as we observed with letter responses. These results are consistent with a downstream locus for the Stroop effect, but they are also consistent with a locus within the dominant Stroop theories. Words may interfere more with word responses because they activate word responses more than letters do. Letters may interfere more with letter responses because they activate letter responses more than words do. The additional step of retrieving the color name before getting the first letter may suffer more interference at the word level. Downstream constituents need not be invoked to explain our results. Consequently, we ran a second experiment to obtain converging evidence on our conclusions.
31.6 Experiment 2: word and letter distractors with typewritten responses
The second experiment was a replication of the whole-word response condition of the first experiment with typewritten responses instead of vocal responses. Subjects saw colored color words and colored repetitions of the first letter of color words and had to name the color by typing the whole color word. We expected the pattern of results to be different from the one we observed with whole-word vocal responses because words and letters relate to each other differently in typing and speaking. Letters are the constituents of typed words but they are not the constituents of spoken words: syllables and phonemes are instead. With typewritten responses, word distractors and letter distractors activate the same initial movement: ‘r’ is typed the same way whether it is the first letter of the word RED or the single letter R. By contrast, with vocal responses, word distractors and letter distractors activate very different initial movements (see Table 31.1). Consequently, we expected the Stroop effect to be the same magnitude for word and letter distractors when subjects typed whole words. This prediction is based on the idea that the Stroop effect occurs at the letter level or the movement level. At the level of responses (word vs. letter responses), words and letters do not resemble each other. If the Stroop effect depended on responses rather than the constituents of the responses, the pattern of results observed in Experiment 1 should replicate here. Words should interfere more than letters. The crucial prediction is a null interaction (the magnitude of the Stroop effect should be the same with word and letter distractors), and that is weaker than predicting a significant effect. We strengthen the prediction by comparing RTs in the word response condition in Experiment 2 with RTs in the word response condition in Experiment 1 in a 2 (Experiments: 1 vs. 2) × 2 (distractor type: word vs.
letter) × 2 (congruency) ANOVA. The crucial prediction is an interaction between experiments, distractor type, and congruency of the following form: the difference between the Stroop effects for word and letter distractors should be larger when subjects speak the color name (Experiment 1) than when they type it (Experiment 2).
31.6.1.1 Method
Subjects. The subjects were 16 graduate and undergraduate students who were selected for their ability to touch type. They were paid $6 for participating. The average speed on Logan and Zbrodoff’s (1998) typing test was 48.7 words per minute, with a range of 38.5 to 81.1 words per minute. The average accuracy on the typing test was 89.5%, with a range of 81% to 97%.
Apparatus and stimuli. These were the same as in the previous experiment, except that subjects registered their responses by typing on the computer keyboard rather than speaking. The typing test involved typing one of four paragraphs from the appendix of Logan and Zbrodoff (1998, p. 992), which were adapted from the book Border Collies by Collier (1995). The paragraphs ranged in length from 111 to 117 words. The text was displayed on the computer’s monitor. Subjects read through the text once without typing it to familiarize themselves with it before they typed it. During typing, the text remained on the screen, but the characters that were typed were not echoed on the screen. Some subjects would have preferred to see what they typed, so their typing speeds on our test may underestimate their true ability.
Procedure. The procedure was the same as in the previous experiment except that subjects responded by typing rather than speaking, and all of the subjects typed the word representing the color name.
31.6.1.2 Results
We analyzed the RT and accuracy data in 2 (distractor type: word vs. letter) × 2 (congruency) ANOVAs. The mean RTs and accuracy scores are presented in Table 31.3. RT was 45 ms faster with letter distractors than with word distractors, F(1, 15) = 39.81, p < 0.01, MSE = 801.59, and 95 ms faster with congruent distractors than with incongruent distractors, F(1, 15) = 68.67, p < 0.01, MSE = 2073.60. The interaction between distractor type and congruency was not significant, F(1, 15) < 1.0. Word distractors produced a 99-ms congruency effect while letter distractors produced a 90-ms congruency effect. The null interaction between distractor type and congruency is consistent with our predictions. To strengthen the result, we compared the present RT data with RT data from the whole-word response condition of Experiment 1 in a 2 (Experiments) × 2 (distractor type) × 2 (congruency) ANOVA.

Table 31.3 Mean reaction times in ms and accuracy scores (percent correct, in parentheses) for typewritten responses as a function of distractor type (words vs. repetitions of first letter) and congruency in Experiment 2

Distractor       Word        First letter
Incongruent      867 (93)    818 (94)
Congruent        768 (96)    728 (95)
Stroop effect    99 (3)      90 (1)
Fig. 31.5 Congruency effects as a function of distractor type and response type, comparing vocal and typed responses. Data are from the word-response conditions of Experiments 1 and 2.
The crucial interaction between experiments, distractor type, and congruency is plotted in Fig. 31.5 in terms of congruency scores. The interaction approached significance, F(1, 30) = 3.17, p < 0.09, MSE = 971.90. Planned comparisons based on the error term from this three-way interaction showed that the two-way interaction between distractor type and congruency was significant for vocal responses, F(1, 30) = 18.96, p < 0.01, but not for typed responses, F(1, 30) < 1.0. We analyzed the accuracy data in a 2 (distractor type) × 2 (congruency) ANOVA and found a significant main effect of congruency, F(1, 15) = 8.39, p < 0.05, MSE = 3.30. No other effects were significant.
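The difference of differences that this interaction tests can be read directly off Tables 31.2 and 31.3. A short sketch (congruency effects re-entered by hand from the word-response conditions):

```python
# Stroop (congruency) effects in ms from the word-response conditions.
stroop = {
    ("vocal", "word"): 106, ("vocal", "letter"): 58,  # Experiment 1, say whole word
    ("typed", "word"): 99,  ("typed", "letter"): 90,  # Experiment 2, type whole word
}

# Distractor-type modulation of the Stroop effect, per response type.
vocal_mod = stroop[("vocal", "word")] - stroop[("vocal", "letter")]  # 48 ms
typed_mod = stroop[("typed", "word")] - stroop[("typed", "letter")]  # 9 ms
print(vocal_mod, typed_mod, vocal_mod - typed_mod)  # 48 9 39
```

The 48 ms modulation with vocal responses against the 9 ms modulation with typed responses is the experiments × distractor type × congruency pattern just described.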
31.6.1.3 Discussion
This experiment, with typewritten responses, showed no effect of distractor type on the magnitude of the Stroop effect. The Stroop effect was the same size whether the distractors were words or letters. We argue that letter distractors were as effective as word distractors because they activated the same initial response; the letter ‘r’ is typed the same way whether it is a single letter or the first letter in a word. The results contrast markedly with the results for vocal responses from the whole-word response condition of Experiment 1, in which the Stroop effect was markedly smaller with letter distractors than with words. The 95-ms Stroop effect observed here is smaller than the Stroop effects we reported in an earlier study of typewritten responses, which averaged 231 ms across four experiments (Logan and Zbrodoff 1998). The difference is due to the composition of the trials. The present experiment consisted of 25% congruent trials, 75% incongruent trials, and 0% neutral trials, whereas the first three of our previous experiments consisted of 33% congruent, 33% incongruent, and 33% neutral trials. The Stroop effect is smaller when the proportion of congruent trials is smaller (Logan 1980; Logan and Zbrodoff 1979) and smaller when the proportion of neutral trials is smaller (Tzelgov, Henik, and Leiser 1990; Tzelgov, Henik, and Berger 1992), so it should be smaller in our present experiment than in our previous ones. In our fourth previous experiment, we manipulated the proportion of congruent and incongruent trials with no neutral trials. When 20% of the trials were congruent and 80% were incongruent, the Stroop effect was 123 ms, which is similar in magnitude to the present 95 ms effect.
31.7 General discussion
The two experiments converge on the conclusion that at least part of the Stroop effect is due to processes downstream from the dominant Stroop theories. The magnitude of the Stroop effect depends on the similarity between the responses evoked by the distractors and the responses required for the task. Word distractors evoke vocal responses that are very different phonetically and featurally from the vocal responses that letter distractors evoke, and so cause a stronger Stroop effect when the required response is a spoken word. Word distractors evoke typing responses that are similar to those evoked by letter distractors. Typed words consist of typed letters, and a letter is the same at the letter and movement levels whether it is part of a word or a single object. Consequently, word and letter distractors cause Stroop effects of the same magnitude. Our conclusion that the distractor type effect occurs downstream from the dominant Stroop theories depends on the contrast between Experiment 1 and Experiment 2. By themselves, the results of Experiment 1 can be explained by the dominant theories: word distractors are more likely to activate word responses than letter distractors, and so produce a stronger Stroop effect. However, the dominant theories cannot explain why typed words behave differently from spoken words. Downstream processes must be invoked to explain the difference. Our conclusion suggests that Stroop theories should be broadened to include downstream processes as well as those that are part of the dominant theories. Stroop theorists have already broadened their theories to cover other phenomena in the attention literature. Logan (1980) addressed semantic priming as well as Stroop effects. Phaf et al. (1990) addressed selective attention as well as Stroop effects. Kornblum and colleagues addressed Stroop, Simon, and compatibility effects in a comprehensive framework and theory (Kornblum, Hasbroucq, and Osman 1990; Zhang et al. 1999). Our results suggest broadening in another direction, outside the attention literature, toward language production and motor control on the downstream end and reading and categorizing colors on the upstream end. Broadening in those directions would connect the Stroop and attention literatures to the larger literature on cognitive psychology and form the beginnings of a general theory of cognition (also see Logan 1995; Logan and Zbrodoff 1999).
Acknowledgments
This research was supported by Grant No. SBR 9808971 from the National Science Foundation. We are grateful to Julie Delheimer for testing the subjects and analyzing the data in these experiments and others in the series. We are grateful to Gary Dell for providing the phonetic transcriptions. We are grateful to Bernhard Hommel, Werner X. Schneider, Joseph Tzelgov, and an anonymous reviewer for helpful comments on the manuscript.
Note
1. Independently and jointly, Sylvan Kornblum and Gregory Stevens pointed out that the dimensional overlap model predicts Stroop-like interference from stimulus–stimulus congruence only when one or both of the stimulus properties is relevant to the response. Thus, the results of our unpublished experiment do not challenge the dimensional overlap model.
References
Bakan, P. and Alperson, B. (1967). Pronounceability, attensity and interference in the color word test. American Journal of Psychology, 80, 416–420.
Besner, D., Stolz, J.A., and Boutilier, C. (1997). The Stroop effect and the myth of automaticity. Psychonomic Bulletin and Review, 4, 221–225.
Cohen, J.D., Dunbar, K., and McClelland, J.L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332–361.
Coltheart, M., Curtis, B., Atkins, P., and Haller, M. (1993). Models of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychological Review, 100, 589–608.
Cutting, J.C. and Ferreira, V.S. (1999). Semantic and phonological information flow in the production lexicon. Journal of Experimental Psychology: Learning, Memory and Cognition, 25, 318–344.
Dalrymple-Alford, E.C. (1972). Associative facilitation and interference in the Stroop color-word task. Perception and Psychophysics, 11, 274–276.
Dell, G.S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93, 283–321.
Dennis, I. and Newstead, S.E. (1981). Is phonological recoding under strategic control? Memory and Cognition, 9, 472–477.
Glaser, W.R. and Glaser, M.O. (1989). Context effects in Stroop-like word and picture processing. Journal of Experimental Psychology: General, 118, 13–42.
Gordon, P.C. and Meyer, D.E. (1987). Control of serial order in rapidly spoken syllable sequences. Journal of Memory and Language, 26, 300–321.
Greenwald, A.G. (1970). A double stimulation test of ideomotor theory with implications for selective attention. Journal of Experimental Psychology, 84, 392–398.
Greenwald, A.G. (1971). A choice reaction time test of ideomotor theory. Journal of Experimental Psychology, 86, 20–25.
Greenwald, A.G. (1972). On doing two things at once: Time sharing as a function of ideomotor compatibility. Journal of Experimental Psychology, 94, 52–57.
Guttentag, R.E. and Haith, M.M. (1978). Automatic processing as a function of age and reading ability. Child Development, 49, 707–716.
Harm, M.W. and Seidenberg, M.S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106, 491–528.
Klein, G.S. (1964). Semantic power measured through the interference of words with color-naming. American Journal of Psychology, 77, 576–588.
Kornblum, S., Hasbroucq, T., and Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus–response compatibility—A model and taxonomy. Psychological Review, 97, 253–270.
Levelt, W.J.M., Schriefers, H., Vorberg, D., Meyer, A.S., Pechmann, T., and Havinga, J. (1991). The time course of lexical access in speech production: A study of picture naming. Psychological Review, 98, 122–142.
Logan, G.D. (1980). Attention and automaticity in Stroop and priming tasks: Theory and data. Cognitive Psychology, 12, 523–553.
Logan, G.D. (1995). Linguistic and conceptual control of visual spatial attention. Cognitive Psychology, 28, 103–174.
Logan, G.D. and Zbrodoff, N.J. (1979). When it helps to be misled: Facilitative effects of increasing the frequency of conflicting stimuli in a Stroop-like task. Memory and Cognition, 7, 166–174.
Logan, G.D. and Zbrodoff, N.J. (1998). Stroop-type interference: Congruity effects in color naming with typewritten responses. Journal of Experimental Psychology: Human Perception and Performance, 24, 978–992.
Logan, G.D. and Zbrodoff, N.J. (1999). Selection for cognition: Cognitive constraints on visual spatial attention. Visual Cognition, 6, 55–81.
MacLeod, C.M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163–203.
Majeres, E. (1974). The combined effects of stimulus and response conditions on the delay in identifying the print color of words. Journal of Experimental Psychology, 102, 868–874.
McClain, L. (1983). Effects of response type and set size on Stroop color-word performance. Perceptual and Motor Skills, 56, 735–743.
McClelland, J.L. and Rumelhart, D.E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375–407.
Melara, R.D. and Mounts, J.R.W. (1993). Selective attention to Stroop dimensions: Effects of baseline discriminability, response mode, and practice. Memory and Cognition, 21, 627–645.
Meyer, D.E. and Gordon, P.C. (1985). Speech production: Motor programming of phonetic features. Journal of Memory and Language, 24, 3–26.
Phaf, R.H., Van der Heijden, A.H.C., and Hudson, P.T.W. (1990). SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology, 22, 273–341.
Posnansky, C.J. and Rayner, K. (1977). Visual-feature and response components in a picture-word interference task with beginning and skilled readers. Journal of Experimental Child Psychology, 24, 440–460.
Pritchatt, D. (1968). An investigation into some of the verbal processes that underlie associative verbal processes of the Stroop color effect. Quarterly Journal of Experimental Psychology, 20, 351–359.
Proctor, R.W. (1978). Sources of color-word interference in the Stroop color-naming task. Perception and Psychophysics, 23, 413–419.
Rayner, K. and Springer, C.J. (1986). Graphemic and semantic similarity effects in the picture-word interference task. British Journal of Psychology, 77, 207–222.
Redding, G.M. and Gerjets, D.A. (1977). Stroop effect: Interference and facilitation with verbal and manual responses. Perceptual and Motor Skills, 45, 11–17.
Regan, J. (1978). Involuntary automatic processing in color-naming tasks. Perception and Psychophysics, 24, 130–136.
Rumelhart, D.E. and Norman, D.A. (1982). Simulating a skilled typist: A study of skilled cognitive–motor performance. Cognitive Science, 6, 1–36.
Singer, M.H., Lappin, J.S., and Moore, L.P. (1975). The interference of various word parts on color naming in the Stroop test. Perception and Psychophysics, 18, 191–193.
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–661.
Sugg, M.J. and McDonald, J.E. (1994). Time course of inhibition in color-response and word-response versions of the Stroop task. Journal of Experimental Psychology: Human Perception and Performance, 20, 647–675.
Tannenhaus, M.K., Flanigan, H.P., and Seidenberg, M.S. (1980). Orthographic and phonological activation in auditory and visual word recognition. Memory and Cognition, 8, 513–520.
Tzelgov, J., Henik, A., and Leiser, D. (1990). Controlling Stroop interference: Evidence from a bilingual task. Journal of Experimental Psychology: Learning, Memory and Cognition, 16, 760–771.
Tzelgov, J., Henik, A., and Berger, J. (1992). Controlling Stroop effects by manipulating expectations for color words. Memory and Cognition, 20, 727–735.
Virzi, R.A. and Egeth, H.E. (1985). Toward a translational model of Stroop interference. Memory and Cognition, 13, 304–319.
Zhang, H., Zhang, J., and Kornblum, S. (1999). A parallel distributed processing model of stimulus–stimulus and stimulus–response compatibility. Cognitive Psychology, 38, 386–432.
32 Effect anticipation in action planning
Michael Ziessler and Dieter Nattkemper
Abstract. The present experiments investigate the role of anticipated action effects in action planning. The first two experiments show that action planning includes the anticipation of action effects. In an initial acquisition phase, participants learned that their response to a stimulus would produce a particular effect, that is, another stimulus contingent on the response. In a subsequent test phase, targets were presented together with flanker stimuli taken from the set of effect stimuli. When the flankers were the learned effects of the response required by the target, the response was facilitated. The second series of experiments shows that, even during the learning of action effects, the effects are integrated with the action plan. Participants had to respond to two succeeding stimuli. The identity of the second stimulus depended on the identity of the first stimulus. Hence, the second stimulus could be considered to be an effect of the response to the first stimulus. Action-effect learning should accelerate the response to the second stimulus. The first response was required only in half of all trials. In the other half of trials a late or an early NOGO signal stopped response planning. Action-effect learning depended on the presentation time of the NOGO signal; there were clear learning effects with late NOGO signals and rather small effects with early signals. In discussing the results, we suggest an anticipatory or feed-forward learning mechanism: the planning of an action involves the anticipation of its environmental effects. If the effects are not yet known, there remain free slots defined by properties of the desired effects. Searching for suitable effects in the course of, and as a result of, action execution will fill in the slots.
32.1 Action planning includes effect anticipation
The successful control of behavior requires that we know in advance the effects that particular actions will produce in the environment. First, we need advance knowledge of the effects to check whether or not the executed action has achieved the intended goal(s). Second, the knowledge of possible effects is a prerequisite for the selection of actions appropriate for the accomplishment of the intended goals. The first of these notions has been elaborated in some detail; it is considered in all closed-loop theories of motor control. In these theories, representations of action effects are used for evaluation, or more specifically, for testing the observed outcome against the intended outcome after having executed a particular action. Knowledge of the environmental effects as well as of all sensory information connected with action execution is assumed to form an internal referential system (e.g. Adams 1971; Schmidt 1975, 1982). The system is used for comparing the actual sensory consequences (including proprioception, tactile information, etc.) and the actual environmental effects of the action with the desired effects. Correspondence indicates the execution of the correct movement. The second notion suggests that representations of action effects are not only relevant for evaluation, but play a functional role in action selection and preparation. This notion has received less attention in experimental research. Nevertheless, at least from a theoretical point of view, the role of effects in the selection of actions has been stressed repeatedly in the literature. As early as 1852, Lotze stated that the anticipation of a movement goal will make the body realize what the mind intends to do (see also Harless 1861). Münsterberg (1888) and James (1890) considered the anticipation
of an action goal, that is, the anticipation of the desired effect, to be a necessary precondition for executing that action. More recently, Prinz (1992, 1997) assumed that actions are cognitively represented by codes that capture their environmental effects. According to Prinz, action control is realized via the activation of the desired effects. Despite this strong theoretical emphasis on the role of action effects in the selection and control of actions, there have been only a small number of studies addressing the question of how action effects are integrated in action planning. Empirically, Hommel (1993a,b, 1996) has demonstrated that the learning of action effects may be an elementary process of sensorimotor control. For example, in Hommel’s (1996) experiments, participants first learned that their responses (different keypresses) would produce different effects (high- and low-pitched tones). Later they responded to visual stimuli with the keypresses. Together with each visual stimulus, one of the response effects was presented. Responses were faster if the presented effect was the effect of the required response. The results were interpreted as evidence for an automatic integration of the motor pattern representing a certain movement and the cognitive pattern representing the effects of that movement in the environment. After the effects have been learned, they are used in turn to activate the response. In a more elaborated form, Elsner and Hommel (2001) proposed an associative two-stage model of voluntary action control. In the first stage, learners are assumed to acquire associations between movement representations and the representations of events that frequently co-occur with these movements. In this stage of learning, voluntary actions do not exist; only random movements produce particular effects. The strength of the acquired association between movement and effect depends on the contingency and the contiguity between the movement and the following event. In the second stage, the movement–effect associations can be used to select a movement that is appropriate to achieve a certain action goal, and this enables voluntary actions. The activation of the representation of an action effect is assumed to activate the corresponding movement representation as well, via a reversal of the learned action–effect association. That codes of action effects have an impact on the control of movements even when they are not (as in the above experiments) physically present has recently been shown by Kunde, Hoffmann, and Zellmann (1999). The participants in this study performed a speeded choice–reaction task with four stimuli (colors) and four responses (keypresses). Two of the keypresses triggered the presentation of a low-pitched tone and the other two the presentation of a high-pitched tone. Shortly before a stimulus was displayed, the required response was indicated by a cue. In 30% of the trials the cue was invalid; that is, the keypress that was actually required was not the cued one, but one of the three remaining keypresses. These invalid trials offered the opportunity to assess the effects of anticipated action effects on response preparation. In half of the trials, the actual keypress produced the same acoustic effect as the keypress indicated by the cue, and in the other half of trials the keypress was coupled with the alternative auditory effect.
The important finding was that after some practice an unexpected response was initiated faster when its effect corresponded to the effect of the initially prepared response. Finally, Hommel (1993a) and Kunde (2001) have demonstrated that response planning is affected by the compatibility between responses and their effects. Response planning was facilitated when there was an overlap between features of the response and features of its effect, for example when the location of the response corresponded to the location of its effect. Altogether, the results of these few studies are in line with the assumption that representations of action effects may play a role in action planning. Action planning was facilitated when the effect was either physically present or pre-activated by another response, and when there was a correspondence between features of actions and features of their effects. It is also interesting to note that
these results were found even though the effects were formally irrelevant for response selection. So far it appears that action planning includes the activation of action-effect codes. The following four experiments were designed to provide further evidence for the integration of action effects in action planning. The first two experiments demonstrate that action effects actually play a functional role in action planning. The remaining two experiments investigate the acquisition of action-effect relations. It will be shown that, even in learning, action effects are connected with the action plan rather than with action execution.
32.1.1 Experiment 1
The main goal of Experiment 1 was to investigate whether and how representations of action effects actually play a role in the planning and control of actions. Basically, the experiment was similar to those reported above: during response1 planning, the effect of the required response or an effect of another response was presented simultaneously with the target stimulus. If representations of action effects are incorporated in response planning, the presentation of the correct effect should facilitate the planning process. In designing the experiment, we tried to avoid some of the problems with the earlier experiments. For example, in the study by Hommel (1996) the target (T)–response (R)–effect (E) mapping resulted in two sequences of events that occurred in the acquisition phase of the experiments: T1–R1–E1 and T2–R2–E2. If the critical associations learned by the participants were not related to the response–effect pairs (R1–E1 and R2–E2) but to target–effect pairs (T1–E1 and T2–E2), the facilitating effect of the action effects on RTs might be due to the facilitation of stimulus processing rather than the facilitation of response planning. A second problem concerns the type of stimuli that were used as effects. Usually, the effects were auditory stimuli. The choice of tones is presumably pragmatically driven by the assumption that tones are hard to ignore. Whether or not similar observations would emerge when less salient, visual events are presented as action effects is far from clear and needs further investigation. Thus, in our experiments the effects as well as the targets were visual stimuli. To overcome the possible confounding of the R–E and T–E relations, we applied a 2:1 mapping of targets to responses and a 1:2 mapping of responses to action effects. For example, two targets (T1 and T2) required an identical response (R), which was followed by two effects (E1 and E2). The effects were systematically coupled to the two stimuli; that is, responding to T1 produced E1 and responding to T2 produced E2. This resulted in two sequences of events: T1–R–E1 and T2–R–E2. When, after some training, the targets were presented together with the effect stimuli, two conditions could be distinguished. First, T1 or T2 could be presented together with the respective effect stimuli mapped to the particular target (T1 + E1 or T2 + E2). In this case, the effect stimulus represented both the effect of the required response and the mapping of this perceptual event to the particular target. Hence, both response–effect associations and target–effect associations might contribute to performance. Second, T1 or T2 could be presented together with the effect stimulus mapped to the alternative target, that is, presenting T1 with E2 and T2 with E1. In this case the effect stimulus would represent the effect of the required response, but it would not correspond to the particular target. Hence, only response–effect associations could contribute to performance. A particular arrangement in our design was that targets and effect stimuli were taken from the same stimulus set. The idea behind this arrangement was that we wanted to use stimuli as effects
that were relevant for the participants in the present task. Although the effect stimuli were formally irrelevant for response selection, their relevance as targets should ensure that they were processed. The experiment was divided into an acquisition phase and a test phase. In the acquisition phase, participants performed a speeded four-choice response task with eight letters as targets. Immediately after each response, one of two particular letters out of the same stimulus set was contingently presented. The identity of this effect letter depended on the identity of the target. For example, both letters W and S required a keypress with the left middle finger. When the target was W, the letter G was presented as the effect letter; in the case of target S, the effect letter was X. Thus each response was coupled with two different effects. Each of the effects could be used as a target in one of the next trials (see Fig. 32.1). In the test phase of the experiments we adopted the flanker paradigm originally developed by Eriksen and Eriksen (1974). In this type of task, participants are required to respond to the central element (target) of a symbol string surrounded by neutral, compatible, or incompatible elements (flankers). Neutral flankers are not assigned to any of the instructed responses. Compatible flankers are assigned to the same response as the target, incompatible flankers to a different response. Usually, compatible flankers facilitate responses, whereas incompatible flankers slow responses down, presumably by an automatic activation of the incorrect response (Coles et al. 1992; Wascher et al. 1999). In the original version of the flanker paradigm, the distinction between compatible and incompatible flankers depends only on the question of whether the target and the flankers require the same or another response. In our version, we may now distinguish the target–flanker combinations with respect to effect compatibility as well: effect-compatible flankers are those representing the effect of the correct response to the target. On the contrary, effect-incompatible flankers represent effects of alternative, incorrect responses. As the targets assigned to the same response were not presented as effects of that response, response-compatible flankers were always effect-incompatible. However, response-incompatible flankers could be both effect-compatible and effect-incompatible (Table 32.1). For illustration, in the example described above, the stimuli W and S required the same response, producing G and X as response effects. Consider now a situation where the target W is flanked by G (GWG). As a target, G would require a response with the right index finger. Hence, this situation would be classified as incompatible in terms of response compatibility. Yet, in terms of effect compatibility it would be classified as compatible because G represents one of the two possible effects of responding with the left middle finger. In the same way, the combination of the target letter W with the flanker X also has to be classified as response-incompatible, but effect-compatible, as X is also a possible effect of a response with the left middle finger. Both response incompatibility and effect incompatibility are given when the target W is surrounded by a response-incompatible flanker other than G or X, for example H (HWH). In this case too, the flanker is assigned to a response different from the correct response to the target.
Yet, at the same time it is effect-incompatible because H corresponds to one of the effects of an alternative (incorrect) response. The two situations are clearly equivalent in terms of response compatibility; they are both response-incompatible. Nevertheless, they are clearly non-equivalent in terms of effect compatibility; the latter is effect-incompatible and the former effect-compatible. The critical question was whether the effect compatibility of the flankers would facilitate the execution of the correct response to the target. If the response effects play a role in response planning, the presence of the effect should decrease the RTs. Thus, within the response-incompatible
combinations of targets and flankers we expect shorter RTs for the effect-compatible compared with the effect-incompatible combinations. Moreover, effect compatibility should exist for both effects of each response, independently of the target stimulus. Related to the example, G as well as X as flankers should decrease the RTs to W, even though only G followed responses to W, whereas X was presented following the same response to S. The two conditions of effect compatibility provide the opportunity to separate possible effects of target–effect associations and response–effect associations within the same task. If target–effect associations played the major role, an effect of effect compatibility should only be obtained with flankers representing the regular effect of the response to the target. If, however, response–effect associations played the major role, an effect-compatibility phenomenon should also be observed with flankers representing the alternate effect of the response to the target. Before starting with the description of the experiment, it seems important to note that pilot studies with our experimental set-up revealed a serious problem, which is related to the effects of response compatibility. Effects of response compatibility were obtained with the majority of participants but they were clearly absent with a substantial minority (about 25–30%). This observation may be related to recent findings showing that flanker interference is linked to the perceptual load associated with the processing of the relevant elements in the stimulus array. When the perceptual load of the task is low, a robust flanker effect is obtained; the effect is reduced or even eliminated by increasing the load, that is, by increasing the stimulus set size or by varying the processing requirements (Lavie 1995; Lavie and Cox 1997; Paquet and Craig 1997). Presumably our experimental setting with eight stimulus alternatives represents a situation with relatively high task demands, thereby reducing the overall probability of obtaining a consistent response-compatibility effect. Therefore, it seemed reasonable to select subjects according to their sensitivity to flanker interference and to accept for the analysis of results only those participants who showed a minimum RT benefit of response-compatible compared with response-incompatible flankers. The criterion was set to 25 ms, which is slightly less than the effect found in the classical flanker paradigm (Eriksen and Eriksen 1974).
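To make the five flanker conditions concrete, the following sketch classifies target–flanker pairs. It uses only the assignments stated in the text (W and S share the left-middle response, whose learned effects are G and X; G as a target requires the right index finger; D and Z are unassigned); classifying arbitrary pairs would require the full mapping of Fig. 32.1, so the tables below are deliberately partial:

```python
# Partial stimulus-response mapping, limited to what the text specifies.
RESPONSE = {"W": "LM", "S": "LM", "G": "RI"}   # LM = left middle, RI = right index
EFFECTS_OF_RESPONSE = {"LM": {"G", "X"}}        # learned effects of the LM keypress
NEUTRAL = {"D", "Z"}                            # never assigned to a response

def classify(target, flanker):
    if flanker in NEUTRAL:
        return "neutral"
    if RESPONSE.get(flanker) == RESPONSE[target]:
        return "response-compatible"
    if flanker in EFFECTS_OF_RESPONSE.get(RESPONSE[target], set()):
        # Regular effect if the flanker is the effect of this very target
        # (GWG); alternate effect if it is the effect of the other target
        # sharing the response (XWX).
        return "effect-compatible"
    return "incompatible"

for flanker in ["D", "S", "G", "X", "H"]:
    print(f"{flanker}W{flanker}: {classify('W', flanker)}")
# DWD: neutral, SWS: response-compatible, GWG: effect-compatible,
# XWX: effect-compatible, HWH: incompatible
```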
32.1.1.1 Method
Participants. Forty students from different departments of Humboldt University Berlin took part in the experiments. They were randomly assigned to one of two experimental groups that differed with respect to the instruction in the first part of the experiment. Participants either received course credits or were paid for their services.
Stimuli and apparatus. Stimulus presentation and response recording were controlled by a Rhotron VME system with an Atari high-resolution monochrome monitor (Atari SM 124). The capital letters W, G, N, F, S, X, P, and H were used both as targets and as effect letters appearing after participants’ responses. The stimuli were presented at the center of the computer screen. With a viewing distance of about 60 cm, one letter subtended a visual angle of approximately 0.4 × 0.8 degrees (width × height). The three-letter strings presented in the test phase horizontally subtended a visual angle of approximately 1.6 degrees, with a center-to-center distance between two adjacent symbols of 0.6 degrees. Responses were keypresses with the index and middle fingers of each hand. The two keys for each hand were mounted on a separate response panel and the panels were separated by 17 cm.
Design and procedure. The experiment was divided into two phases: an acquisition phase and a test phase. In the acquisition phase participants performed a speeded four-choice response task with the eight stimulus alternatives. Figure 32.1 illustrates the target–response–effect mapping.
Immediately after each correct response, one of two particular letters was contingently presented. The identity of this effect letter depended on the identity of the imperative stimulus. In the test phase of the experiment, we presented the targets together with flankers on either side of the target. The flankers were drawn from the same set of stimuli that served as targets, plus two additional letters (D and Z) which were not assigned to any of the instructed responses. Five types of target–flanker relations were distinguished (Table 32.1). In neutral trials the target was surrounded by the letters D or Z, which never required a response in the course of the experiment. In response-compatible trials the target was flanked by a letter that was assigned to the same response as that required by the target (e.g. SWS). In response-incompatible trials the target was surrounded by a letter that was assigned to a response different from the response assigned to the target (e.g. FWF). So far, these are the three traditional conditions of the flanker paradigm. Within the response-incompatible conditions, effect compatibility was varied. Effect compatibility was given in two ways: first, the flanker could represent the effect that regularly followed the correct response to the target (e.g. GWG). Second, effect compatibility was also given when the flanker represented the effect that followed the response to the alternative target requiring the same response (e.g. XWX). We will refer to these conditions in the following as the regular-effect condition and the alternate-effect condition. The remaining response-incompatible conditions were also effect-incompatible. In the following we will refer to this condition as the incompatible condition. Thus, altogether there were five flanker conditions: neutral, response-compatible, regular effect, alternate effect, and incompatible.
Participants were tested individually. First, they received a sheet of paper illustrating the stimulus–response assignment. The sheet remained on the table throughout the experiment. The particular design of the figure depended on the experimental group. Two different instructions were applied. One group of participants received a figure showing the mapping of stimuli to responses and the mapping of responses to response effects (similar to Fig. 32.1). These participants were explicitly informed that each particular keypress would result in the presentation of particular effect letters (effect-related instruction). The other group of participants received a figure demonstrating only the mapping of stimuli to responses. That is, in this group emphasis was given to the stimulus–response assignment but not to the response–effect relations (key-related instruction), and there was only the general information that letters would follow the responses. The variation of the instruction was aimed at ensuring that at least the explicitly informed group would learn the response–effect relations.
Fig. 32.1 The mapping of targets, responses, and response effects in Experiment 1. LM–left middle finger; LI–left index finger; RI–right index finger; RM–right middle finger.
Table 32.1 Overview of the flanker conditions used in Experiment 1

Flanker condition     Example   Response compatibility   Effect compatibility
Neutral               DWD       Neutral                  Neutral
Response-compatible   SWS       Compatible               Incompatible
Incompatible          HWH       Incompatible             Incompatible
Regular effect        GWG       Incompatible             Compatible
Alternate effect      XWX       Incompatible             Compatible

The examples are related to target W. Response compatibility was estimated by comparing the neutral condition with the response-compatible and the incompatible conditions. To estimate the effect of effect compatibility, the incompatible condition was compared with the two effect-compatible conditions (regular and alternate effect). The incompatible condition is response-incompatible as well as effect-incompatible.
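To make the classification in Table 32.1 concrete, the following sketch assigns a flanker condition to a given target-flanker pair. Only the W/S pairing and its regular effects (G after responses to W, X after responses to S) are taken from the text; the two small dictionaries are therefore partial, hypothetical stand-ins for the full assignment in Fig. 32.1:

    # Hedged sketch of the five flanker conditions in Experiment 1.
    # The mappings below cover only the examples given in the text.
    response_of = {'W': 'LM', 'S': 'LM'}       # W and S share a response
    regular_effect_of = {'W': 'G', 'S': 'X'}   # effect letter following each target

    def classify_flanker(target, flanker):
        if flanker in ('D', 'Z'):              # never assigned to any response
            return 'neutral'
        if response_of.get(flanker) == response_of[target]:
            return 'response-compatible'
        if flanker == regular_effect_of[target]:
            return 'regular effect'
        # the other target assigned to the same response as the current one
        alternate = next(t for t, r in response_of.items()
                         if r == response_of[target] and t != target)
        if flanker == regular_effect_of[alternate]:
            return 'alternate effect'
        return 'incompatible'

    for f in ('D', 'S', 'H', 'G', 'X'):
        print(f + 'W' + f, '->', classify_flanker('W', f))

Run on target W, this reproduces the five example strings of Table 32.1 (DWD neutral, SWS response-compatible, HWH incompatible, GWG regular effect, XWX alternate effect).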
To familiarize the participants with the task, they worked through a two-block demonstration version of the experiment with 20 trials each. In the first block the acquisition phase of the experiment was demonstrated, and in the second block the test phase was demonstrated. Thus, participants were informed in advance that the task would vary slightly in the second half of the experiment. The acquisition phase was then started. The acquisition phase consisted of 8 blocks with 40 trials each. Participants were required to respond as quickly as possible, without making errors, on the appearance of a single letter randomly chosen from the set of target letters. Each trial started with a warning signal (an exclamation mark) appearing at the center of the screen. 800 to 1200 ms later, one of the target letters appeared at the same location. Immediately after a correct response, the corresponding response-effect letter replaced the target. The effect letter remained visible for 500 ms. Following an erroneous response, a question mark was displayed instead of the effect letter for the same time span, accompanied by a beep signal of 50 ms. Thereafter the screen was cleared. 2500 ms later an exclamation mark indicated the beginning of the next trial. After each block, feedback was presented on the screen giving information about average RT and the number of errors in the preceding block. Participants were informed that the number of errors should never exceed 2 (5%).
The test phase followed immediately after the acquisition phase, and consisted of 8 blocks with 40 trials each. The procedure in the test phase was identical to that in the acquisition phase, except that instead of a single target letter, three horizontally aligned letters appeared on the screen. Subjects were required to respond to the letter in the middle of the string while ignoring the flankers. All five flanker conditions were applied with the same frequency. The sequence of flanker conditions was random. The experiment lasted about 75 minutes.
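As an aid to seeing the trial structure at a glance, here is a sketch that lays out one acquisition-phase trial as a sequence of (event, duration) pairs; the durations come from the text, while the representation itself is a simplification of ours rather than actual presentation code:

    import random

    def acquisition_trial_timeline(correct):
        """Sequence of (event, duration in ms) for one acquisition trial;
        None means the event lasts until the participant responds."""
        timeline = [('warning signal "!"', random.randint(800, 1200)),
                    ('target letter', None)]
        if correct:
            timeline.append(('effect letter', 500))
        else:
            timeline.append(('question mark + 50 ms beep', 500))
        timeline.append(('blank inter-trial interval', 2500))
        return timeline

    print(acquisition_trial_timeline(correct=True))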
32.1.1.2 Results
Presumably due to the strict instruction not to produce more than 5% errors in each block, erroneous responses were rare events. The error rate was 2.78%. Erroneous responses were excluded from the analysis of results. The performance in the acquisition phase is theoretically uninformative with respect to the aim of the experiment. Therefore, we restricted the analysis of results to the data from the test phase.
As the first step of the analysis, we selected participants according to their sensitivity to the effects of noise letters. For this purpose, each participant's difference between the response-compatible
trials and the incompatible trials was computed. As the response-compatible trials were effect-incompatible, this difference reflects the pure effect of response compatibility. In each group six participants did not meet the criterion of achieving an RT benefit of at least 25 ms for response-compatible trials. These subjects were excluded. Hence, 14 subjects in each instruction group entered the analysis.
The first block of the test phase was considered a training block in which participants familiarized themselves with the change of task requirements. Thus, mean RTs were computed for each of the five flanker conditions over the correct responses in Blocks 2 to 8 of the test phase. These data were subjected to an ANOVA including the within-subject factor flanker condition (neutral, regular-effect, alternate-effect, incompatible, and response-compatible) and the between-subject factor instruction group (effect-related, key-related). The ANOVA yielded only a main effect of flanker condition, F(4, 56) = 22.92, p < 0.01, which did not depend on group, F < 1. Therefore, we pooled the data over both instruction groups. Figure 32.2 presents the results.
In evaluating the data, we consider first the three conditions of the classical flanker paradigm. There was an inhibitory effect of response-incompatible flankers relative to neutral flankers of 60 ms, t(27) = 6.39, p < 0.01, and a facilitative effect of response-compatible flankers relative to neutral flankers of 30 ms, t(27) = 4.03, p < 0.01. Hence, we obtained the response-compatibility effect. Moreover, this effect was larger than in typical flanker experiments. For example, Eriksen and Eriksen (1974) found a difference of about 30 ms between response-compatible and response-incompatible flanker conditions. This is, of course, not that surprising because we selected participants according to their sensitivity to noise interference. More surprising, however, is that the effect compatibility of the response-incompatible flankers also affected RTs.
Fig. 32.2 Mean RTs for the five flanker conditions in Experiment 1. The incompatible condition is response-incompatible as well as effect-incompatible. It is used for the estimation of both response-compatibility and effect-compatibility effects.
The costs induced by response-incompatible flankers were substantially reduced when these letters represented the effects of the required response to the target. To investigate this effect further, a separate ANOVA taking into account only the three response-incompatible conditions (regular-effect, alternate-effect, incompatible) was performed. We obtained only a main effect of flanker condition, F(2, 26) = 18.49, p < 0.01. Again this effect was independent of the type of instruction, F < 1. The reduction of RTs in the effect-compatible trials was found with flankers indicating the regular effect of the required response as well as with flankers indicating the alternate effect. In the regular-effect condition RTs were 32 ms shorter than in the incompatible condition, t(27) = 6.57, p < 0.01. The difference between the alternate-effect condition and the incompatible condition amounted to 28 ms, t(27) = 6.14, p < 0.01. The 4 ms difference between the regular- and alternate-effect flankers was not significant, t < 1.
32.1.1.3 Discussion
Besides the traditional flanker effects depending on response compatibility, we obtained an effect of the effect compatibility of actually irrelevant flankers that were presented together with the target. Responses to targets surrounded by response-incompatible but effect-compatible flankers were faster than responses to targets surrounded by response- and effect-incompatible flankers. This suggests that participants acquired some knowledge of the relations between their own movements and the effects of these movements in terms of succeeding visual events. That the different instructions did not have any impact on the effect-compatibility effect may be taken as evidence that the acquisition of action-effect knowledge does not heavily depend on explicitly drawing attention to action effects.
More importantly, however, the main result of the present experiment also suggests that such action-effect knowledge has some impact on the processes involved in response preparation. It seems that the presentation of the effect of the required response during response planning facilitates the planning process, resulting in shorter RTs with effect-compatible flankers. This seems to be a rather solid effect that cannot be eliminated by response incompatibility. Even though the flanker, due to its response incompatibility, presumably evokes or pre-activates an incorrect response, its effect compatibility may facilitate the preparation of the correct response. Interestingly, the size of this effect-compatibility effect did not depend on whether flankers represented the regular effect or the alternate effect. This implies that the influence of effect-compatible flankers is essentially based on a linkage between responses and their environmental effects.
Altogether, we assume that there are two different processes working against each other. On the one hand, the flankers activate the response that would have to be performed if the flankers were the target, resulting in a cost–benefit pattern for response-incompatible and response-compatible flanker conditions compared with the neutral condition. On the other hand, the flankers also activate one of the representations of the effects of the response set. When the activated effect code corresponds to the correct response, response preparation will be facilitated. This reduces the costs due to the response incompatibility of those flankers. Whether the activation of incorrect effect codes decelerates response preparation cannot be decided with the present data, because response-compatibility effects are always superimposed on effect-compatibility effects.
Before discussing these results in more detail, we should consider an important objection against the present experiment: our conclusions rely on the differences between the incompatible flanker condition and the two effect-compatible flanker conditions. A problem might be that the incompatible flanker condition was part of the selection criterion. For the evaluation of the effect flankers we considered only those participants with a minimum difference of 25 ms between the response-compatible and the incompatible flanker condition. Thus, in tendency we might have selected participants with larger RTs in the incompatible condition. If so, lower RTs for the effect-compatible
flanker conditions compared with the incompatible condition could also be due to statistical reasons rather than a facilitating effect of compatible effect flankers. Experiment 2 was designed to address this objection.
32.1.2 Experiment 2
Experiment 2 was identical to Experiment 1 with only one exception: the effect stimuli were not taken from the same stimulus set as the targets. Instead, a set of eight new letters was used. Figure 32.3 illustrates the target–response–effect mapping. In the acquisition phase of the experiment, these new stimuli were presented as effects depending on the targets and the responses.
This design offers the opportunity to construct effect-compatible and effect-incompatible flanker conditions independent of response compatibility. Using stimuli from the set of targets as flankers allows the construction of response-compatible and response-incompatible flanker conditions identical to Experiment 1. With respect to effect compatibility, those flankers are neutral. In turn, using effect stimuli as flankers does not affect response compatibility. Thus, the modified design allowed us to select participants who are sensitive to noise interference depending on response compatibility and to evaluate the effects of effect compatibility with completely independent data. Furthermore, because the effect of response incompatibility was not superimposed on the effect of effect compatibility, it became possible to relate the effects of effect compatibility to the neutral condition. Thus, we could analyze facilitating and/or inhibitory effects of effect compatibility and effect incompatibility.
32.1.2.1 Method
Participants. Thirty-three students from different departments of Humboldt University Berlin and pupils from different secondary schools took part in the experiment. They either received course credits or were paid for their services.
Stimuli and apparatus. Stimulus presentation and response recording were controlled by the same device that was used in Experiment 1. The targets were also identical to the first experiment. As effect stimuli, the eight capital letters Y, C, B, R, M, V, T, and K were used. The details of stimulus presentation and the mode of responding remained unchanged.
Fig. 32.3 The mapping of targets, responses, and response effects in Experiment 2. LM–left middle finger; LI–left index finger; RI–right index finger; RM–right middle finger.
Table 32.2 Overview of the flanker conditions used in Experiment 2

Flanker condition       Example   Response compatibility   Effect compatibility
Neutral                 DWD       Neutral                  Neutral
Response-compatible     SWS       Compatible               Neutral
Response-incompatible   HWH       Incompatible             Neutral
Regular effect          YWY       Neutral                  Compatible
Alternate effect        CWC       Neutral                  Compatible
Effect-incompatible     BWB       Neutral                  Incompatible

The examples are related to target W. Response compatibility could be estimated by comparing the response-compatible and response-incompatible conditions with the neutral condition. These conditions were all neutral with respect to effect compatibility. The estimation of effect compatibility included the neutral, regular-effect, alternate-effect, and effect-incompatible conditions. These conditions were all neutral with respect to response compatibility.
Design and procedure. The design of this experiment was basically identical to Experiment 1. The only difference concerned the flanker conditions in the test phase. Instead of five flanker conditions, six different flanker conditions were applied (Table 32.2). Besides a neutral condition, there was a response-compatible (e.g. SWS) and a response-incompatible flanker condition (e.g. HWH). The latter conditions were both unrelated to effect compatibility. Furthermore, there were three effect-related conditions. Two of them were effect-compatible, one with the regular effect (e.g. YWY) and one with the alternate effect (e.g. CWC). In the other condition the flankers were effect-incompatible (e.g. BWB). All six flanker conditions were applied with the same frequency to the eight targets, resulting in 8 blocks of 48 trials. The sequence of flanker conditions was random. In the acquisition phase only the effect-related instruction was used, to ensure the learning of the action effects. The experiment lasted about 75 minutes.
32.1.2.2 Results
The error rate amounted to 1.47%. Thus, erroneous responses were rare events and were excluded from analysis. The data analysis followed the procedure described for Experiment 1. Again it was restricted to the data from the test phase. Twenty participants met the criterion of achieving an RT benefit of at least 25 ms in the response-compatible relative to the response-incompatible flanker condition. Only these participants entered the following analysis. Mean RTs were computed for each of the flanker conditions over the correct responses in Blocks 2 to 8 of the test phase. Figure 32.4 presents the results.
The data were subjected to two separate ANOVAs; the first one evaluated the effect of response compatibility, and the second the effect of effect compatibility. The first ANOVA, including the response-related flanker conditions (neutral, response-compatible, response-incompatible), yielded a main effect of flanker condition, F(2, 38) = 75.00, p < 0.001. A post hoc analysis showed that response-compatible flankers reduced the RT to the target compared with the neutral flanker condition by about 61 ms, t(19) = 5.97, p < 0.01. By contrast, response-incompatible flankers increased the RT to the targets by 25 ms, t(19) = 2.44, p < 0.01. Again, this result reflects our selection of participants for further analysis. It shows that the selected participants were sensitive to the effects of flanker elements.
Fig. 32.4 Mean RTs for the six flanker conditions in Experiment 2. The neutral condition is integrated in the response-related data. However, the effect-related data can be related to this condition as well.
Relative to the neutral condition, we observed benefits with response-compatible flankers and costs with response-incompatible flankers.
More important were the results from the second ANOVA with the effect-related flanker conditions (neutral, regular-effect, alternate-effect, effect-incompatible). The analysis provided clear evidence for a main effect of flanker condition, F(3, 57) = 3.87, p < 0.05. In effect-compatible trials, RTs were on average 18 ms shorter than in effect-incompatible trials, t(19) = 2.03, p < 0.05. So far, Experiment 2 replicated the main result of the first experiment. Obviously, the effect compatibility of the flankers affected the RTs to targets. As the comparison with the neutral condition suggests, this was mainly due to a facilitation of the responses with effect-compatible flankers. With the regular-effect flanker, RTs were 30 ms shorter than with neutral flankers, t(19) = 2.93, p < 0.01. Nearly the same result was found with the alternate-effect flanker. The corresponding difference amounted to 29 ms, t(19) = 2.74, p < 0.01. In contrast, there was no reliable difference between the neutral condition and the effect-incompatible flanker condition, t(19) = 1.16, p > 0.05.
32.1.2.3 Discussion
Experiment 2 replicated the main findings of Experiment 1, showing that the results of the first experiment were not due to an artifact. The effects seem to be consistent over participants and over different stimulus material. It can therefore be concluded that the presentation of a stimulus representing the effect of the required response while preparing this response results in shorter RTs. It is important to note that we found this effect even under flanker conditions in which the effect flanker was not related to the target; that is, in the alternate-effect condition. This excludes the possibility that the flanker only primed the processing of the target. Obviously, the effect flankers affected response preparation. Extending the findings of the first experiment, Experiment 2 suggests
a benefit-only pattern of results; compared with neutral flankers, we found only benefits with effect-compatible flankers, but no indication of costs with effect-incompatible flankers.
Given the present data, it is far from clear how to explain the facilitative effect of effect-compatible flankers. Two candidate explanations come to mind. The first is to assume that representations of effects more or less automatically activate their associated actions. The problem with such an explanation, however, is that costs of effect-incompatible flankers were not observed. As not only effect-compatible flankers but effect-incompatible flankers as well should activate their associated actions, some interference should result from the fact that effect-incompatible flankers would activate a response different from the response activated by the target. This should result in some prolongation of RTs in effect-incompatible trials relative to the neutral trials. However, inhibitory effects of effect-incompatible flankers were clearly absent.
The second candidate is to assume that the physical presence of an effect does not automatically activate the corresponding action representation, but rather affects action planning. Presupposing that the anticipation of the effect is part of action planning, and presupposing that action planning itself is initiated by representations of the target, the anticipation of the correct effect might be facilitated when its representation is already activated by an external stimulus, as in the case of effect-compatible flankers. In contrast, effect-incompatible flankers as well as neutral flankers do not activate a representation of the correct effect. Instead, both of them activate a representation of another stimulus that is not related to the effect of the current action. Thus, both effect-incompatible flankers and neutral flankers would have the same effect on response planning. This explanation seems more compatible with the present data.
To summarize, our results support the idea that participants in our experiments acquired some knowledge of the relations between their own movements and the effects of these movements in terms of subsequent environmental events. Furthermore, they suggest that representations of action effects actually play a functional role in action planning. This immediately raises the question of how to conceptualize the mechanisms underlying the learning of action-effect relations.
32.2 The learning of action effects is bound to action planning2
If action planning actually includes the anticipation of the corresponding action effect, it seems reasonable to suppose that, in the learning of action effects, representations of the environmental events following action execution become connected with the action plan. If so, then an activation of the action plan would automatically activate the action effects learned before. This in turn means that the components necessary for action-effect learning should be the action plan and the effect, whereas the execution of the action would not be necessary for learning. On this view, we hypothesized that the learning of action effects might depend on the extent of action preparation. More exactly, we reasoned that highly elaborated action planning should result in more extensive action-effect learning than less elaborated planning. Hence, the general approach of the experiments was to vary the extent of action planning. This was achieved by means of a GO–NOGO paradigm; that is, in half of the trials response execution was required (GO trials), whereas in the other half of the trials execution had to be suppressed (NOGO trials). In the case of NOGO trials, the extent of action planning can be manipulated by varying the time elapsing between stimulus onset and the presentation of the NOGO signal. An early NOGO signal should interrupt action planning at an early stage, whereas a late NOGO signal could allow full action planning, excluding only action execution. Then the
question was whether or not NOGO trials would contribute to action-effect learning and, moreover, whether that learning would depend on the extent of action planning.
32.2.1 Experiment 3
In this experiment the participants had to respond to two successive stimuli. The first stimulus was randomly chosen, and the identity of the second stimulus depended on the identity of the first one. Hence, the second stimulus could be considered as an effect of the response to the first stimulus. Response-effect learning should accelerate the response to the second stimulus, as the second stimulus could be anticipated. According to our hypothesis, the learning should depend on the planning of the first response.
As mentioned above, we applied a GO–NOGO procedure. The first response had to be executed only in GO trials. The critical trials for testing the hypothesis were the NOGO trials. In the case of the NOGO trials, a NOGO signal indicated that the preparation of the response should be stopped. One group of participants received the NOGO signal shortly after stimulus onset (Early-Signal Group), whereas in the other group the NOGO signal was presented when 90% of the mean reaction time had elapsed (Late-Signal Group). Late NOGO signals should lead to elaborated response planning; early NOGO signals should interrupt the planning shortly after stimulus onset.
Both groups of participants should learn the response-effect relations in GO trials, as these trials always include complete response planning. The critical question was whether the NOGO trials regarding the first response would contribute to response-effect learning as well. If elaborated planning of the first response was a prerequisite of learning, participants in the Late-Signal Group should learn the response-effect relations between the first response and the second stimulus in NOGO trials as well, whereas the participants in the Early-Signal Group would not. As a consequence, the Late-Signal Group should have more practice, resulting in stronger response-effect learning. Furthermore, as soon as some response effects have been learned, the expression of the learned relations in performance should differ for GO trials and NOGO trials in the Early-Signal Group, but not in the Late-Signal Group. If the performance effects were due to the anticipation of the second stimulus as part of planning the first response, only elaborated response planning could have an impact on performance. Hence, the Late-Signal Group should be able to anticipate the second stimulus in GO trials as well as in NOGO trials. On the contrary, in the Early-Signal Group an anticipation of the second stimulus would be expected only in GO trials, but not in the incompletely planned NOGO trials. Therefore, in NOGO trials with the early signal the learned response-effect relations would be unlikely to have an effect on the response to the second stimulus.
32.2.1.1 Method
Participants. Seventy-two participants took part in the experiment. None of them had participated in the first two experiments. Participants were students from several departments and pupils from secondary schools, with a minimum age of 16 years. Students from the Department of Psychology of Humboldt University received course credit for their participation; the other participants were paid for their services. Forty-four participants were randomly assigned to the Early-Signal Group and 28 to the Late-Signal Group. After eliminating those participants who
did not follow the instruction, there were 20 participants in each group. The selection criteria are described in the Results section.
Stimuli and apparatus. Stimulus presentation and recording of the responses were controlled by an IBM-compatible PC. Responses were made by keypresses on the keyboard. The keys <, y, ., and - on a German keyboard were assigned to the left middle finger, the left index finger, the right index finger, and the right middle finger, respectively. Throughout each block the fingers rested on these keys. The capital letters W, S, F, X, N, P, G, and H were used as stimuli. They were presented at the center of the black display screen. At a viewing distance of about 60 cm each stimulus letter appeared at a visual angle of 1.5°. Stimuli W and S required a response with the left middle finger (LM), and stimuli F and X were assigned to the left index finger (LI). Stimuli N and P had to be responded to with the right index finger (RI), and stimuli G and H with the right middle finger (RM).
Design and procedure. The experiment consisted of 12 blocks with 68 trials each. Figure 32.5 illustrates the experimental procedure. Each trial started with an attention signal (+). 800 ms later the attention signal was replaced by the first stimulus letter (S1). As first stimuli only the letters W, S, F, and X were used. Thus, the first response was always a left response. Except in trials 17, 34, 51, and 68, S1 was presented in white color. Participants were instructed not to respond to white stimulus letters, but to prepare the appropriate response while waiting for a color change of the stimulus letter.
Fig. 32.5 Illustration of the procedure in Experiment 3. The trial started with the presentation of the attention signal. Then stimulus S1 was presented in white color. The preparation interval was set to 100 ms for the Early-Signal Group; for the Late-Signal Group it was adapted to 90% of the RT to the second stimulus. The GO signal changed the color of S1 to yellow, the NOGO signal to red. Measurement of the first RT started with the GO signal. The stimulus remained on the screen up to the response in GO trials; in NOGO trials the presentation time was adapted to that RT. Then S2 was presented immediately in yellow.
After some time the stimulus letter changed its color, becoming yellow or red. In the case of a yellow letter, the response was to be given as fast as possible (GO trials). If the color of the letter changed to red, the execution of the response was not permitted (NOGO trials). In this case the stimulus remained on the screen for a time that was adapted to the mean RT for the first response in the preceding block. (In the first experimental block this time was set to 1500 ms.) This procedure ensured that the first stimulus was on the screen for approximately the same time in both GO and NOGO trials. Half of the trials were GO trials and half were NOGO trials.
Trials 17, 34, 51, and 68 were control trials to test whether participants really used the time between stimulus onset and color change for response preparation. In these trials the letter appeared immediately in yellow. Response preparation should be indicated for each participant by shorter reaction times (RTs) in the test trials compared with the control trials.
After correct responses as well as after all NOGO trials, the second letter (S2) was presented. The identity of S2 depended on the identity of S1: W was followed by G, S by N, F by H, and X by P. Thus, each of the two left-hand responses had two possible effects. A response with the left middle finger could be followed by G or N, and a response with the left index finger could be followed by H or P. As a consequence of this stimulus-response assignment, all possible combinations of the first and second responses occurred in the experiment. S2 was immediately presented in yellow and had to be responded to as fast as possible with a right-hand response. A facilitation of the responses to S2 should indicate the learning of the relation between the first response and the second stimulus, that is, response-effect learning. The response was followed by a blank screen for 1600 ms, after which the next trial started with the attention signal.
The main experimental variable was the time between the onset of S1 and the presentation of the GO–NOGO signal, that is, the preparation time. In the Early-Signal Group the preparation time was fixed at 100 ms. At this point in time the color of the stimulus letter changed to yellow or to red. In the Late-Signal Group the preparation time should be long enough to enable full response preparation. We decided to adapt the preparation time to 90% of the mean RT; however, as the RT to S1 was affected by response preparation in advance of the GO signal, we used the mean RTs to S2 in the preceding block for the adaptive procedure.3 The second response was structurally identical to the first one; that is, 4 stimuli were assigned to 2 responses. Hence, at least in the first experimental blocks, the RTs to the second stimuli should be a good estimate of the RTs to the first stimuli. In later blocks, when the second response might be facilitated as a result of response-effect learning, the second RT presumably underestimates the first RT. Yet, with respect to our hypothesis, the resulting shortening of the preparation time due to response-effect learning should not be critical: learning effects on the second RT reduce the planning time for responses to S1 in NOGO trials; that is, the preconditions for learning in NOGO trials get worse. Thus, the adaptation of the preparation interval to the second RTs works against the hypothesis. In the first block of the experiment the time was set to 2000 ms, resulting in a preparation time of 1800 ms.
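The adaptive rule can be stated in a few lines. The sketch below (ours, not the authors' software) returns the preparation interval for an upcoming block, using the 90% rule and the 2000 ms starting value mentioned above:

    def preparation_time(group, prev_block_mean_rt2=None):
        """Time (ms) from S1 onset to the GO-NOGO signal for one block.
        prev_block_mean_rt2: mean RT to S2 in the preceding block, if any."""
        if group == 'early':
            return 100.0                      # fixed early signal
        rt_estimate = prev_block_mean_rt2 if prev_block_mean_rt2 else 2000.0
        return 0.9 * rt_estimate              # late signal at 90% of mean RT

    print(preparation_time('late'))           # 1800.0 in the first block
    print(preparation_time('late', 620.0))    # 558.0 once RTs are available

As noted above, faster responding to S2 shortens this interval, which works against the hypothesis rather than for it.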
In cases of false responses to the first stimulus, the second stimulus was replaced by a question mark and a beep tone indicated the error. Thus the second response was omitted. False responses to the second stimulus were also indicated by a beep. Response-effect learning was tested three times throughout the experiment: in Blocks 4, 8, and 12 the second stimulus was randomly chosen. If the participants were learning to anticipate the second stimulus in the regular blocks, RTs should increase in these irregular test blocks.
Participants were tested individually. First, they received a sheet of paper demonstrating the stimulus-response assignment. They were instructed to press the required response button as quickly and as accurately as possible. After a short demonstration block, the 12 experimental blocks were started.
On completion of each block, the computer calculated the mean RTs and the number of errors. Both scores appeared on the screen to motivate the participants to respond faster and more accurately in the next block. The experiment lasted about 80 minutes.
32.2.1.2 Results
Manipulation check. The main variable of the experiment was the preparation time for the first response. The experimental setting required the participants to use the time between the onset of S1 and the GO–NOGO signal to prepare the response. If participants were waiting for the color change to start response planning, there would be no response planning in NOGO trials. If so, it would be impossible to assess the effect of the extent of response planning on response-effect learning. Thus, we had to make sure that only those participants who actually prepared the response in advance of the GO–NOGO signal entered the further analyses. To check response preparation we calculated the mean RTs to S1 in the experimental trials and in the control trials. If participants prepared their responses, RTs in the experimental trials should be shorter than RTs in the control trials. Response preparation in GO trials implies similar response preparation in the NOGO trials, as participants did not know in advance whether a given trial would be a GO or a NOGO trial. Because there were only four control trials in each block, we summarized the data over the 12 experimental blocks. Only participants who produced a minimum preparation effect of 20 ms were considered for further analyses. This criterion was arbitrarily chosen and solely aimed at establishing that participants, and especially the participants in the Early-Signal Group, actually prepared the first response. In the Early-Signal Group a preparation time of 100 ms was available. Hence, the maximum preparation effect should be about 100 ms, and stronger preparation effects were expected for the Late-Signal Group. There remained 32 participants in the Early-Signal Group and 26 participants in the Late-Signal Group. As errors also reduce the amount of training of the response-effect relations, we further decided to exclude participants with more than 5% false or missing responses for the second response. The percentage of errors ranged between 0.6% and 8.9%. After applying the error criterion, there were 20 participants in each of the two experimental groups.4
For the remaining participants, the mean preparation effect and the number of errors were calculated. The mean preparation effect amounted to 63 ms for the Early-Signal Group and to 338 ms for the Late-Signal Group. As the variance of the preparation effect was lower in the Early-Signal Group, the difference between the groups was tested with a t-test for unequal variances. The difference was significant, t(21) = 11.82, p < 0.01. The percentage of errors was 2.7% in the Early-Signal Group and 3.1% in the Late-Signal Group. The difference was not significant, t < 1. Thus, the remaining participants formed two experimental groups for which the manipulation of the experimental variable had been effective: first, all participants showed a preparation effect; second, the preparation effect increased with preparation time; and third, errors were comparable. In the experimental trials, the mean RT to S1 amounted to 814 ms in the Early-Signal Group and to 659 ms in the Late-Signal Group. This difference, reflecting the different preparation times, was significant, F(1, 38) = 6.51, p < 0.05.
Responses to the second stimulus. The RTs for the responses to the second stimuli were the main indicator of response-effect learning. As a first step of the analysis, Fig. 32.6 presents the training curves for the two experimental groups. RTs were summarized over all correct responses to S2.
The control trials were not considered. Separate ANOVAs were performed for the 9 regular blocks and
for the comparison between the three test blocks with random response-effect relations and the immediately preceding regular blocks. In a second step of the analysis, the data for GO trials and NOGO trials were considered separately.
Regular blocks. For the blocks with regular response-effect relations a 2 (groups, between-subjects) × 9 (blocks, within-subjects) ANOVA was performed. There was only a main effect of blocks, F(3, 130) = 149.50, p < 0.01. The main effect of blocks reflects both the learning of the stimulus-response assignment and the learning of the response-effect relations. Numerically, RTs in the Late-Signal Group were 24 ms shorter than in the Early-Signal Group, and the difference between the groups increased with practice. That would indicate more response-effect learning in the Late-Signal Group. However, neither the main effect of groups, F < 1, nor the interaction between groups and blocks, F(3, 130) = 1.60, was significant.
Effect of random blocks. As a more direct test of response-effect learning, the RTs in the random blocks (Blocks 4, 8, 12) were compared with those in the immediately preceding regular blocks (Blocks 3, 7, 11) in both experimental groups. A 2 (groups, between-subjects) × 3 (blocks, within-subjects) × 2 (regularity, within-subjects) ANOVA again indicated a main effect of blocks, F(1, 48) = 64.79, p < 0.01, reflecting the general practice of the stimulus-response assignment. More importantly, the main effect of regularity was significant as well, F(1, 38) = 14.79, p < 0.01. If S2 regularly followed the first response, RTs were 53 ms shorter than in cases of irregular response-stimulus relations, as in the random blocks. The effect of regularity increased with practice, F(2, 60) = 7.99, p < 0.01. Thus, there is first evidence that the relations between the responses to S1 and S2 were actually learned and used in performing the task.
The most important result was an interaction between regularity learning and the point in time at which the GO–NOGO signal was presented: the regularity effect depended on the experimental group, F(1, 38) = 5.99, p < 0.05. In the Early-Signal Group the mean learning effect amounted to 19 ms, whereas in the Late-Signal Group a learning effect of 87 ms was found; that is, the effect of response-effect learning increased with the planning time in advance of the GO–NOGO signal. The other interactions as well as the main effect of experimental group were not significant, all F-values < 1.
Fig. 32.6 Mean RTs for the responses to the second stimulus for the Early-Signal Group and the Late-Signal Group in Experiment 3. The data are summarized over all trials; that is, no distinction was made between GO and NOGO trials with respect to the first response. In Blocks 4, 8, and 12 the second stimulus was not the regular effect of the first response, but randomly selected.
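For clarity, the learning index used in these analyses can be written out explicitly: the regularity effect is the mean rise in RT to S2 from each regular block to the immediately following random block. A small sketch, with a hypothetical data layout and invented example values:

    RANDOM_BLOCKS = (4, 8, 12)  # blocks with scrambled response-effect relations

    def regularity_effect(mean_rt_by_block):
        """mean_rt_by_block: dict mapping block number -> mean RT to S2 (ms).
        Returns the mean RT increase from regular to random blocks."""
        diffs = [mean_rt_by_block[b] - mean_rt_by_block[b - 1]
                 for b in RANDOM_BLOCKS]
        return sum(diffs) / len(diffs)

    # A participant whose RTs jump in every random block shows a positive
    # effect, indicating that the regular response-effect relation was learned.
    example = {3: 540, 4: 625, 7: 510, 8: 600, 11: 495, 12: 580}
    print(regularity_effect(example))  # about 86.7 ms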
Effects of regularity on GO and NOGO trials. The previous analysis has shown that the amount of response-effect learning depends on the amount of response planning. An additional differentiation between GO trials and NOGO trials could be used to investigate whether the expression of the learned relations would also depend on response planning. Figure 32.7 illustrates the learning effects (that is, the mean differences between the three random blocks and the preceding regular blocks) in the Early-Signal Group and in the Late-Signal Group, separately for GO trials and NOGO trials. The figure again shows the general group effect already evident in the previous analysis: there was a stronger effect of regularity learning in the Late-Signal Group than in the Early-Signal Group. Furthermore, in the Early-Signal Group a difference between GO trials and NOGO trials becomes evident. A paired-samples t-test indicated that the increase of RTs from regular to random blocks was stronger in GO trials than in NOGO trials, t(19) = 2.88, p < 0.01. Moreover, only for GO trials was the increase of 32 ms significant, t(19) = 2.36, p < 0.05, whereas for NOGO trials the small increase of 7 ms did not differ from zero, t(19) < 1. On the contrary, in the Late-Signal Group the GO trials and the NOGO trials produced a similar increase of RTs from regular to random blocks of 85 ms and 90 ms, respectively, t(19) < 1. The increase of RTs was significant for GO trials, t(19) = 2.83, p < 0.05, as well as for NOGO trials, t(19) = 4.03, p < 0.01. Thus, learned response-effect relations seem to affect the second response when there was sufficient planning of the first response, that is, in all GO trials and in NOGO trials in the Late-Signal Group, but not in NOGO trials in the Early-Signal Group.
32.2.1.3 Discussion
In both experimental conditions of Experiment 3, evidence for learning the relations between the first response and the second stimulus, that is, for learning some kind of action-effect relations, was found. Responses to S2 were executed faster when S2 followed the regularity that was applied in nine of the twelve practice blocks. When another stimulus was presented in the three random blocks, RTs increased. The general effect of action-effect learning was modulated by the experimental conditions. There was a clear learning effect in the Late-Signal Group and a rather small effect in the Early-Signal Group. The groups were completely identical, except with regard to the point in time at which the GO–NOGO signal was presented.
Fig. 32.7 Mean effects of response-effect learning depending on whether the trial was a GO or a NOGO trial with respect to the first response. Shown are the mean RT differences between the random blocks and the preceding regular blocks for the responses to the second stimulus.
The time interval between the onset of S1 and the onset of the GO–NOGO signal was aimed at manipulating the extent of action planning. GO trials always require complete response planning. In the case of NOGO trials, response planning can be interrupted as soon as the NOGO signal has been presented. Thus, in the Early-Signal Group response planning should have been rather incomplete, whereas in the Late-Signal Group an interruption was unlikely before 90% of the mean RT had elapsed, resulting in more elaborated planning. This difference in the extent of response planning in NOGO trials had an effect on response-effect learning: when response planning was more elaborated in NOGO trials, stronger learning effects were found. So far, the results are in line with our hypothesis that the processes of response-effect learning are integrated in response planning.
This result has several implications. First, it shows that processes involved in the control of responses mediate the learning of the regularities. Learning was found when there was elaborated response planning, even when the response did not have to be executed. When the planning of the response was interrupted early in half of the trials, the effect of regularity learning was reduced. Second, this result also shows that learning does not depend on whether or not the response has actually been executed. If the learning of response-effect relations consisted of learning the relations between executed responses and succeeding events in the environment, learning should have been confined to GO trials, and again no difference between the Early-Signal and the Late-Signal Group should arise. On the contrary, the data suggest that NOGO trials also contribute to learning the response-effect relations, given enough time for elaborated response planning. Third, the different effects of GO trials and NOGO trials in the Early-Signal Group and the Late-Signal Group indicate that response planning is crucial not only for response-effect learning but also for the use of the learned relations. In the Early-Signal Group, a significant increase of RTs from regular blocks to random blocks was found only with GO trials. There was no effect with NOGO trials. In contrast, in the Late-Signal Group the regularity effects on RTs were similar for GO trials and NOGO trials. Obviously, the learned relations were not effective in NOGO trials when response planning was interrupted by the early NOGO signal. However, if there was sufficient time for response planning, the use of the learned relations in NOGO trials did not differ from their use in GO trials. This is compatible with the idea that response planning includes the anticipation of the response's effect, and corresponds to the conclusions drawn from Experiments 1 and 2.
Altogether, the experiment demonstrates that response planning is crucial for response-effect learning. However, there are also some possible objections to this interpretation. One objection might be that participants did not learn response-effect relations but stimulus-stimulus relations, that is, the relations between S1 and S2. Assuming that S1 is processed for a shorter time if the signal appears early in NOGO trials, learning of the stimulus-stimulus relations could explain the data pattern instead: one would expect little learning in NOGO trials with the early signal and full learning in GO trials as well as in all trials of the Late-Signal Group. However, such an explanation seems rather unlikely. Even in NOGO trials, participants in the Early-Signal Group had 100 ms to process S1 before stopping response preparation.
This time is sufficient to activate the representation of letters in memory. Furthermore, we tried to ensure the processing of S1 by selecting only those participants who showed a minimum preparation effect for the first response of 20 ms. The mean preparation effect for the remaining participants amounted to 63 ms. Hence, it can be assumed that the participants had at least started to process the first stimulus in advance of the GO–NOGO signal. Thus, the participants should have had the representations available to learn the relations between S1 and S2. Moreover, in NOGO trials, S1 remained on the screen for a time similar to GO trials. That was also true for the NOGO trials in the Early-Signal Group, for which the presentation time after the NOGO signal was
even longer than for the Late-Signal Group (depending on the mean RTs to S1). Participants could use this time to anticipate S2 depending on S1. It seems important to note at this point that even in NOGO trials S1 was not irrelevant with respect to the task requirements. Using the S1–S2 relations would always help to facilitate responses to S2. If stimulus-stimulus learning were at work, NOGO trials, without parallel response preparation, should be even more efficient than GO trials. As a consequence, similar effects would be expected independent of the kind of trial and the point in time at which the GO–NOGO signal was presented. The data do not support this view. Additionally, Experiments 1 and 2 as well as former experiments (Ziessler and Nattkemper 2001) have shown that, under similar conditions, response-effect learning is more important than stimulus-stimulus learning.
A second objection might be that it is not the extent of response planning in NOGO trials that is responsible for the difference in response-effect learning, but the different preparation time. If that is correct, the experiment would not reflect differences in response-effect learning. Instead it would reflect differences in performance: learned response-effect relations can be applied more effectively to a given response if there is sufficient time in advance of the GO signal. Hence, as there was less time in the Early-Signal Group, the effects of the response-effect relations were rather small. This objection can easily be tested by discarding the NOGO trials from the experiment. If there were only GO trials, response-effect learning should be identical for both experimental groups. However, the possible impact of the time interval between stimulus onset and the GO signal on performance should persist.
A third objection relates to the fact that, as a result of our adaptive procedure, different interstimulus intervals between S1 and S2 were realized in the two experimental groups. In the Early-Signal Group the mean S1–S2 interval was about 900 ms, whereas it was about 1150 ms in the Late-Signal Group. Thus, the different learning effects might simply be due to the fact that participants in the Late-Signal Group had more time to process S1. This objection can also be tested by applying the same adaptive procedure, but using only GO trials. If the different S1–S2 intervals were responsible for the different effects of response-effect learning, we should again find better learning for the Late-Signal Group with the longer S1–S2 interval and less learning for the Early-Signal Group. However, if response-effect learning depends on response planning, similar learning effects for both groups have to be expected, because GO trials always require complete response planning. The fourth experiment was designed to investigate these latter objections.
32.2.2 Experiment 4
A major point in the interpretation of Experiment 3 was that the differences between the Early-Signal Group and the Late-Signal Group were caused by the NOGO trials. Participants would learn the response-effect relations in NOGO trials if there was sufficient planning of the response to S1. Thus, they would have had more practice with the response-effect relations in the Late-Signal Group, resulting in stronger learning effects. An alternative interpretation might be that participants would learn the relations only in GO trials. Then learning would be similar in both groups, but applying the learned relations would need some additional time. This time is available with the late GO–NOGO signal, but not with the early signal. From that point of view, the presentation time of the GO–NOGO signal would affect only performance, rather than learning. Such an interpretation of Experiment 3 would have far-reaching consequences for our hypothesis: following this interpretation, the different increases of RTs from regular to random blocks in the Early-Signal Group and the Late-Signal Group could not be regarded as evidence for the integration of response-effect
learning in response planning. Furthermore, the anticipation of response effects would not be an obligatory component of response planning. Instead it would only take place if sufficient time were available. The latter would question the general role of action effects in the control of behavior.
To investigate this objection we replicated Experiment 3 with only one variation: all NOGO trials were replaced by GO trials. If the differences between the two experimental groups in Experiment 3 were due to different learning in the NOGO trials, identical results should be found for both groups in Experiment 4. However, if the differences were due to the different preparation times in advance of the GO–NOGO signal or the different lengths of the interstimulus interval between S1 and S2, the increase of RTs from regular to random blocks should again be higher in the Late-Signal Group compared with the Early-Signal Group.
32.2.2.1 Method
Participants. Fifty-six participants took part in the experiment. None of them had participated in the previous experiments. Participants were students from several departments and pupils from secondary schools, with a minimum age of 16 years. They either received course credit or were paid for their services. Twenty-six participants were randomly assigned to the Early-Signal Group and 30 participants to the Late-Signal Group. After excluding those participants who did not show a preparation effect or who produced too many errors, there were 20 participants in each group. The selection criteria were the same as in Experiment 3.
Stimuli and apparatus. These were the same as in Experiment 3.
Design and procedure. Design and procedure were identical to Experiment 3. The only exception was that all NOGO trials were replaced by GO trials. Thus, the first stimulus always changed its color from white to yellow. Again, in the Early-Signal Group 100 ms elapsed between stimulus onset and the onset of the GO signal, and in the Late-Signal Group this time amounted to 90% of the mean RT. The replacement of the NOGO trials by GO trials did not affect the frequency of responses to S2; it only doubled the frequency of responses to S1. Again the experiment lasted about 80 minutes.
32.2.2.2 Results
Manipulation check. The manipulation check followed the procedure described in Experiment 3. All of the participants produced a preparation effect in their responses to S1 of at least 20 ms. However, there were 16 participants who exceeded the error criterion of 5%. These participants were excluded from the further analysis. Twenty participants remained in each of the two experimental groups. For both groups the mean preparation effect and the number of errors were calculated. The mean preparation effect amounted to 114 ms for the Early-Signal Group and to 430 ms for the Late-Signal Group. As the variance of the preparation effect was lower in the Early-Signal Group, the difference between the groups was tested with a t-test for unequal variances. The difference was significant, t(30) = 11.54, p < 0.01. The percentage of errors was 2.4% in the Early-Signal Group and 2.7% in the Late-Signal Group. The difference was not significant, t < 1. Again, the manipulation check establishes that all participants showed a preparation effect for their responses to S1 which increased with preparation time. In the experimental trials, the mean RT to S1 amounted to 674 ms in the Early-Signal Group and to 445 ms in the Late-Signal Group. This difference, reflecting the different preparation times, was significant, F(1, 38) = 45.09, p < 0.01.
Responses to the second stimulus. Most important for testing the effects of response-effect relations were again the RTs for responses to the second stimulus. Figure 32.8 presents the mean RTs for the two experimental groups in each of the twelve blocks. Mean RTs were calculated for correct responses to S2 in the experimental trials, excluding the control trials. The data were analyzed as in Experiment 3.
Regular blocks. Figure 32.8 illustrates that the responses to S2 in the regular blocks were not affected by the different preparation time for responses to S1. In the Early-Signal Group the mean RT amounted to 557 ms and in the Late-Signal Group it amounted to 543 ms. A 2 (groups, between-subjects) × 9 (blocks, within-subjects) ANOVA yielded only a significant effect of blocks, F(3, 102) = 133.05, p < 0.01. Neither the effect of group nor the interaction between the two factors was significant, all F-values < 1.
Effect of random blocks. A within-subjects test of the effect of the relations between the first response and the second stimulus is provided by the comparison of the RTs in the random blocks (blocks 4, 8, 12) with those in the immediately preceding regular blocks (blocks 3, 7, 11). A 2 (groups, between-subjects) × 3 (blocks, within-subjects) × 2 (regularity, within-subjects) ANOVA indicated main effects of blocks, F(1, 57) = 86.48, p < 0.01, and of regularity, F(1, 38) = 45.09, p < 0.01; in addition, the effect of regularity increased with practice, F(2, 61) = 34.97, p < 0.01. The effect of group and all remaining interactions were not significant, all F-values < 1. The mean regularity effect amounted to 220 ms and 190 ms in the Early-Signal Group and the Late-Signal Group, respectively. Thus, the time interval between the onset of S1 and the onset of the GO signal did not affect the use of response-effect regularities in responding to S2. Similarly, the different S1–S2 interval had no effect on response-effect learning: although the mean S1–S2 interval was about 300 ms shorter for the Early-Signal Group, response-effect learning was not affected.
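For readers who want to mirror this analysis, here is a minimal sketch of the regularity-effect computation, assuming a long-format table of per-trial RTs; the column names ('group', 'subject', 'block', 'rt') are hypothetical, not taken from the authors' materials:

import pandas as pd

def regularity_effect(df: pd.DataFrame) -> pd.Series:
    """Mean regularity effect per group: RT in the random blocks
    (4, 8, 12) minus RT in the immediately preceding regular
    blocks (3, 7, 11), averaged within subjects first."""
    random_rt = (df[df['block'].isin([4, 8, 12])]
                 .groupby(['group', 'subject'])['rt'].mean())
    regular_rt = (df[df['block'].isin([3, 7, 11])]
                  .groupby(['group', 'subject'])['rt'].mean())
    return (random_rt - regular_rt).groupby('group').mean()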
Fig. 32.8 Mean RTs for the responses to the second stimulus for the Early-Signal Group and the Late-Signal Group in Experiment 4. In blocks 4, 8, and 12 the second stimulus was not the regular effect of the first response, but randomly selected.
32.2.2.3 Discussion
The results of Experiment 4 are incompatible with the view that the interaction between response-effect regularity and preparation time found in Experiment 3 was due to a performance effect. If there were such a performance effect, a difference between the Early- and the Late-Signal Group should also have appeared in Experiment 4. Instead, the data indicate similar regularity effects, independent of the time at which the GO signal for the first response was presented. Thus, the time of presenting the GO signal did not affect performance, and it seems likely that the effects found in Experiment 3 were really learning effects. Moreover, the different S1–S2 intervals were also not responsible for differences in response-effect learning. In both Experiment 3 and Experiment 4 the S1–S2 interval was longer for the Late-Signal Group, but the difference in learning appeared only when response planning was varied, as in Experiment 3, and not when only the interval was varied, as in Experiment 4.
At this point it seems interesting to compare the data of the two experiments with each other. Strictly speaking, the data of both groups in Experiment 4 should be identical to the data of the Late-Signal Group in Experiment 3. Whereas participants in the latter group were assumed to learn the response-effect relations in GO trials and NOGO trials, in Experiment 4 both groups performed the same number of trials in the form of GO trials, which should in any case contribute to learning. The data show, however, that under the conditions of Experiment 4 the responses to S2 were performed more quickly and the regularity effects were twice as large as for the Late-Signal Group of Experiment 3. At least two interpretations are possible. First, the dependence of the learning effects on the point in time of presenting the GO–NOGO signal in Experiment 3 provided evidence for the role of response planning in response-effect learning, but such learning might benefit further from response execution. That could simply be due to the fact that only in the case of response execution is full response planning required. Moreover, there could be an additional preparedness of the cognitive system to accept a succeeding event as a response effect when the response has been executed. Second, if there are no NOGO trials, participants might also tend to learn sequences of responses following S1. For example, they could learn that W had to be responded to with the response sequence left middle finger–right middle finger. Then the second response would no longer be a response to the second stimulus. Indeed, the very short RTs in the last regular block are in line with this idea. However, this does not undermine the main result of Experiment 3: under comparable conditions the learning effects differed depending on the point in time at which the GO–NOGO signal was presented. As Experiment 4 shows, this difference cannot be caused by the GO trials. Therefore, the extent of response planning in the NOGO trials seems to be responsible for the strength of response-effect learning. In summary, Experiments 3 and 4 provided evidence that action-effect learning does not start only after certain actions have been executed but seems to be essentially bound to processes involved in the planning of those actions.
32.3 General discussion and outlook
The first two experiments provided strong evidence for an integration of action effects in action planning. The presentation of the effect together with the target facilitated the response to the target. Comparing the different effect conditions used in the experiment, it has to be concluded that this facilitation is due to response-related processes. Interestingly, only the correct effect had a facilitating influence, whereas an incorrect effect did not decelerate response planning. Thus, we concluded that action planning is not initiated by the presentation of the effect, but that planning includes the anticipation of the action effect. That means that even in the case of our simple responses, action planning is much more than the mere preparation of muscle commands: it involves expectations about the changes in the environment that the action will cause.
The last two experiments show that even in learning the relations between actions and their effects the planning process is most important. In Experiment 3, response-effect learning was found whenever there was elaborated response planning, but it diminished when response planning was rather incomplete. In addition, Experiment 4 showed that such learning benefits further from response execution, but execution seems not to be essential for learning. The action plan is the cognitive representation of the action. If changes in the environment that regularly follow an action are connected to the action plan, action planning becomes likely to activate representations of the effect.
What remains unclear at this point is exactly which processes constitute action-effect learning. As a simple learning mechanism one could assume an associative mechanism connecting the action plan with changes in the environment following action execution. Such a mechanism would require distinguishing between the learning of action effects and the use of effect codes for action control. In fact, that is the idea of the two-stage model proposed by Elsner and Hommel (2001) and described in the introduction. The first stage describes the acquisition of action-effect relations in terms of an automatic associative mechanism, whereas the second stage describes the control of actions based on the activation of effect codes. Related to such a model, the present experiments suggest that in the acquisition phase the effect codes are integrated into the action plan. With respect to the second stage, the experiments show that action planning includes at least the activation of effect codes. However, the experiments provide no evidence for the stronger assumption in Elsner and Hommel's model that the activation of effect codes controls the activation of the action plan.
The two-stage model is appealingly simple. Nevertheless, the distinction between learning and control in two different stages might turn out to be a problem for this model. How much learning is necessary in the first, associative stage before the learned action-effect relations can be used in the second stage for action control? At the least, one has to assume that after some repetitions of common occurrences of a movement and changes in the environment both stages run in parallel. However, as soon as the second stage is included, the system loses its purely associative character: the anticipation of action effects in the second stage should affect the perception of environmental effects in the first stage.
Thus, it seems interesting to also consider a model that unifies the learning of action effects and their use for action control within a single stage. Such a model could be described as a feed-forward model or an anticipatory learning mechanism (cf. Hoffmann 1993). The assumption is that the anticipation of effects is an obligatory part of planning voluntary movements. Even if the particular effects are not yet known, there should be some anticipation of possible changes in the environment. In that case, effect anticipation could operate on a more abstract level, for example concerning the feature dimensions in the environment on which an effect would be likely to occur. At the least, one could assume that some free slots remain in the action plan as long as the effects are unknown.
As a result, the cognitive system would be primed to search, in the course of movement execution, for those changes in the environment that are appropriate to fill the free slots or to specify the more abstract expectations. Such a model would not entail forming associative relations between movements and changes in the environment; instead it would involve a step-by-step specification of effect anticipations, starting with very rough anticipations of possible changes and ending with specific expectations of particular effects. At least some plausibility for an anticipative learning mechanism may be drawn from observations in developmental psychology that seem to support such a view.
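To make the contrast between the two candidate mechanisms concrete, here is a deliberately simplified sketch of our own; it is an illustration under stated assumptions, not an implementation of either published model, and all class and method names are ours:

class AssociativeTwoStage:
    """Caricature of the two-stage account: stage 1 automatically
    strengthens action-effect associations; stage 2 selects the
    action whose association with a desired effect is strongest."""
    def __init__(self, rate=0.1):
        self.w = {}        # (action, effect) -> associative strength
        self.rate = rate

    def learn(self, action, effect):             # stage 1
        key = (action, effect)
        self.w[key] = self.w.get(key, 0.0) + self.rate

    def select_action(self, desired_effect):     # stage 2
        candidates = {a: s for (a, e), s in self.w.items()
                      if e == desired_effect}
        return max(candidates, key=candidates.get) if candidates else None


class AnticipatorySlots:
    """Caricature of the single-stage, anticipatory account: every
    planned action opens a 'free slot' (an abstract effect
    expectation) that later observations specify step by step."""
    def __init__(self):
        self.slots = {}    # action -> currently expected effect (or None)

    def plan(self, action):
        self.slots.setdefault(action, None)      # open a free slot
        return self.slots[action]                # current anticipation

    def observe(self, action, effect):
        self.slots[action] = effect              # fill/sharpen the slot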
Whereas Piaget (1953) assumed that sensory–motor coordination is established by accident when the sensory and the motor systems work simultaneously, Bruner (1970) argued that effective skill acquisition takes place when the infant has a definite goal, which serves to organize and constrain behavior. In line with this idea, Willatts (1989) observed that infants frequently display all the signs of goal-directed behavior. Infants typically persist in their efforts to achieve a goal, repeat actions, correct for errors, and then stop when the goal is achieved. That seems to be valid even for newborns: Bloch (1990) found that three-day-old infants made significantly more movements towards the location of a target when the target was present than when it was not. Also, reaching movements toward the mouth clearly show signs of goal-directed behavior: the mouth is significantly more likely to open before the hand reaches the face when the hand actually ends in the mouth than when it does not end there (Butterworth 1986). Von Hofsten (1990) found evidence that six-month-old infants do not randomly apply all kinds of behavior to a present object. Rather, they tend to perform and repeat behaviors that seem to be relevant for the object. For instance, the infants repeated the shaking of a bell but not the shaking of a doll. Again, this observation indicates that even early in development movements are expected to produce particular effects rather than being accidental movements that may become associated with changes in the environment.
Altogether, the data of our experiments are compatible with both models; so far we have no clear evidence for one or the other. The anticipative feed-forward model is admittedly very speculative, but it has components that might be important for action-effect learning. Further experiments are necessary to investigate whether the learning of action-effect relations is indeed affected by what has been anticipated beforehand.
Acknowledgements The experiments were supported by grants ME 1362/5-1 and ZI 425/2-2 from the Deutsche Forschungsgemeinschaft to Michael Ziessler. We thank Eliot Hazeltine, Bernhard Hommel, and Wilfried Kunde for their helpful suggestions and comments on an earlier version of this paper; and Britta Bruland, Ilka Engel, Ulrike Gudjulla, Barbara Kothe, Markus Metzler, and Berit Scheer for their assistance in performing the experiments. Furthermore, we appreciate the very kind help of Gary L. Brase in improving the English writing.
Notes
1. With respect to the experiments we prefer the usual term 'response' instead of 'action'. Nevertheless, from our point of view responses in artificial experiments are also a kind of goal-directed action.
2. This part of the research was performed in collaboration with Peter Frensch.
3. Of course, it would also have been possible to use the RTs in the control trials to estimate the RTs to the first stimulus without response preparation. The problem was that there were only four control trials in each block, so the estimation of the RTs was not very reliable. For checking response preparation during the preparation interval this was less critical, as we collapsed the trials across blocks.
4. Of course, we realize that it is questionable to exclude 32 of the 72 participants. However, during the experiment it was impossible to check whether the participants would follow the instruction to prepare the response in advance of the GO–NOGO signal. This would have required a greater number of control trials. As control trials had to be GO trials, additional control trials would have contributed to learning in both groups, and the expected differences would have disappeared. The error criterion was necessary to reduce the effects of any speed–accuracy trade-off.
References
Adams, J.A. (1971). A closed-loop theory of motor learning. Journal of Motor Behavior, 3, 111–149.
Bloch, H. (1990). Status and function of early sensory–motor coordination. In H. Bloch and B.I. Bertenthal (Eds.), Sensory–motor organizations and development in infancy and early childhood. Dordrecht, Boston, London: Kluwer Academic Publishers.
Bruner, J.S. (1970). The growth and structure of skill. In K. Connolly (Ed.), Mechanisms of motor skill development. London: Academic Press.
Butterworth, G. (1986). Some problems in explaining the origins of movement control. In M.G. Wade and H.T.A. Whiting (Eds.), Motor development in children: Aspects of coordination and control. Dordrecht: Martinus Nijhoff.
Coles, M.G.H., Gehring, W.J., Gratton, G., and Donchin, E. (1992). Response activation and verification: A psychophysiological analysis. In G.E. Stelmach and J. Requin (Eds.), Tutorials in motor behavior II. Amsterdam: Elsevier.
Elsner, B. and Hommel, B. (2001). Effect anticipation and action control. Journal of Experimental Psychology: Human Perception and Performance, 27, 229–240.
Eriksen, B.A. and Eriksen, C.W. (1974). Effects of noise letters on the identification of a target letter in a nonsearch task. Perception and Psychophysics, 16(1), 143–149.
Harless, E. (1861). Der Apparat des Willens [The apparatus of will]. Zeitschrift für Philosophie und philosophische Kritik, 38, 50–73.
Hoffmann, J. (1993). Vorhersage und Erkenntnis [Anticipation and cognition]. Göttingen: Hogrefe.
Hofsten, C. von (1990). Development of manipulation action in infancy. In H. Bloch and B.I. Bertenthal (Eds.), Sensory–motor organizations and development in infancy and early childhood. Dordrecht, Boston, London: Kluwer Academic Publishers.
Hommel, B. (1993a). Inverting the Simon effect by intention. Psychological Research, 55, 270–279.
Hommel, B. (1993b). The relationship between stimulus processing and response selection in the Simon task: Evidence for a temporal overlap. Psychological Research, 55, 280–290.
Hommel, B. (1996). The cognitive representation of action: Automatic integration of perceived action effects. Psychological Research, 59, 176–186.
James, W. (1890). The principles of psychology. New York: Holt.
Kunde, W. (2001). Response–effect compatibility in manual choice reaction tasks. Journal of Experimental Psychology: Human Perception and Performance, 27, 387–394.
Kunde, W., Hoffmann, J., and Zellmann, P. (1999). Response preparation includes the anticipation of response effects. Paper presented at the 11th Conference of the European Society for Cognitive Psychology, Gent, Belgium.
Lavie, N. (1995). Perceptual load as a necessary condition for selective attention. Journal of Experimental Psychology: Human Perception and Performance, 21, 451–468.
Lavie, N. and Cox, S. (1997). On the efficiency of visual selective attention: Efficient visual search leads to inefficient distractor rejection. Psychological Science, 8, 395–398.
Lotze, R.H. (1852). Medicinische Psychologie oder die Physiologie der Seele. Leipzig: Weidmann'sche Buchhandlung.
Münsterberg, H. (1888). Die Willenshandlung. Ein Beitrag zur physiologischen Psychologie [The intended act. A contribution to physiological psychology]. Freiburg: Mohr.
Paquet, L. and Craig, G.L. (1997). Evidence for selective target processing with a low perceptual load flankers task. Memory and Cognition, 25, 182–189.
Piaget, J. (1953). The origin of intelligence in children. New York: Routledge.
Prinz, W. (1992). Why don't we perceive our brain states? European Journal of Cognitive Psychology, 4, 1–20.
Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9, 129–154.
Schmidt, R.A. (1975). A schema theory of discrete motor skill learning. Psychological Review, 82, 225–260.
Schmidt, R.A. (1982). Generalized motor programs and schemas for movement. In J.A.S. Kelso (Ed.), Human motor behavior: An introduction. Hillsdale, NJ: Lawrence Erlbaum.
Vogt, B.A., Finch, D.M., and Olson, C.R. (1992). Functional heterogeneity in cingulate cortex: The anterior executive and posterior evaluative regions. Cerebral Cortex, 2, 435–443.
Wascher, E., Reinhard, M., Wauschkuhn, B., and Verleger, R. (1999). Spatial S–R compatibility with centrally presented stimuli: An event-related asymmetry study on dimensional overlap. Journal of Cognitive Neuroscience, 11, 214–229.
Willatts, P. (1989). Development of problem solving in infancy. In A. Slater and G. Bremner (Eds.), Infant development. London: Lawrence Erlbaum.
Ziessler, M. and Nattkemper, D. (2001). Learning of event sequences is based on response-effect learning: Further evidence from a serial reaction task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 595–613.
33 The representational nature of sequence learning: evidence for goal-based codes
Eliot Hazeltine
Abstract. Motor sequence learning guides much of our behavior, yet the nature of the underlying representations remains elusive: some have argued that stimuli are incorporated in the representations; others have emphasized codes that specify anatomical units; and still others have proposed that the representations are based on locations in egocentric space. To examine what information is learned, the present study used variants of the serial reaction time task in which the responses were associated with task-relevant environmental consequences. In two experiments, these environmental consequences determined whether sequence knowledge transferred to new sets of stimuli and responses, indicating that the learned representation includes the intended goals of the component actions. Such an encoding scheme can account for apparently conflicting findings in the sequence learning literature.
Motor sequence learning is a fundamental component of human behavior. Almost every activity requires individuals to produce series of actions to achieve desired goals. In some cases, such as throwing a dart, complex muscle commands must be precisely scheduled so that the arm and hand move along a particular trajectory. In other cases, such as typing numbers on a keypad, the required movements may be quite simple and relatively unconstrained, but the order in which they are executed is critical for successful performance. Given the richness and complexity of behavior, motor learning certainly involves multiple systems. Evidence for distinct systems has been garnered from neuropsychological studies showing that focal lesions to particular neural structures can impair learning for some types of motor tasks while sparing other forms of motor learning (e.g. Gabrieli, Stebbins, Singh, Willingham, and Goetz 1997; Packard and White 1991; Willingham, Koroshetz, and Peterson 1996). However, the domains of the various systems are not yet clearly delineated, and their respective roles in the acquisition of new behaviors are not well understood. For rigorous computational theories of motor learning to be developed, it is critical to determine what information is encoded during particular experimental tasks.
A common experimental procedure for measuring motor learning is the serial reaction time (SRT) task. The SRT task consists of a series of choice reaction time trials in which the stimuli can appear in a random order or according to a fixed sequence. The basic finding is that after training, reaction times are faster during sequenced trials than during random trials, even when participants perform a concurrent distractor task and report being unaware of the sequence. Because it provides a robust measure of learning, the SRT task has been used extensively with clinical populations, in neuroimaging experiments, and in developmental studies. This thriving industry of research has heavily influenced our understanding of the functional and neural substrates of motor learning: behavioral studies have used the task to investigate the relationship of attention and automaticity to motor learning (e.g. Cohen, Ivry, and Keele 1990; Nissen and Bullemer 1987), and neuroimaging and
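As a concrete illustration of the SRT logic just described, here is a minimal sketch; the particular sequence, block length, and function names are placeholders of ours, not values from any study discussed in this chapter:

import random

SEQUENCE = [0, 1, 2, 3, 1, 0]   # hypothetical repeating sequence of stimuli

def make_block(n_trials, sequenced):
    """Stimulus order for one SRT block: either the repeating
    sequence or a random order over the same four stimuli."""
    if sequenced:
        return [SEQUENCE[i % len(SEQUENCE)] for i in range(n_trials)]
    return [random.randrange(4) for _ in range(n_trials)]

def learning_score(rt_random, rt_sequenced):
    """The basic SRT measure: a positive score means faster
    responding on sequenced than on random trials."""
    return (sum(rt_random) / len(rt_random)
            - sum(rt_sequenced) / len(rt_sequenced))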
neuropsychological studies have examined which brain structures appear to be critical for encoding the sequential information.
Despite the task's prevalence, there is considerable debate as to what information is actually encoded during learning. Researchers have addressed this question with various transfer tests in which participants learn to make a sequence of responses under one set of conditions and then are tested under another set of conditions. For example, Keele, Jennings, Jones, Caulton, and Cohen (1995) trained individuals to perform using either the four fingers of their right hands or a single index finger moved by their arms and wrists. The stimuli were the same for both conditions, and performance was evaluated on test blocks during which half the participants switched their modes of responding. Critically, measures of sequence learning were identical for those who switched and those who maintained their response modes. This finding—that sequence knowledge appears to transfer completely across different sets of effectors—suggests that the learned representation does not include the particular muscle commands. Nonetheless, it leaves open the possibility that the sequence of stimuli, or some more abstract response-based representation, is learned.
33.1 Evidence for response-based learning
One line of evidence that SRT learning is based largely on abstract response codes rather than stimulus information comes from a study in which participants performed two interleaved sequential tasks (Hazeltine, Chan, and Ivry, under revision). In one condition, termed the common input channel condition, both tasks required responses to visual stimuli (shapes). Participants made responses with the middle and ring fingers of the right hand for one task and vocal responses ('high' vs. 'low') for the other. In the alternative condition, termed the common output channel condition, both tasks required manual responses, but one of the tasks used visual stimuli and the other auditory stimuli. The two conditions contained the same visual–manual task, but in the common input channel condition it was paired with a visual–vocal task and in the common output channel condition it was paired with an auditory–manual task. The two tasks were performed in an alternating fashion, and, unbeknownst to the participants, the stimuli for the two tasks in both conditions formed a 3-element repeating sequence. For example, in the common output channel condition, the visual stimuli repeated diamond–diamond–square and the auditory stimuli repeated high–low–high. Because the two sequences were the same length, regularities existed not only within the sequences but between them as well. Thus, the two 3-element sequences could be encoded independently, or they could be encoded as a single 6-element sequence by allowing associations to be formed between the two tasks.
We tested whether associations between the two sequences were learned using what we termed a phase-shifted probe (see Schmidtke and Heuer 1997). The phase-shifted probe consisted of the same two within-task sequences that were used during training, but they were shifted in relation to one another. That is, if the low tone always preceded a square during training, it might precede a triangle during the phase-shifted probe. The associations between the visual and auditory sequences were disrupted by this probe, but the associations within a modality (e.g. shape to shape or tone to tone) were preserved. Both the common input channel and common output channel conditions demonstrated similar costs during a random probe, indicating that participants had learned the sequence. However, the cost produced by the phase-shifted probe differed between the two conditions. For the common input channel condition, there was essentially no cost for the phase-shifted probe on either task, indicating that participants had encoded the visual–manual and visual–vocal sequences separately.
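The construction of the phase-shifted probe can be sketched compactly. This is our own illustration using the diamond/square and high/low examples from the text; the exact trial-scheduling details are an assumption:

VISUAL = ['diamond', 'diamond', 'square']   # 3-element visual sequence
AUDITORY = ['high', 'low', 'high']          # 3-element auditory sequence

def interleaved_trials(visual, auditory, n_pairs, shift=0):
    """Alternate the two tasks. Each within-task sequence repeats
    every 3 elements, so shifting one sequence (the phase-shifted
    probe) preserves within-task transitions but breaks the fixed
    cross-task pairings present during training."""
    trials = []
    for i in range(n_pairs):
        trials.append(('visual', visual[i % 3]))
        trials.append(('auditory', auditory[(i + shift) % 3]))
    return trials

# Training: shift=0, so each visual stimulus is always paired with
# the same tone. Phase-shifted probe: shift=1, same within-task
# order, new cross-task pairings.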
For the common output channel condition, the phase-shifted probe produced a significant cost for the visual–manual task, indicating that the visual–manual and the auditory–manual sequences were encoded in a common representation. Because the sequence information was integrated when the two tasks shared a common output channel but not when they shared a common input channel, it appears that output properties provide the medium for the sequence representation. That is, the responses are linked together, rather than the stimuli.
A second line of evidence that the sequence representation is response-based comes from studies in which the same responses can be indicated by multiple possible stimuli. In an earlier study (Hazeltine, under revision), I examined the representational nature of sequences learned during the SRT task by changing the sequence of stimuli while keeping the sequence of responses the same (for a similar procedure, see Nattkemper and Prinz 1997). The stimuli belonged to two distinct sets, a set of four colored circles and a set of four (white) letters, with one stimulus from each set indicating one of the four possible responses. Participants were told they were performing a matching task, and when they pressed a response key, a colored letter appeared. For example, when the leftmost key was pressed, a red A appeared. This key was therefore the appropriate response for the red circle or the white letter A. Using the many-to-one mapping allows the sequence of stimuli to be changed without altering the sequence of responses. The sequence repeated every 12 stimuli and every 6 responses.
After training, participants performed three probe blocks. One of the probe blocks, termed the first-order probe, used a sequence that shared most (11 of 12) of the first-order stimulus associations with the training sequence. That is, when considering only consecutive stimuli, the stimulus pairs during this probe were nearly identical to the pairs experienced during training. If the learned sequence representation were primarily composed of first-order associations, then performance on this probe block should be highly similar to the training blocks. A second probe, termed the response probe, used a sequence that shared none of the first-order stimulus associations with the trained sequence: every stimulus was preceded by a different stimulus from the one that had preceded it during training. However, this probe required the same sequence of responses as was made during training. If the learned sequence representation were based on responses rather than stimuli, then little or no cost should be observed here. A third probe, termed the alternative probe, served as a control and consisted of a new sequence that repeated every 12 stimuli and 6 responses and matched the trained sequence in terms of the stimulus probabilities.
The results from this matching-task experiment were clear-cut: there was no cost for the response probe, whereas both the first-order and alternative probes produced significant costs of about 50 ms. Critically, performance was not disrupted when the sequence of responses was the same even though the sequence of stimuli was entirely novel. In contrast, when the first-order associations were maintained, significant performance costs were observed. Similar observations have been reported by Nattkemper and Prinz (1997), who also found that stimuli indicating the same response were interchangeable during SRT learning.
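The defining property of these probes, how many consecutive stimulus pairs they share with training, can be counted directly. A small sketch of our own (sequences treated as cyclic; the helper names are hypothetical):

def first_order_pairs(seq):
    """Consecutive stimulus pairs, treating the sequence as cyclic."""
    return [(seq[i], seq[(i + 1) % len(seq)]) for i in range(len(seq))]

def n_shared_pairs(training, probe):
    """How many of the probe's first-order pairs also occur during
    training: near 11 of 12 for the first-order probe, 0 for the
    response probe."""
    train_pairs = set(first_order_pairs(training))
    return sum(pair in train_pairs for pair in first_order_pairs(probe))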
These findings clearly indicate that SRT learning is largely response-based (see also Willingham, Wells, Farrell, and Stemwedel 2000). Participants appear to be insensitive to the specific order of the stimuli as long as the sequence of responses is maintained. Thus, the learned representation appears to be composed of codes that specify responses rather than stimuli. However, there are many kinds of information that can be included in a response-based representation. For example, Hallett and Grafman (1997) have suggested that a series of low-level motor outputs might be learned. Alternatively, Willingham (1998) has suggested that locations in egocentric space might be encoded. This
proposal can account for transfer to novel sets of effectors and novel sets of stimuli. However, Grafton, Hazeltine, and Ivry (1998) observed transfer of sequence knowledge from a series of closely spaced finger movements to a series of larger-amplitude arm movements. To accommodate such a result, Willingham's spatial-coding hypothesis must allow that the representation can be scaled.
33.2 The goal-based hypothesis
Even with more flexible representations, the spatial-coding hypothesis does not appear adequate to account for all forms of learning during the SRT task. In a second matching-task experiment (Hazeltine, under revision), the same responses produced different environmental consequences (EC) depending on the initial stimuli. When color stimuli appeared, the responses produced colored circles, and when letter stimuli appeared, the responses produced white letters. Because the same physical actions produced different EC, the role of goal-based codes including EC could be distinguished from response-based codes specifying only motor-output information. As in the experiment described above, there was a response probe that included the same sequence of responses but a different sequence of stimuli; a first-order probe that contained many of the same stimulus pairs; and an alternative probe that differed in terms of both the responses and the stimulus pairs. Under these conditions, the response probe produced a significant 27 ms increase in the reaction times, while the first-order and alternative probes produced 33 and 44 ms costs, respectively. The response probe cost differed from the results obtained in the first matching-task experiment, in which the responses' EC were identical for the two sets of stimuli, and indicated that responses identical in terms of their movement requirements were coded differently when they produced distinct EC. Therefore, the EC associated with the constituent responses appeared to be included in the sequence representation. This proposal was termed the goal-based hypothesis.
According to the goal-based hypothesis, participants can learn to effect a series of changes in the environment without specifying the movements that are implemented to achieve the changes. An encoding scheme that emphasizes the desired goals rather than specific stimuli or responses would be ideal in terms of flexibility: what is important during the performance of an action is the accomplishment of a goal, not the execution of a particular set of movements. It is therefore computationally efficient to encode the intended consequences of actions in representations that guide behavior. Such representations are applicable in situations in which individuals must perform in novel circumstances that prevent the production of the practiced sets of movements.
Although the findings from the matching-task experiments are suggestive, the existence of goal-based representations has not yet been established. Thus far, we have only considered transfer in cases in which there is overlap in the motor commands or in the spatial relationships among the responses. In general, the findings have shown that sequence knowledge can transfer to conditions in which one or more of these properties are invariant across the training and test phases. Moreover, such transfer can be disrupted when the sequence of EC changes, suggesting that learning is sensitive to more than output properties alone. While these experiments indicate that EC can affect transfer, they do not specify the information that is included in the sequence representation. It may be that, under some conditions, the stimulus information is encoded and, in other conditions, response information dominates the representation. However, if goals are encoded, then shared EC should determine how learning transfers to novel situations.
33.3 Experiment 1
Experiment 1 tests a central prediction of the goal-based hypothesis: learning should transfer to conditions in which the movements are highly dissimilar but the EC remain the same. To minimize the potential for transfer based on motor commands or spatial coordinates, two response modes were chosen that varied along distinct output properties. During training, participants learned to produce a sequence of tones using an isometric forcekey with their right hands. This apparatus ensured that the responses did not vary in their spatial components but differed in the amount of force exerted. Transfer was tested under conditions in which participants were required to produce the same EC using a keyboard with their left hands. Unlike the forcekey device, the keyboard required responses that differed in terms of both the location and the moving digit. As participants practiced the sequential task with the forcekey, they also practiced making random responses with the keyboard in alternating blocks so that the EC associated with the device would be learned. If the learned representation specifies the intended EC, then learning to produce a sequence of tones with the forcekey should benefit performance when the same sequence of tones is produced with the keyboard.
Because the responses for both devices were signaled by the same stimuli, a control group performed the same tasks but received no tones during the keyboard blocks. This group was included to differentiate between stimulus-based learning and learning that incorporates EC. If the EC provide for the transfer of sequence knowledge, then no sequence benefit should be observed for the keyboard task when tones are not associated with these responses. Therefore, the control group should show less transfer despite receiving the same exposure to the stimulus sequence.
A quick note about the procedure: although somewhat controversial, there is considerable evidence that motor sequence learning can occur with or without explicit knowledge and that explicit knowledge is supported by distinct systems that can affect behavior dramatically. To ensure that the data reflected motor learning processes, two widely used procedures were employed to limit explicit knowledge: participants were required to count random stimuli that were interleaved with the SRT task, and they were tested with an interview and a generation task after completing the experiment. Only participants who did not show explicit knowledge according to a conservative criterion were included in the analyses (see below). Because the focus of these experiments is the nature of the learned representation rather than the relationship between implicit and explicit knowledge, I report only the data from individuals who did not show indications of explicit knowledge, although the excluded data showed a similar pattern.
33.3.1 Method
33.3.1.1 Apparatus and stimuli
The appropriate manual responses were signaled by letter stimuli (A, B, C, and D). They were white, subtended about 2.4 degrees of visual angle, and appeared for 1000 ms in the center of the computer monitor. The counting task involved circular color patches that were approximately the same size and appeared for 200 ms 4 degrees above the center of the computer monitor. The circles were either red or green. Responses were recorded with two devices, a keyboard and a forcekey. The forcekey was an isometric strain gauge that measured forces up to 8.2 Newtons (N), with a resolution of 0.002 N. Once the forcekey detected a load above a threshold of 0.4 N, the computer tracked the force
exerted until the force fell below the maximum reading for at least 100 ms, at which point the force pulse was considered to have ended. The maximum force was converted into a categorical response, the corresponding tone was presented, and the reaction time and category were recorded. The tones lasted 100 ms and had frequencies of 80, 300, 900, or 3000 Hz. When the maximum force was less than 1.0 N, the lowest tone was produced; when the force was between 1.0 and 2.4 N, the second lowest tone was produced; when it was between 2.4 and 4.8 N, the second highest tone was produced; and when it was greater than 4.8 N, the highest tone was produced. The computer emitted tones whenever forces were produced and, for the Tones group only (see below), whenever keys were pressed. Only the first force pulse (or keypress) that occurred after a letter stimulus produced a tone. Forcekey responses were always made with the right hand and keyboard responses were always made with the left.
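The force-to-tone categorization just described maps directly onto a threshold function. A minimal sketch using the thresholds and frequencies given in the text (the function and constant names are ours):

TONE_HZ = [80, 300, 900, 3000]

def force_to_tone(max_force_n):
    """Categorize a force pulse's peak (in Newtons) into one of the
    four tones, using the boundaries from the text: <1.0, 1.0-2.4,
    2.4-4.8, and >4.8 N."""
    if max_force_n < 1.0:
        return TONE_HZ[0]
    elif max_force_n < 2.4:
        return TONE_HZ[1]
    elif max_force_n < 4.8:
        return TONE_HZ[2]
    return TONE_HZ[3]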
33.3.1.2 Procedure
The participants were randomly assigned to one of two groups. For the Tones group, participants were told that the purpose of the experiment was to examine how people learn to play musical instruments. They were instructed that they were to learn to play two instruments, one that required force responses and one that required keypresses. They were told that the letter 'A' indicated that the lowest frequency tone should be played, the letter 'B' indicated that the second lowest frequency tone should be played, and so on. The S–R mappings were always explained in terms of stimuli to tones or in terms of forces and keys to tones, but never in terms of stimuli to forces or keys. Participants practiced the forcekey task interactively with the experimenter for approximately 5 minutes before beginning the experiment. During the practice, but not during the experimental blocks, the letter stimuli were presented along with the desired tone and participants were asked to produce that same tone with the forcekey device. Participants also practiced the keyboard task briefly before beginning the experiment. For the Tones group, letters and tones were presented and participants were asked to mimic the tones by pressing the appropriate keys. They were also warned that, starting with their second keypressing block (block 6), they would have to count the green circles and ignore the red circles while performing the keypressing task.
For the No-tones group, participants were told that they would be learning to play a simple instrument and performing a reaction time task with the stimuli. The forcekey task was identical to the one performed by the Tones group. However, their keyboard task differed from that of the Tones group in that no tones were produced when keys were pressed, either during practice or during the experiment. Participants were simply told that this task was 'to measure their reaction times'. As with the Tones group, participants had to report the number of green circles at the end of each keyboard block.
The experiment consisted of 16 blocks of 60 trials each, depicted in Table 33.1. A trial proceeded as follows: first, the letter stimulus appeared, signaling to the participant that they should make their response as quickly as possible. The keypressing task was introduced on block 4 and was used on all subsequent even-numbered blocks. On these keypressing blocks, colored circles appeared 900 ms after the onset of the letter. The color of the circle was determined randomly so that on about 60% of the trials it was green and on the remaining trials it was red. Participants were required to keep a running count of the number of green circles and report this number after completing keypress blocks. The next letter stimulus appeared 1500 ms after the onset of the preceding one.
Table 33.1 The task requirements for the 16 blocks in Experiment 1 for both the Tones and No-tones groups. F indicates that the forcekey task was performed with the right hand; K indicates that the keyboard task was performed with the left hand. R indicates that the stimuli appeared in a random order, and S indicates that they appeared according to the repeating sequence. The asterisks indicate that participants were performing the color-counting task while performing the keyboard task. P indicates a probe block (see text).

Block:     1   2   3   4   5   6    7   8    9   10   11   12   13   14   15   16
Task:      F   F   F   K   F   K*   F   K*   F   K*   F    K*   F    K*   F    K*
Sequence:  R   R   R   R   R   R    S   R    S   R    S    R    S    P    S    P
Starting with block 7, the stimuli for the forcekey blocks followed a repeating sequence. The sequence grammar, 1–2–1–3–2–4, was held constant, but the actual sequence of stimuli (and therefore tones) differed among participants. That is, the sequence elements were assigned to different stimuli for different participants. Four element–stimulus mappings were used so that differences in reaction times to specific elements did not reflect variation in the speeds of the four fingers. This mapping did not affect the assignment of stimuli to responses, which was held constant across participants (i.e. the letter A always indicated that the lowest tone should be produced). However, some participants received the sequence A–B–A–C–B–D, for example, and others received a sequence such as D–A–D–B–A–C.
The two probes (Same and Alternative) occurred on blocks 14 and 16. For the Same probe, the sequence of letters was the same as that used during the forcekey blocks. Thus, for the Tones group (but not for the No-tones group), the responses produced the same sequence of EC as the responses made during the forcekey blocks. For the Alternative probe, the letters appeared in a new six-element sequence, 2–1–2–3–1–4. This alternative sequence had the same stimulus probabilities as the forcekey sequence. The order of the probes was counterbalanced across subjects for each group.
After completing the 16 blocks of the SRT task, participants were tested to assess their explicit knowledge of the sequence. They were asked whether they used any special strategies while performing the task, and whether they noticed anything in particular about the tones or visual stimuli. Next, they were asked to judge whether the stimuli ever occurred in a fixed sequence and were asked to produce as much of the sequence as they could. Participants who mentioned a sequence in response to the first two questions, or who were able to report more than three consecutive elements of the sequence on the generation task, were considered aware; the remaining participants were considered unaware. The aware participants were excluded from the reported analyses, although the pattern of their data was highly similar, though somewhat faster, to that of the unaware participants.
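A sketch of how the per-participant sequences could be generated from the fixed grammar, with the element-to-stimulus assignment permuted across participants as described; the function and constant names are ours:

import random

GRAMMAR = [1, 2, 1, 3, 2, 4]        # training grammar (held constant)
ALT_GRAMMAR = [2, 1, 2, 3, 1, 4]    # Alternative-probe grammar

def participant_sequence(grammar, letters='ABCD', rng=random):
    """Assign grammar elements 1-4 to letters at random for each
    participant. The letter-to-tone mapping itself stays fixed
    (A always indicates the lowest tone)."""
    perm = list(letters)
    rng.shuffle(perm)
    mapping = {i + 1: perm[i] for i in range(4)}
    return [mapping[e] for e in grammar]

# e.g. participant_sequence(GRAMMAR) might yield
# ['A','B','A','C','B','D'] for one participant and
# ['D','A','D','B','A','C'] for another.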
33.3.2 Results and discussion
Ten of the 34 experimental participants were excluded from the data analysis: 4 were unable to perform the forcekey task, averaging more than 18 incorrect responses per block of 60, and 6 showed some signs of being aware. Of the remaining 24 participants, there were 6 in each of the 2 probe orders for each of the 2 groups. Reaction times less than 200 ms were counted as incorrect; this occurred on less than 1% of the trials. Producing the correct tone with the forcekey was fairly difficult for participants, with accuracy averaging 79% across the 5 sequence
blocks. In contrast, accuracy for the keyboard responses averaged 95%. The analyses of reaction times were restricted to trials in which the correct response was made.
The two tasks differed in terms of the mean reaction times as well as the accuracy rates. For the forcekey task, reaction times averaged 756 ms, starting at 775 ms and decreasing to 732 ms on the final block. This decrease does not provide a direct measure of sequence learning, since it may reflect improvements in participants' ability to use the forcekey. Reaction times for the keyboard task showed consistent decreases during the random blocks before the two probes appeared. On block 6, the first block during which the keyboard task was performed with the circle-counting task, reaction times averaged 611 ms, and on block 12, the final random keyboard block before the probes, they averaged 534 ms. There was no evidence that the order of the probes affected reaction times, as the means for the probe blocks (14 and 16), averaged across the two probe types, were 523 ms and 522 ms, respectively.
The purpose of the experiment was to determine whether shared EC would confer an advantage on the performance of response sequences even when the underlying movements differed. Thus, the mean reaction times for the two probes (Fig. 33.1) were submitted to a two-way ANOVA, with Probe-Type (Same vs. Alternative) as a within-subjects factor and Condition (Tones vs. No-tones) as a between-subjects factor. Neither factor yielded a significant main effect [both Fs < 1], but the Probe-Type × Condition interaction was significant [F(1, 22) = 7.63; p < 0.05; MSE = 417.32]. Follow-up simple comparisons revealed that the Same probe was performed significantly faster than the Alternative probe for the Tones group [F(1, 11) = 9.46; p < 0.05; MSE = 293.11], but the difference between the two probes was not significant for the No-tones group [F(1, 11) = 1.36; p > 0.25; MSE = 541.52]. There were no significant effects on the accuracy rates, which were nearly identical (0.97–0.98) across the two probes in each group.
This pattern of data supports the proposal that the learned sequence representation includes information about the intended EC of the actions. When the keyboard responses caused the same sequence of EC as the forcekey responses, reaction times were faster than when they formed a repeating sequence with a different series of EC. Because no such transfer was observed for the No-tones group, sequence knowledge does not appear to be stimulus-based. Given that overlapping sequences of stimuli on their own were insufficient to allow transfer, it appears that the EC played a critical role in the encoding of the representation.
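For reference, a 2 × 2 design with one within-subjects and one between-subjects factor can be run as a mixed ANOVA. A sketch assuming a long-format DataFrame with hypothetical column names, using the third-party pingouin package (one of several suitable tools; the chapter does not state what software was used):

import pandas as pd
import pingouin as pg

def probe_anova(df: pd.DataFrame) -> pd.DataFrame:
    """2 (Probe-Type, within) x 2 (Condition, between) mixed ANOVA
    on probe-block RTs. Expects columns 'subject', 'group'
    ('Tones'/'No-tones'), 'probe' ('Same'/'Alternative'), 'rt'."""
    return pg.mixed_anova(data=df, dv='rt', within='probe',
                          subject='subject', between='group')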
Fig. 33.1 Mean reaction times for the two groups on the two probes for Experiment 1. The error bars represent the standard error of the mean between subjects.
One feature of this experiment that should be borne in mind is the difference between the motor properties of the keyboard and the forcekey responses. This dissimilarity is important because, along with eliminating location-based accounts, it rules out the possibility that learning is based on associations between the tone stimuli and the subsequent motor responses. That is, prior to the probes, there should be no associations between the tone stimuli and particular keypresses, because the keypresses are random up until the probe blocks. The transfer of sequence knowledge, therefore, appears to reflect encoding that is related to associations among the tone stimuli. In accordance with the goal-based hypothesis, it appears that associations are formed between the intended consequences of the responses rather than between consequences and subsequent responses.
Finally, it should be noted that the magnitude of the interaction, though significant, was fairly small. In our previous SRT experiments, measures of implicit learning are generally about 40 ms. While the interaction reflects a swing of about 30 ms, the Same probe was performed only 22 ms faster than the Alternative probe. However, the comparison between the Same and Alternative probes for the Tones group does not provide the same measure of learning as the more widely used comparison between sequence blocks and random blocks. First, any learning of the Alternative sequence within the probe block will reduce the magnitude of the advantage for the Same sequence probe. Second, the EC may not completely determine the nature of the sequence representation. That is, given the limited exposure to the tasks, responses are likely coded according to additional properties beyond their EC. Third, the random blocks were interspersed between the sequence blocks throughout training; this repeated exposure to the random blocks may have diminished learning.
33.3.3 Goal-based vs. stimulus-based learning
The findings from Experiment 1 indicate that stimulus information can affect the sequence representation when the stimuli are perceived as consequences of the participants' actions. In fact, one straightforward explanation for the results is that participants in the Tones group were able to form stimulus-based representations whereas those in the No-tones group were not. The difference between this proposal and the goal-based hypothesis is that a stimulus-based representation need not include information about tones but instead may be restricted to the letter stimuli. While such an account is somewhat unlikely given the strong similarities between the two groups, stimulus information can play a critical role in sequence encoding when the stimulus information is not related to the EC. Such results have been reported in SRT tasks that require visual search (e.g. Stadler 1989) and in tasks using simple stimulus displays without distractors.
Chan, Ivry, and Hazeltine (in preparation) followed up on a previous report by Pascual-Leone and colleagues about how learning transfers from one hand to the other (Wachs, Pascual-Leone, Grafman, and Hallett 1994). Because homologous effectors are symmetric with respect to the primary axis of the body, the relationship of the digits to their left–right position in extrapersonal space is mirror-reversed across the two hands. That is, a leftmost response requires a keypress with the thumb of the right hand but a keypress with the little finger of the left hand. Thus, it is of interest whether, after training with one hand, performance with the other hand is better when the responses follow the same sequence of relative spatial positions (e.g. leftmost response, rightmost response), termed the parallel sequence, or when the responses follow the same sequence of effectors (e.g. thumb response, index finger response), termed the mirror sequence.
In our second experiment, Chan et al. used centrally presented letters to indicate the appropriate responses. However, with these stimuli, it is possible to give the S–R mappings for the left and right
hands two distinct relationships: the letters can be used to indicate particular keys, which we termed a spatial mapping because each letter was always associated with the same response location. Alternatively, the letters can be used to indicate particular digits, which we termed a digit mapping. To gain an understanding of the role stimuli play during intermanual transfer, we employed both types of S–R mappings with separate groups of participants. Note that for the spatial-mapping group, the order of stimuli was identical to the training sequence during the parallel probe, whereas for the digit-mapping group, the order of stimuli was identical to the training sequence during the mirror probe.
Distinct patterns of transfer were observed for the two S–R mapping groups. For the spatial-mapping group, reaction times for the mirror and alternative probes were nearly identical, whereas the parallel probe was 36 ms faster. These costs are most consistent with either stimulus-based or response location-based accounts. However, for the digit-mapping group, the alternative and parallel probes were much more similar, with the parallel probe being faster by just 11 ms, whereas the mirror probe was 28 ms faster than the alternative probe.1 This group-by-probe interaction suggests that sequence learning involves the encoding of stimulus-based codes.
How can these findings be reconciled with the experiments favoring response-based accounts? In Experiment 1, transfer of sequence knowledge to identical series of stimuli occurred only when the two sets of responses were linked by common expected outcomes. That is, identical sequences of stimuli alone were insufficient to allow for the transfer of knowledge. An alternative explanation for the stimulus-dependent patterns of intermanual transfer relates to participants' conceptualization of their responses. It may be that when participants encode the responses as movements of particular digits, mirror transfer is observed, and when they encode the responses as keypresses, parallel transfer is observed. This account can be seen as an extension of the EC hypothesis in that it supposes that identical movements are coded based not only on EC but also on participants' conceptualizations of their actions.
Such a proposal seems reasonable when one considers how the EC must act on the sequence representation. The EC for responses necessarily occur later than the selection processes, so it is obviously the expected EC, rather than the actual EC, that are critical for facilitating performance (see Ziessler and Nattkemper, this volume, Chapter 32). Furthermore, Hommel (1993) manipulated the relevance of the EC for the performance of a choice reaction time task using different sets of instructions and found that this procedure determined the pattern of S–R compatibility costs and benefits. Thus, the composition of the goal-based codes appears to be highly flexible and subject to top-down control processes. Along these lines, Experiment 2 asks whether shared EC can act on the sequence representations in a manner similar to the way the shared stimuli did in the Chan et al. study.
33.4 Experiment 2

In Experiment 1, shared EC provided benefits in the performance of different sets of movements. Experiment 2 attempted to extend these findings to examine whether EC affect the representations supporting intermanual transfer, thereby providing a potential explanation for some conflicting findings in the SRT literature (e.g. Chan et al. in preparation; Wachs et al. 1994; Willingham et al. 2000). Moreover, to eliminate the possibility that transfer was mediated by stimulus-based representations, distinct sets of stimuli were used for the two hands. For left-hand responses, letter stimuli were used as in the Chan et al. intermanual transfer experiment. For right-hand responses, spatial stimuli were used, akin to those used in the first intermanual transfer experiment, except that these stimuli were arranged vertically.
Two key-to-tone mappings were used to determine whether EC would affect intermanual sequence transfer. The Spatial-mapping group used a key-to-tone mapping in which corresponding key locations produced the same tones. That is, for these participants, the leftmost keys on each keyboard would produce the same tone, as would the rightmost keys, and so on. The Digit-mapping group used a key-to-tone mapping in which corresponding digits produced the same tone. For these participants, movements of the index fingers of both hands made the same tones, as did the middle fingers, and so on. In this way, the tones assigned to particular stimuli were identical across the two groups, but the finger movements associated with the stimuli differed.

Two probe blocks were used to assess the transfer of sequence knowledge as participants performed with the right hand after practicing with the left. For the parallel probe, the sequence of response locations was the same as was learned during training. For the mirror probe, the sequence of homologous effectors (e.g. left index finger and right index finger) was the same as was learned during training.

Under these conditions, three proposed forms of sequence representation make distinct predictions about the patterns of reaction times across the two probes. If the representation is effector-based, as proposed by Wachs et al. (1994), then the sequence representation should transfer best to the mirror probe, and that probe should be performed faster. Alternatively, if the representation is location-based, as proposed by Willingham (1998), then the parallel probe should be performed faster. However, if the sequence of EC is learned, then the pattern of transfer should differ for the Digit-mapping and Spatial-mapping groups. This interaction is predicted because the sequence of tones depends on both the mapping and the probe. Specifically, the Digit-mapping group produces the same sequence of tones during the mirror probe, whereas the Spatial-mapping group produces the same sequence during the parallel probe.
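The predicted interaction follows mechanically from the two mappings. As a hedged sketch (the training sequence is hypothetical; the tone frequencies are those reported in the Method section below), one can compute the tone sequence each group produces under each probe and check which probe reproduces the training tones:

# Sketch of the Experiment 2 design logic, under the stated assumptions:
# which probe lets the right hand reproduce the tone sequence learned
# with the left hand? Key positions run 0..3 from left to right.
TONES = [80, 300, 900, 3000]            # Hz, lowest to highest

def left_tone(pos):                     # both groups: leftmost key = lowest tone
    return TONES[pos]

def right_tone(group, pos):
    if group == "Spatial-mapping":      # corresponding locations share tones
        return TONES[pos]
    return TONES[3 - pos]               # Digit-mapping: corresponding digits share tones

training = [1, 3, 0, 2]                 # illustrative left-hand position sequence
trained_tones = [left_tone(p) for p in training]

probes = {"parallel": training, "mirror": [3 - p for p in training]}
for group in ("Spatial-mapping", "Digit-mapping"):
    for name, seq in probes.items():
        same = [right_tone(group, p) for p in seq] == trained_tones
        print(f"{group}, {name} probe reproduces training tones: {same}")

As expected, the Spatial-mapping group reproduces the training tones only on the parallel probe, and the Digit-mapping group only on the mirror probe.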
33.4.1 Method

33.4.1.1 Apparatus and stimuli
Two sets of stimuli were used to indicate the SRT responses: letter stimuli and location stimuli. The letters are described in the method section of Experiment 1. The location stimuli were white Xs that appeared in one of four locations arranged vertically in the center of the monitor. The tones emitted after responses and the colored circles to be counted were identical to those used in Experiment 1.

33.4.1.2 Procedure
Participants were told, as in Experiment 1, that they were learning to play musical instruments (two keyboard devices) and that the stimuli depicted which notes should be played. The procedure was similar to the one used in Experiment 1: the same circle-counting task was used and the timing of the stimuli was identical. Participants were randomly assigned to one of two experimental groups: the Spatial-mapping group or the Digit-mapping group. They were told that the letter ‘A’ indicated that the lowest frequency tone should be played, that the letter ‘B’ indicated that the second lowest frequency tone should be played, and so on. Similarly, they were instructed that the bottom X indicated that the lowest frequency tone should be played, the top X that the highest frequency tone should be played, and so on. The mappings were always expressed in terms of stimuli to tones or keys to tones, but never in terms of stimuli to keys. The experiment consisted of 18 blocks of 60 trials each. Participants were instructed to use the four fingers of their left hand during the blocks with the letter stimuli and the four fingers of their right hand during the blocks with the spatial stimuli. Responses for each hand were made on separate
response boards that were identical to those used in Experiment 1, except that a right-handed version was used in addition to the left-handed version. The two keyboards were mirror images of each other and possessed a natural correspondence to the digits of the left and right hands, respectively.

The two groups differed only in the following respect: for the Spatial-mapping group, the leftmost key on the right-hand keyboard produced the lowest (80 Hz) tone, the key second from the left produced the second lowest (300 Hz) tone, and so on. For the Digit-mapping group, the opposite arrangement was used: the leftmost key on the right-hand keyboard produced the highest (3000 Hz) tone, the key second from the left the second highest (900 Hz) tone, and so on. For both groups, the mapping for the left-hand keyboard was the same: the leftmost key produced the lowest tone, the key second from the left produced the second lowest tone, and so on. In this way, the mappings for the two groups induced different key–tone relationships between the two hands: corresponding locations produced identical tones for the Spatial-mapping group, whereas corresponding digits produced identical tones for the Digit-mapping group.

Even-numbered blocks were performed with the left hand and odd-numbered blocks with the right. The sequence was present on all even-numbered blocks starting with block 4. The odd-numbered blocks, except for the probes, were random. The two probes (parallel and mirror) occurred on blocks 15 and 17, with the order of the two counterbalanced across participants. The same procedure as in Experiment 1 was used to assess explicit knowledge of the sequence.
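The block structure is compact enough to enumerate. The sketch below (a hypothetical helper, not the experiment software) lists the 18 blocks with hand, material, and probe placement as just described:

# Illustrative schedule generator for the design described above:
# 18 blocks of 60 trials; even blocks = left hand, odd blocks = right hand;
# the sequence runs on even blocks from block 4 on; probes fall on
# right-hand blocks 15 and 17 (order counterbalanced across participants).
def block_schedule(probe_order=("parallel", "mirror")):
    probes = dict(zip((15, 17), probe_order))
    schedule = []
    for block in range(1, 19):
        if block % 2 == 0:
            material = "sequence" if block >= 4 else "random"
            schedule.append((block, "left hand", material))
        else:
            schedule.append((block, "right hand", probes.get(block, "random")))
    return schedule

for entry in block_schedule():
    print(entry)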
33.4.2 Results and discussion

Four of the 22 experimental participants were not included in the data analysis. One participant was unable to perform the two tasks simultaneously and three others demonstrated some explicit knowledge of the sequence. Of the remaining 16 participants, there were eight in each group, with the two probe orders counterbalanced within each group.

The analyses of reaction times were restricted to trials in which the correct response was made. Median reaction times for each participant were computed and averaged. The mean reaction times on the two probes for the two groups are shown in Fig. 33.2. The data were submitted to a two-way ANOVA, with Group (Spatial-mapping vs. Digit-mapping) as a between-subjects factor and Probe (parallel probe vs. mirror probe) as a within-subjects factor.
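To spell out this analysis pipeline, here is a minimal sketch with fabricated numbers (not the study data): per-participant median reaction times enter the 2 (Group, between) × 2 (Probe, within) design, the reported follow-ups are within-group paired t-tests, and the Group × Probe interaction of a 2 × 2 mixed ANOVA is equivalent to an independent-samples t-test on the within-subject difference scores (F equals t squared):

# Analysis sketch with fabricated data (NOT the study's numbers):
# eight participants per group; columns are median RTs (ms) on the
# (parallel, mirror) probes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
spatial = rng.normal([420, 450], 25, size=(8, 2))   # faster on parallel
digit = rng.normal([455, 425], 25, size=(8, 2))     # faster on mirror

# Within-group follow-ups, as reported in the text: paired t-tests, df = 7.
for name, rts in (("Spatial-mapping", spatial), ("Digit-mapping", digit)):
    t, p = stats.ttest_rel(rts[:, 0], rts[:, 1])
    print(f"{name}: parallel vs. mirror, t(7) = {t:.2f}, p = {p:.3f}")

# Group x Probe interaction: between-group test on the (parallel - mirror)
# difference scores, df = 14.
t, p = stats.ttest_ind(spatial[:, 0] - spatial[:, 1], digit[:, 0] - digit[:, 1])
print(f"Group x Probe interaction: t(14) = {t:.2f}, p = {p:.4f}")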
Fig. 33.2 Mean reaction times for the two groups on the two probes in Experiment 2. Error bars represent the between-subjects standard error of the mean.
There was no main effect of Group [F < 1] or of Probe [F(1,14) = 1.45; p > 0.2; MSE = 246.59], but the Group × Probe interaction did achieve significance [F(1,14) = 13.30; p < 0.005; MSE = 246.59]. As shown in Fig. 33.2, reaction times for the Spatial-mapping group were faster during the parallel probe [t(7) = 2.69; p < 0.05], whereas reaction times for the Digit-mapping group were faster during the mirror probe [t(7) = 2.72; p < 0.05]. In sum, participants were faster with the right hand when they produced the same sequence of tones as they did with the left hand, regardless of the actual movements required to produce the sequence of tones. There were no significant effects on the accuracy data.

These findings indicate that intermanual sequence transfer can be affected by EC even when the critical stimuli differ between the two sequences. Eliminating the overlap between the two sets of stimuli left three forms of overlap between the left- and right-hand tasks: overlap between the spatial locations of the responses, overlap between the digits, and overlap between the EC. Given that there was no main effect of Probe but a significant interaction between Probe and Group, the EC appear to play the critical role in the learned representation. As in Experiment 1, overlap between the output properties of the two sets of responses was not required for transfer. Thus, it appears that if participants conceptualize a response, such as pressing a key, as producing a particular tone, then sequence knowledge may transfer to other conditions under which different actions are given this meaning.

How does this conclusion bear on studies whose results have favored stimulus-based sequence representations? If it is assumed that the goal-based representations can include, in addition to observable EC, intentions that are determined by the individual’s conceptualization of the task, then the present framework can account for transfer that is driven by the S–R mappings. In other words, it may be that the learned representation includes information about the intended meaning of the response. The present experiments have used EC to encourage participants to conceptualize particular responses (e.g. a keypress and a force pulse, or a keypress with one hand and a keypress with the other) as being related, and this conceptualization is apparently sufficient to allow for the transfer of sequence knowledge from one set of responses to the other. Perhaps other manipulations can perform the same role as EC when they influence the meanings that participants assign to their responses.
33.5 General discussion

The present experiments indicate that sequence learning can incorporate the goals that the component responses are intended to achieve. While previous work has demonstrated that sequence knowledge does not necessarily transfer to identical movements with novel EC, Experiment 1 shows that novel sets of movements can benefit from the performance of sequences that produced the same EC. Moreover, such transfer can occur between sets of responses with very different motor properties, making even flexible versions of location-based accounts unlikely. Experiment 2 extends these findings to show that the pattern of intermanual transfer can be affected by EC, providing a potential account for some seemingly contradictory findings in the SRT literature.

Theories of motor learning relying on different experimental paradigms have also emphasized that the encoded representations are abstract (e.g. Schmidt 1975). For example, MacKay and Bowman (1969) had bilingual individuals practice saying sentences in either English or German. During transfer, the equivalent sentences were produced in the other language. When their performance was compared to that of control participants who did not switch, no difference was observed. This
finding was interpreted as reflecting the fact that the production of the individual words was already well practiced, and thus benefited little from the training. In contrast, the structure of the ideas conveyed by the sentences was novel and was, therefore, the target of practice effects. The present experiments can be viewed as a highly analogous situation: the simple finger movements required by the SRT task likely do not benefit much from practice; a more probable site for learning is the linking together of the goals associated with the responses.

A related finding has recently been reported by Palmer and Meyer (2000). In their transfer study, pianists of varying skill levels practiced playing a short musical piece and then were tested on four different pieces. For experienced pianists, performance was best for the piece that was conceptually related (that is, consisted of the same melody), even compared with a piece requiring the same finger movements but producing a different melody. In other words, these participants appeared to be learning the melody rather than the specific motor commands necessary to produce the melody. As with the present experiments, the results indicate that practice establishes representations that are primarily based on abstract response codes that can include the intended EC.

Imitation, like sequence learning, can support the acquisition of new behaviors, and often involves multiple components whose relative order is critical. Bekkering and Wohlschläger (this volume, Chapter 15) review evidence indicating that imitation is sensitive to the perceived goal of the original performer. For example, Gleissner, Meltzoff, and Bekkering (2000) report that three-year-old children are more likely to imitate the accomplishment of a goal than to reproduce an actual pattern of movements. That is, when shown someone grabbing their own ear with the contralateral hand, children often grab their corresponding ear with the ipsilateral hand, producing a very different movement that does not cross the body’s midline. However, when shown similar movements that do not actually involve the grabbing of the ear (i.e. the movements terminate a few inches from the ear so that no salient goal is evident), the imitations are much less likely to deviate from the original movement. The manipulation of goal salience in such experiments is analogous to the use of tones in the SRT tasks described here. When present, the tones appear to dominate the learned sequence representation, so that transfer is observed across various effectors and movement types. However, when no clear EC are provided, many SRT studies report that learning is largely response-based, just as the absence of salient goals in the imitation studies can lead to more movement-based mimicry.

In specific relation to SRT learning, Ziessler and Nattkemper (2001) have recently proposed that associations are formed between responses and subsequent stimuli. According to this account, participants interpret the critical stimulus for a given trial as the EC of the response made on the previous trial. In this way, a series of response-to-stimulus associations can be used to link successive trials together in an internal representation. Such a hypothesis bears an important similarity to the present proposal that associations among EC are encoded; both assume that the perceived outcomes of the individual’s actions form the sequence representation.
The critical difference between the two hypotheses is that the goal-based sequence representations proposed here entail the linking together of successive goals, rather than the linking of motor acts with environmental events. This property allows for transfer to novel sets of both stimuli and responses. The present proposal can be viewed as an extension of the egocentric coordinates hypothesis (Willingham 1998; see also Willingham et al. 2000), in that both emphasize the goal of the actions rather than the underlying motor commands. In a sense, the goal-based hypothesis takes this principle one step further to include events that are extrinsic to the body. Including EC in the sequence representation is, by itself, insufficient to explain many results that are also not well
accounted for by egocentric coordinate-based learning. In some cases, the EC for responses were altered and yet good transfer was observed (Grafton et al. 1998; Keele et al. 1995), or the EC remained the same and yet poor transfer was observed (e.g. Fendrich et al. 1991; Frensch and Miner 1995). The particular instructions and experimental setup may be critical in determining how responses are represented and, thus, how the sequence is encoded (see Hommel 1993; Bekkering and Wohlschläger, this volume, Chapter 15). In the present experiments, the tasks emphasized the importance of the EC, and participants likely focused on this aspect of their performance while producing the responses. These circumstances may have led to the formation of sequence representations that minimized the influence of motor properties, allowing for the observed patterns of transfer.

Ziessler and Nattkemper (this volume, Chapter 32) have provided independent support for the proposal that actions are programmed based in part on their expected EC. Using a very different experimental procedure (the flanker task), the researchers report that when stimuli are presented simultaneously with the action effects associated with correct responses, reaction times are shortened. This facilitation is observed even though the action effects themselves require different responses from the critical stimuli. Thus, it appears that the programming of actions includes the anticipation of the EC. This proposal is clearly consistent with the pattern of sequence transfer in the present experiments. When identical motor acts produce different EC, they are not interchangeable during sequence performance, suggesting that they are encoded differently in the learned representation.
33.5.1 Benefits of learning based on EC

One advantage of encoding a sequence in terms of desired outcomes is flexibility. Learned behaviors may be relevant under a variety of circumstances with different starting conditions and motor parameters. Actions requiring a series of discrete movements, such as opening a milk carton or hoisting a sail, can be performed on objects that vary considerably in terms of their physical properties. Moreover, we may wish to use the left hand because the right is otherwise occupied. Encoding the intended goals of actions in the sequence allows the learned representation to be applicable across a range of such conditions. Because the representation links one desired outcome to the next, there can be minimal disruption in performance when particular steps must be repeated or extended. For example, our implicit knowledge about how to make a peanut butter sandwich should be unaffected by the number of turns the lid requires to open the jar.

The slow pace and simplicity of the single-joint movements made during the SRT task most likely encourage abstract encoding while leaving little room for lower-level motoric learning. However, the latter form of learning presumably dominates when more elaborately articulated movements are required. For example, when learning to throw a baseball, the goal (getting the ball to the intended target) remains constant, but improvements occur in the timing and coordination of the muscle commands controlling the movements of the throw (Hore, Watts, Tweed, and Miller 1996). Such situations contrast sharply with the SRT task, during which the goals change rapidly but the movements are simple (see MacKay 1982; Willingham 1998). The distinction between goal-based learning and effector-based learning is supported by different patterns of transfer among various motor learning tasks. Sequence knowledge in the SRT task transfers to untrained effectors (Cohen et al. 1990; Keele et al. 1995; Willingham et al. 2000), but many motor skills (such as the ability to throw a baseball) do not (see also Wright 1990).
The present findings provide a link between research demonstrating that practice effects can occur at relatively abstract levels of representation (e.g. MacKay 1982; Schmidt 1975) and recent studies proposing a common-coding theory of response selection (e.g. Hommel 1993; Prinz 1990). Already, many structures thought to be predominantly motoric have been shown to be sensitive to higher-level task parameters (e.g. Boussaoud et al. 1996; Carpenter et al. 1999; Crammond and Kalaska 1994; di Pellegrino et al. 1992; Wise et al. 1996). Recombining movements with their meanings may provide important insight into the cognitive architecture of motor learning.
Acknowledgements

The author wishes to thank Rich Ivry and Steve Keele for many useful discussions about the data and their interpretation; Eric Ruthruff for helpful comments on an earlier draft of this manuscript; and Davina Chan, Sharon Badillo, and Colin Holbrook for assistance in designing the experiments and collecting the data.
Note

1. The text describes the subset of participants who did not show awareness; however, there were 18 participants in each S–R mapping group who showed some indication of explicit knowledge of the sequence. For these individuals, the pattern was similar, though more dramatic. The spatial-mapping group had reaction times of 509, 513, and 403 ms for the alternative, mirror, and parallel probes, respectively. The digit-mapping group had reaction times of 488, 427, and 468 ms for the alternative, mirror, and parallel probes, respectively. Thus, the basic pattern, with participants in the spatial-mapping group showing benefits on the parallel probe and participants in the digit-mapping group showing benefits on the mirror probe, is robust when all individuals are considered.
References

Bekkering, H. and Wohlschläger, A. (2002). Action perception and imitation. This volume, Chapter 15.
Boussaoud, D., di Pellegrino, G., and Wise, S.P. (1996). Frontal lobe mechanisms subserving vision-for-action versus vision-for-perception. Behavioural Brain Research, 72, 1–15.
Chan, D., Ivry, R., and Hazeltine, E. (in preparation). Implicit and explicit learning of sequence information.
Cohen, A., Ivry, R.I., and Keele, S.W. (1990). Attention and structure in sequence learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 17–30.
Crammond, D.J. and Kalaska, J.F. (1994). Modulation of preparatory neuronal activity in dorsal premotor cortex due to stimulus–response compatibility. Journal of Neurophysiology, 71, 1281–1284.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., and Rizzolatti, G. (1992). Understanding motor events: a neurophysiological study. Experimental Brain Research, 91, 176–180.
Fendrich, D.W., Healy, A.F., and Bourne, L.E. Jr. (1991). Long-term repetition effects for motoric and perceptual procedures. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 137–151.
Frensch, P.A. and Miner, C.S. (1995). The role of working memory in implicit sequence learning. Zeitschrift für Experimentelle Psychologie, 42, 545–575.
Gabrieli, J.D.E., Stebbins, G.T., Singh, J., Willingham, D.B., and Goetz, C.G. (1997). Intact mirror-tracing and impaired rotary-pursuit skill learning in patients with Huntington’s disease: Evidence for dissociable memory systems in skill learning. Neuropsychology, 11, 272–281.
Gleissner, B., Meltzoff, A.N., and Bekkering, H. (2000). Children’s coding of human action: Cognitive factors influencing imitation in 3-year-olds. Developmental Science, 3, 405–414.
Goschke, T. (1998). Implicit learning of perceptual and motor sequences: Evidence for independent learning systems. In M.A. Stadler and P.A. Frensch (Eds.), Handbook of implicit learning, pp. 401–444. Thousand Oaks, CA: Sage.
Grafton, S.T., Hazeltine, E., and Ivry, R. (1998). Abstract and effector-specific representations of motor sequences identified with PET. Journal of Neuroscience, 18, 9420–9428.
Hallett, M. and Grafman, J. (1997). Executive function and motor skill learning. International Review of Neurobiology, 41, 297–323.
Hazeltine, E. (under revision). The representational nature of sequence knowledge.
Hazeltine, E., Chan, D., and Ivry, R.B. (under revision). Attention and the modularity of sequence knowledge.
Hommel, B. (1993). Inverting the Simon effect by intention: Determinants of direction and extent effects of irrelevant spatial information. Psychological Research, 55, 270–279.
Hore, J., Watts, S., Tweed, D., and Miller, B. (1996). Overarm throws with the nondominant arm: Kinematics of accuracy. Journal of Neurophysiology, 76, 3693–3704.
Howard, J.H., Mutter, S.A., and Howard, D.V. (1992). Serial pattern learning by event observation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 1029–1039.
Keele, S.W., Jennings, P., Jones, S., Caulton, D., and Cohen, A. (1995). On the modularity of sequence representation. Journal of Motor Behavior, 27, 17–30.
MacKay, D.G. (1982). The problem of flexibility, fluency, and speed–accuracy trade-off in skilled behavior. Psychological Review, 89, 483–506.
MacKay, D.G. and Bowman, R.W. (1969). On producing the meaning in sentences. American Journal of Psychology, 82, 23–39.
Nattkemper, D. and Prinz, W. (1997). Stimulus and response anticipation in a serial reaction time task. Psychological Research, 60, 98–112.
Packard, M.G. and White, N.M. (1991). Dissociation of hippocampus and caudate nucleus memory systems by posttraining intracerebral injection of dopamine agonists. Behavioral Neuroscience, 105, 295–306.
Palmer, C. and Meyer, R.K. (2000). Conceptual and motor learning in music performance. Psychological Science, 11, 63–68.
Prinz, W. (1990). A common coding approach to perception and action. In O. Neumann and W. Prinz (Eds.), Relationships between perception and action, pp. 167–201. Berlin, New York: Springer-Verlag.
Schmidt, R.A. (1975). A schema theory of discrete motor skill learning. Psychological Review, 82, 225–260.
Schmidtke, V. and Heuer, H. (1997). Task integration as a factor in secondary-task effects on sequence learning. Psychological Research, 60, 53–71.
Stadler, M.A. (1989). On learning complex procedural knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 1061–1069.
Wachs, J., Pascual-Leone, A., Grafman, J., and Hallett, M. (1994). Intermanual transfer of implicit knowledge of sequential finger movements. Neurology, 44 (Suppl. 2), A329.
Willingham, D.B. (1998). A neuropsychological theory of motor skill learning. Psychological Review, 105, 558–584.
Willingham, D.B., Koroshetz, W.J., and Peterson, E.W. (1996). Motor skills have diverse neural bases: spared and impaired skill acquisition in Huntington’s disease. Neuropsychology, 10, 315–321.
Willingham, D.B., Wells, L.A., Farrell, J.M., and Stemwedel, M.E. (2000). Implicit motor sequence learning is represented in response locations. Memory and Cognition, 28, 366–375.
Wise, S.P., di Pellegrino, G., and Boussaoud, D. (1996). The premotor cortex and nonstandard sensorimotor mapping. Canadian Journal of Physiology and Pharmacology, 74, 469–482.
Wright, C.E. (1990). Generalized motor programs: Reexamining claims of effector independence in writing. In M. Jeannerod (Ed.), Attention and performance XIII, pp. 294–320. Hillsdale, NJ: Lawrence Erlbaum.
Ziessler, M. (1994). The impact of motor response on serial pattern learning. Psychological Research, 57, 30–41.
Ziessler, M. and Nattkemper, D. (2001). Learning of event sequences is based on response-effect learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 595–613.
Ziessler, M. and Nattkemper, D. (2002). Effect anticipation in action planning. This volume, Chapter 32.
Author index

Abell, F. 372, 377 Abernethy, B. 198, 200 Ackroyd, K. 59, 61 Acosta, E., Jr. 474–5, 492, 552 Actis Grosso, R. 164, 174 Acuna, C. 116, 140, 155 Adams, J.A. 645, 671 Adelson, E.H. 385–6, 397 Agid, Y. 441 Aglioti, S. 57, 61, 65, 98–9, 113, 123, 134, 147, 152, 395, 398 Ahroon, W.A. 441 Aitken, A.M. 527, 535 Akhtar, N. 307, 312 Alexander, G.E. 114 Allard, T. 147, 153, 155 Allison, T. 351, 353, 376–7, 380 Allman, J.M. 371, 377 Allport, A. 18, 46, 470, 471 Allport, D.A. 153, 532, 535, 551, 610, 626 Alperson, B. 628, 635, 643 Anand, S. 114, 124, 134 Andersen, R.A. 138, 140, 153, 156, 353, 412 Anderson, E. 611, 626 Anderson, G.J. 438 Andreators, M. 299, 313 Andres, K.H. 137, 155 Anisfeld, M. 300, 311 Annet, J. 426–7, 437 Anstis, S. 413, 437 Anzola, G.P. 151, 153 Arbib, M.A. 76, 116, 290, 293, 315, 317, 321, 328, 332–3, 335, 354, 376, 378 Archibald, Y.M. 69, 116 Arnell, K.M. 522, 528, 535, 561–2, 583, 585 Arzi, M. 114, 117–18 Asanuma, C. 353 Aschersleben, G. 47, 174–5, 198, 200, 227–30, 234–6, 241–4, 263–4, 266, 308, 313, 316, 333, 434, 439, 521, 536, 541, 551 Ashbridge, E. 367, 370–1, 377, 380 Ashburner, J. 377 Ashby, F.G. 600–1, 605–7
Ashmead, D.A. 177, 185, 192 Athenes, S. 116, 321, 333 Athwal, B. 240, 242 Atkins, P. 629, 643 Attali, M. 117 Audley, R.J. 46 Avikainen, S. 354, 398 Azimi-Sadjadi, M.R. 228, 244, 252, 265
Babcock, M.K. 428, 437 Bachmann, T. 161, 175 Bahrick, L.E. 152, 153 Baizer, J.S. 372, 377 Bakan, P. 628, 635, 643 Baker, C.I. 362, 371–2, 374, 377–9 Baldo, J.V. 456, 471 Balint, R. 138, 153 Band, G.P.H. 501, 518–19 Bandura, A. 295–6, 312 Banks, B.S. 562, 585 Bar, M. 67, 113 Barash, S. 140, 153 Barber, P. 13, 445, 456, 471 Barclay, C. 389, 397 Bard, C. 229–31, 235, 242–3 Baron-Cohen, S. 372, 377 Barthelemy, C. 335, 353 Bashore, T.R. 481, 491, 496, 519, 607 Bassignani, F. 446, 473, 476, 492 Bassili, J.N. 410, 438 Batista, A.P. 140, 153, 156 Battaglia-Mayer, A. 154 Battaglini, P.P. 140, 154 Baud-Bovy, G. 382, 399, 419, 442 Bauer, P.J. 306, 312 Baum, A. 271, 285 Baylis, G.C. 21, 45, 47, 59, 61, 151, 156, 358, 371, 378, 380, 477, 491, 600, 607, 612, 627 Beall, A.C. 188, 192 Beardworth, T. 410, 438 Beatty, J. 520, 536 Beek, P.J. 197–9, 201–2, 205, 218, 221–5
Behne, K.-E. 246, 264 Behrmann, M. 147, 155 Bekkering, H. 282, 284, 289–90, 292–4, 297, 303–5, 307–9, 312, 314–17, 331–3, 354, 378, 396, 398, 686–8 Bell, C. 407, 438 Beltzner, M.A. 84, 115 Bender, D.B. 376, 378 Benedetti, F. 147, 153 Benguérel, A.P. 429, 438 Benson, P.J. 354, 379–80 Bentin, S. 380 Benvenuti, F. 117 Berendsen, E. 517–18 Berger, J. 641, 644 Berkeley, G. 407, 438 Berlucchi, G. 395, 398 Bernieri, C. 114 Bernstein, N. 435, 438 Berry, D.M. 184, 187, 189, 192–3 Bertelson, P. 19–21, 29, 44–7, 477, 480, 490, 521, 535 Bertenthal, B.I. 389, 396, 398, 410, 438 Berthoz, A. 180, 183, 191 Berti, A. 59, 61 Bertoloni, G. 151, 153 Besner, D. 575, 579, 583, 628, 635, 643 Bettinardi, V. 290, 293, 332–3, 355, 380, 438 Bevan, R. 317, 333, 354, 379–80 Bhalla, M. 84, 108–10, 113, 174, 176 Bianchi, L. 154–5, 310, 312 Biederman, I. 68, 596–7, 606–7 Billon, M. 229–31, 234, 237–8, 242 Bingham, G.P. 199, 201 Binkofski, F. 111, 113, 292, 353 Bioulac, B. 140–1, 153, 156 Bisiacchi, P. 213, 225 Bjork, E.L. 534–5 Blake, R.R. 296, 312, 567–8, 583 Blakemore, S.J. 241–2, 271, 284 Blaser, E. 611, 626 Bles, W. 182, 191 Bloch, H. 670–1 Blouin, J. 229, 242 Boë, L.J. 429, 440 Boer, L.C. 20, 48 Boisson, D. 72, 84, 117–18, 135 Bonda, E. 376–7, 382, 398 Bonfiglioli, C. 321, 332 Bootsma, R.J. 198, 200 Borghese, A. 118 Börjesson, E. 412, 438 Bornstein, M.H. 412, 438 Boscagli, I. 117
Bosco, C.M. 490 Botvinick, M. 58–61, 136, 149–50, 153, 393, 395, 476, 490 Bourne, L.E. Jr. 688 Boussaoud, D. 119, 361, 372, 377, 396, 399, 688–9 Boutilier, C. 575, 583, 628, 643 Bowman, R.W. 685, 689 Boxtel, G.J.M., van 501, 518 Boyer-Zeller, N. 119 Bracewell, R.M. 140, 153 Bradley, D.C. 140, 153 Brandt, T. 183, 191 Brass, M. 293, 308–10, 312, 316, 332–3, 354, 378, 398 Braunstein, M.L. 412, 438 Brecht, M. 284, 539, 551 Breitmeyer, B. 269, 284 Brenneise-Sarshad, R. 300, 313 Brenner, E. 99, 111, 113, 119 Brewster, J.M. 228, 243 Bridgeman, B. 57–8, 60–1, 63–5, 81, 96–8, 100, 104–7, 110, 113–14, 120–2, 124–5, 127, 132–4, 145, 156, 159, 174–6, 269, 284, 381, 398, 587, 606, 614, 626 Brierly, K. 379 Broadbent, D.E. 444, 471 Brodsky, W. 260, 264 Broennimann, R. 379 Brothers, L. 372, 377, 396, 398 Brown, S. 231, 242 Brown, V. 626 Bruce, C. 381, 396, 398 Bruce, V. 357, 379 Brugger, P. 395, 398 Bruner, J.S. 670–1 Bryan, W.L. 429, 438 Buccino, G. 292, 351, 353 Buchtel, H.A. 151, 153 Buffuto, K.A. 441 Bukner, T. 410, 438 Bull, R. 116 Bullier, J. 71, 74, 77, 85, 114, 116–17, 119, 372, 379 Bullock, D. 610, 626 Bülthoff, H.H. 115, 123, 135, 154, 597, 606, 608 Bundesen, C. 625–6 Buneo, C.A. 140, 153 Bunz, H. 206, 209, 211, 224 Burbaud, P. 141, 153 Burchfiel, J.L. 139, 154 Burgess, P.R. 137, 153 Burnett, M. 243 Burns, B.D. 219, 225 Burt, P. 390, 398 Burton, P.C. 574, 584
Butterworth, G. 670–1 Byrne, J.M.E. 302, 312 Byrne, R.W. 297–8, 301–2, 311–12
Caan, W. 317, 333, 350, 354, 379 Caessens, B. 528, 535 Cajal, S.R. 73, 114 Call, J. 297, 299, 312, 314 Camarda, R. 154, 335, 353–4 Caminiti, R. 141, 154–5, 310, 312, 396, 399 Campadelli, P. 422, 442 Campbell, F.G. 64, 114 Campbell, K.C. 477, 490 Caramazza, A. 411, 438 Carey, D.P. 58, 61, 69, 95, 98, 101, 114–17, 122, 135, 371, 378–9, 396, 398 Carpenter, M. 297, 307, 312, 688 Carrier, L.M. 522, 535, 563–4, 573, 577 Carrozzo, M. 118 Carson, R.E. 378 Carter, C.S. 490 Castiello, U. 28, 87, 114, 291–2, 294, 315, 321–2, 332, 395, 612, 626 Caulton, D. 673, 689 Cavada, C. 353 Cave, K.R. 559, 586, 608 Celebrini, S. 371, 380 Chan, D. 674, 681–2, 688–9 Chapman, C.D. 98, 119 Chelazzi, L. 625–6 Cherry, B. 213, 225 Chieffi, S. 151, 153 Chitty, A. 350, 354, 379, 399 Chmiel, N.R.J. 551 Chun, M.M. 562, 585 Chung, H.H. 140, 156 Church, R.M. 213, 218, 224, 226 Ciccarielli, C. 189, 192 Claparède, E. 407, 438 Clark, F.J. 137, 153 Clark, S. 281, 284 Clark, S.A. 147, 153 Clarke, E.F. 245, 257, 261–2, 264 Clarke, V.L. 230, 235, 243 Clymer, A.B. 67, 116 Clynes, M. 245, 264 Cochin, S. 335, 353 Coello, Y. 100–1, 114, 119 Cohen, A. 527, 536, 587–92, 594, 596–601, 605–7, 674, 689 Cohen, D.A.D. 141, 155 Cohen, J.D. 149–50, 153, 463, 471, 490, 628–30, 634, 643
Cohen, L. 441 Cohen, S.P. 20, 47 Cohen, Y. 161, 176 Colby, C.L. 77, 114, 139–40, 153, 352–3, 408, 439 Cole, J.D. 227, 231–2, 234, 240, 242–4 Colent, C. 84, 114 Coles, M.G.H. 476, 481–2, 490–1, 517–18, 589, 607, 648, 671 Coltheart, M. 567, 578, 583, 629, 631, 633, 643 Cook, M. 296, 313 Cooke, D.F. 142, 154 Cooke, J.D. 231, 242 Cools, R. 517–18 Cooper, F.S. 433, 440 Cooper, L.A. 413, 427, 433, 441 Cooper, M. 300, 313 Corbetta, M. 625–6 Coren, S. 412, 438 Corradini, M.L. 332 Costes, N. 292, 335, 354, 377, 398, 438 Coulter, J.D. 140, 155 Cowan, H.A. 429, 438 Cowan, W.B. 501, 518 Cowey, A. 94, 115, 119 Cox, S. 649, 671 Craft, J.L. 474–5, 490, 492, 595, 607 Craig, G.L. 649, 671 Craighero, L. 292, 318, 332, 352–3, 612, 626 Crammond, D.J. 139, 155, 426, 438, 688 Crane, H.D. 614, 626 Craske, B. 146, 153 Crebolder, J. 549, 551, 563, 582, 584 Creelman, C.D. 213, 224 Crelier, G. 398 Critchley, M. 138, 153 Curtis, B. 629, 643 Cutting, J. 389, 397 Cutting, J.C. 643 Cutting, J.E. 119, 438, 440
d’Amato, T. 119 d’Ydewalle, G. 491 Daffertshofer, A. 198, 201, 221–2, 224–5 Dalakas, M.C. 239, 244 Dalery, J. 119 Dalrymple-Alford, E.C. 628, 635, 643 Dancer, C. 156 Daprati, E. 115, 123, 135 Darwin, C. 315, 333 Dascola, I. 161, 176, 610, 627 Da-Silva, J.A. 188, 192 Dassonville, P. 161, 175 Davidson, R.J. 433, 439
693
aapd01.fm Page 694 Wednesday, December 5, 2001 10:25 AM
694
Author index
Davis, G. 147, 153 Day, B.L. 95, 97–8, 114, 300 de’Sperati, C. 422, 427–9, 438 Decety, J. 58, 69, 116, 290–3, 318, 331–2, 335, 351, 353–4, 376–7, 380, 382, 394, 398–9, 426–7, 438–9 Deecke, L. 481, 491 Dehaene, S. 68, 114 Dehaene-Lambertz, G. 114 Deininger, R.L. 444, 472 DelColle, J.D. 219, 225 Dell, G.S. 628, 632, 643 Dell’Acqua, R. 522, 528, 535, 549, 551, 560–3, 568, 575, 577–8, 582–4 Della Sala, S. 267, 284 Dennett, D. 199, 201 Dennis, I. 628, 635, 643 Desimone, R. 119, 361, 370–2, 377, 379, 381, 398 Desmurget, M. 65, 97, 98, 114–15, 117, 119, 150–1, 153, 156 DeSouza, J.F.X. 57, 61, 65, 113, 123, 134 Deubel, H. 95, 159, 161, 176, 282, 284, 556, 609, 611, 613–14, 623–4, 626–7 DeYoe, E.A. 538, 551, 588, 607 Di Lollo, V. 562, 568, 578, 584 Dichgans, J. 183, 191 Dickinson, A. 270, 284 Dijkerman, H.C. 61, 114, 116 Dimitrov, G. 159, 172, 175 Ding, M. 225 Dittrich, W.H. 389, 398, 410–11, 415, 439 DiZio, P. 148, 153 Dobbins, A.C. 371–2, 377 Dohle, C. 113 Donchin, E. 20, 48, 476, 481–2, 490–1, 517–18, 569, 583, 607, 671 Donders, F.C. 443–4, 470–1 Done, D.J. 495, 518 Donkelaar, P., van 243 Dosher, B. 611, 626 Doupe, A.J. 299, 312 Drake, C. 246, 262, 264 Drewing, K. 241, 242 Driver, J. 18, 47, 114, 116, 147–8, 153, 600, 607 Dubois, B. 441 Dubon, D. 183, 192 Duffy, F.H. 139, 154 Dugas, C. 116 Duhamel, J.-R. 139–40, 153, 352–3, 408, 439, 441 Dumas, P. 119 Dunbar, K. 463, 471, 628, 630, 643 Duncan, J. 447–8, 467, 471, 477, 479, 488, 490, 522, 528, 535, 568, 586, 610, 626 Dunlap, K. 228, 242
Duren, L., van 448, 458, 468–9, 473 Dutta, A. 496, 519 Dwivedi, A. 20, 47
Echallier, J.E. 156 Eckman, P. 433, 439 Edwards, M.G. 60–1, 118, 289, 315 Egeth, H.E. 534, 537, 628–9, 633–4, 644 Ehrenstein, A. 447–8, 458, 467–9, 471 Eimas, J.L. 411, 439 Eimer, M. 68, 114, 268–9, 273–4, 283, 285, 403, 456, 471, 496, 501, 517–18 Ellis, R.R. 98, 114 Elsner, B. 241–2, 270, 284, 550–1, 646, 669, 671 Emery, N.J. 357, 372, 377 Endo, H. 426, 440 Engbert, R. 219––21, 224–5 Engel, A.K. 161, 176, 270, 284, 539, 551 Epstein, C.M. 114 Ercolani, L. 154 Ericksen, C.W. 490 Eriksen, B.A. 534, 536, 569, 583, 589, 607, 648–9, 652, 671 Eriksen, C.W. 501, 518, 534, 536, 569–70, 583, 589, 607, 648–9, 652, 671 Ertsey, S. 159, 176 Essen, D.C., van 538, 551, 588, 607 Essick, G. 353 Ettlinger, G. 74, 115, 139, 154–6 Evans, A. 376–7, 382, 398 Evarts, E.V. 239, 244
Fadiga, L. 154, 161, 175, 289–90, 292–3, 309, 312–13, 315, 317–18, 332–5, 349, 352–5, 376–8, 380, 382, 399, 426, 438–9, 441, 612, 626, 688 Fagot, C. 570, 576, 584 Fahle, M. 115, 123, 135 Falmagne, J.C. 19, 20, 47 Farah, M.J. 411, 439 Farné, A. 117–18, 148, 153 Farrell, J.M. 675, 689 Fattori, P. 140, 154 Faugier-Grimaud, S. 74, 115, 350, 353 Fazio, F. 290, 292–3, 332–3, 355, 377, 380, 398, 438 Fearnley, S. 526, 536 Feintuch, U. 556, 587, 592, 594, 607 Fendrich, D.W. 687–8 Ferraina, S. 140, 154–5, 310, 312 Ferraresi, P. 154 Ferreira, V.S. 628, 635, 643 Ferrier, D. 138, 154
Finch, D.M. 671 Findlay, J.M. 611, 627 Fink, E.A. 441 Fink, G.R. 353 Finke, R.A. 158, 175, 409, 439 Fischer, B. 624, 626 Fiser, J. 371, 377 Fisher, D.M. 98, 130, 396, 399 Fissell, K. 490 Fitts, P.M. 10, 47, 100, 115, 433, 444–5, 471–2 Flanagan, J.R. 84, 98, 114–15 Flanders, M. 151, 156 Flanigan, H.P. 628, 644 Flash, T. 416, 439, 441 Fleury, M. 229–30, 242–3 Flinchbaugh, B.E. 410–11, 439 Fodor, J.A. 411, 439 Fogassi, L. 140, 144, 153–4, 161, 175, 289–90, 292–3, 309, 312–13, 315, 317, 332, 334–6, 349, 352–5, 377, 382, 399, 426, 438–9, 441, 688 Fonlupt, P. 291, 293, 394, 399 Ford, S.K. 219, 225 Forget, R. 231, 242–3 Forrin, B. 21, 47, 458–9, 469, 472 Forsman, T. 125, 134 Forss, N. 354, 398 Found, A. 591, 607 Fox, P.T. 427, 441 Fox, R. 396, 398, 567–8, 583 Frackowiak, R.S.J. 240, 242, 377 Fraisse, P. 228–30, 234, 243, 255, 264 Franck, N. 119 Frange, H. 117 Frankish, C.R. 281, 285 Franklin, P.E. 574, 586 Franz, V.H. 99, 115, 123, 135 Franzel, S.L. 559, 586 Frassinetti, F. 59, 61 Frenois, C. 115, 350, 353 Frensch, P.A. 670, 687–8 Freund, H.-J. 113, 229, 243, 292, 353 Freyd, J.J. 158, 175, 390–2, 394, 398–9, 409–10, 413–15, 428, 434, 437, 439, 441 Friedman, C.J. 441 Friedman-Hill, S.R. 608 Fries, P. 284, 539, 551 Frigen, K. 569, 585 Friston, K. 377 Frith, C.D. 240–3, 267, 271, 284, 377, 495, 518 Frith, U. 377 Frost, B.J. 183, 193 Frost, D. 74, 117, 121, 135 Frykholm, G. 441
Frymire, M. 184, 191, 193 Fujita, I. 371, 378 Fukada, Y. 380 Fukushima, K. 376–7 Fukusima, S.S. 188, 192 Fulton, J.F. 139, 156
Gabrieli, J.D.E. 673, 688 Galdikas, B.M. 298, 313 Gallagher, S. 145, 154 Gallese, V. 140, 154, 161, 175, 289–94, 309–10, 312–13, 315, 317–18, 331–2, 334–6, 340, 350, 352–5, 371, 376–7, 379, 382, 393, 396, 399, 426, 438–9, 441, 688 Galletti, C. 140, 154 Gandhi, S. 144, 154 Garasto, M.R. 154 Garber, S.R. 189, 192 Garing, A.E. 182, 185, 191–2 Gattas, R. 114 Gattis, M. 290, 292, 303, 312, 314, 316, 332 Gaunet, F. 117, 118 Gauthier, G.M. 231, 242–3 Gawryszewski, L. 475, 492, 573, 585 Gaymard, B. 613, 626 Geesken, R.H.J. 496, 519 Gegenfurtner, K.R. 115, 123, 135 Gehring, W.J. 671 Gehrke, J. 174–5, 228–30, 242–3, 266 Gelade, G. 527, 537, 540, 559, 586, 591, 608 Gelder, T., van 435, 441 Gemmer, A. 125, 134 Gentilucci, M. 67, 98, 106–7, 115, 123, 135, 141, 144, 154, 156, 321–2, 332, 337, 349–50, 353–4 Gentner, D.R. 218, 226 Georges-François, P. 380 Georgopoulos, A.P. 116, 119, 140–1, 154–5, 161, 176, 380, 416, 440, 538, 551 Gerjets, D.A. 628, 635, 644 German, W.J. 139, 156, 550, 659, 685 Ghahramani, Z. 150, 157, 240, 244 Ghez, C. 150, 154 Ghilardi, M.F. 154 Giannopulu, I. 183, 191 Gibbon, J. 213, 218, 224 Gibbs, B.J. 545, 552 Gibson, E.J. 428, 439 Gibson, J.J. 180, 191–2, 412, 419, 439 Giesbrecht, B.L. 568, 578, 584 Gillam, B. 412, 439 Girard, P. 74, 114–15 Girgus, J.S. 412, 438 Glaser, M.O. 629, 643
Glaser, W.R. 629, 643 Gleason, C.A. 267, 285 Gleissner, B. 303–5, 312, 686, 688 Gleitman, H. 296, 312 Glickstein, M. 335, 354 Gnadt, J.W. 140, 153 Goetz, C.G. 673, 688 Goldberg, M.E. 140, 156, 408, 439 Goldman, A. 291–2 Goldman-Rakic, P.S. 353 Gollwitzer, P.M. 311–12 Good, P.F. 372, 377 Goodale, M.A. 57–8, 61, 63–7, 69–71, 74, 76, 84, 86, 94, 96–9, 101, 110–11, 113, 115–17, 121–3, 134–5, 140, 154, 156, 177, 193, 290–1, 293, 321–2, 333, 352–4, 356, 371, 378–9, 381, 383, 398–9, 587, 607, 610, 626 Gordon, J. 151, 154 Gordon, P.C. 629, 643–4 Gore, J.C. 380 Gormican, S. 588, 608 Goschke, T. 689 Gossweiler, R. 174, 176 Goten, K., van der 528, 535 Govier, L.J. 208, 224 Grady, C.L. 378 Grafman, J. 675, 681, 689 Grafton, S.T. 114–15, 290, 293, 317, 332, 335, 351, 354, 376, 378, 676, 687, 689 Grainger, J. 575, 583 Grassi, F. 292, 377, 398, 438 Gratton, G. 476–7, 481–2, 490, 517–18, 607, 671 Graziano, M.S.A. 58–61, 136, 140, 142–4, 151, 154, 156, 335, 349, 354, 374, 378, 393, 395 Gréa, H. 65, 69, 95, 98, 115, 117 Greenwald, A.G. 241, 243, 308, 312, 454, 472, 527, 536, 629, 643 Gregory, M. 9, 444, 471, 642 Gregson, R.G. 205, 224 Grèzes, J. 292, 318, 332, 335, 351, 353–4, 377, 398, 438 Grieser, D. 411, 439 Gross, C.G. 114, 138, 140, 144, 151, 154, 156, 335, 349–50, 354, 357, 374, 376, 378, 381, 398 Grossberg, S. 610, 626 Grossenbacher, P.G. 147, 153 Grüsser, O.J. 408, 439 Guendry, F.E. 180, 182, 192 Guiard, Y. 491 Guic-Robles, E. 155 Guigon, E. 155, 310, 312
Guth, D.A. 177, 192 Guttentag, R.E. 628, 635, 643
Hackley, S.A. 404, 453, 473–4, 476, 478, 481, 490–1, 493, 519 Haffenden, A. 57, 61, 67, 84, 97–9, 115, 123, 135 Haggard, P. 174–5, 199, 200, 266, 268–9, 273–4, 281, 283–5 Hahn, S. 627 Haith, M.M. 628, 635, 643 Haken, H.H. 206, 211–12, 221, 224–5 Haller, M. 629, 643 Hallett, M. 675, 681, 689 Ham, R. 297, 302, 314 Hambuch, R. 208–9, 222, 226 Hanes, D.P. 119 Hansen, R. 174–5 Happe, F. 377 Hari, R. 335, 354, 396, 398 Harless, E. 161, 645, 671 Harm, M.W. 629, 643 Harries, M.H. 350, 354, 371, 378–80, 382, 399 Harris, C.S. 148, 154 Harris, M.H. 317, 333 Harris, P.L. 306, 314 Harter, N. 429, 438 Hartje, W. 74, 115 Harvey, M. 116 Hary, D. 252, 264 Hasbroucq, T. 9, 47, 445, 472, 475, 491, 495, 518, 552, 588, 607, 642–3 Hasselmo, M.E. 358, 364, 366–7, 378 Havings, J. 643 Hawley, K.J. 529, 536 Haxby, J.V. 370–1, 376, 378, 538, 552 Hay, J.C. 148, 154 Hazeltine, E. 199, 201, 218, 225, 241, 243, 270, 285, 556–7, 673–6, 681, 688–9 Head, A.S. 359, 368, 379, 614 Healy, A.F. 688 Hedge, A. 446, 472, 475, 491, 500, 518 Hefter, H. 113 Heijden, A.H.C., van der 159, 176, 532, 537, 629, 644 Heit, F. 114 Heit, G. 122, 134 Held, R. 74, 117, 121, 135 Heller, D. 591, 607 Hellige, J.B. 213, 225 Helmholtz, H., von 63, 112–3, 182, 407–8, 439 Helmuth, L.L. 218, 224 Henaff, M.A. 376, 380 Hendry, D. 64, 114
Hendry, S.H.C. 140, 155 Henik, A. 260, 264, 641, 644 Hennings, M. 241–2 Hepp-Raymond, M.C. 398 Heptulla-Chatterjee, S. 387, 392–3, 398–9, 410, 415, 439, 441 Herscovitch, P. 378 Hershberger, W.A. 161, 174–5 Hershman, R.L. 579, 584 Hertz, H. 409–10, 439 Heuer, H. 224, 674, 689 Heywood, C.A. 94, 115 Hietänen, J.K. 354, 370, 373, 378–80, 396, 398 Hildreth, E. 384, 398 Hill, E.W. 177, 192 Hillix, W.A. 579, 584 Hinrichs, J.V. 474, 492 Hirstein, W. 145, 148, 151, 156 Hockey, R.J. 611, 627 Hoesen, G.W., van 77, 119 Hoff, B. 321, 332 Hoffman, D.D. 410–11, 439 Hoffman, E.A. 371, 376, 378 Hoffman, J.E. 611, 626 Hoffmann, J. 527, 536, 646, 669, 671 Hofsten, C., von 410, 412, 438–9, 670–1 Holmes, G. 138, 155 Holst, E., von 161, 176, 270, 285, 408, 439 Hommel, B. 3, 34, 41, 47, 68, 112, 115, 161, 175, 241–3, 270, 283–5, 405, 434, 439, 444, 454, 457, 466–8, 470, 472, 474–7, 484, 488–9, 491, 496, 501, 518, 521, 527–9, 533–42, 545–52, 564, 571–4, 576, 578, 582–6, 646–7, 669, 671, 682, 687–9 Hood, B.M. 58, 61 Hore, J. 687, 689 Horwitz, B. 378 Houghton, G. 612, 627 Houle, S. 379 Howard, D.V. 689 Howard, J.H. 689 Howard, L.A. 156, 612, 627 Hsieh, H. 470–1 Hsieh, S. 18, 46 Hu, X.T. 144, 154, 349, 354, 374, 378 Hubbard, T.L. 158, 171, 175 Hubel, D.H. 383, 398, 588, 607 Huber, L. 297, 312 Hudson, P.T.W. 629, 644 Huemer, V. 125, 132–4 Hueting, J.E. 20, 48, 481 Hume, D. 271, 285 Hummel, J.E. 596–7, 606–7 Humphrey, D.G. 505, 518
Humphreys, G.W. 57, 59–1, 118, 315 Humphreys, N.K. 73, 115 Husain, M. 116 Hyde, M.L. 141, 155 Hyman, R. 19, 47 Hyvärinen, J. 115, 335, 337, 348–9, 354
Iacoboni, M. 290, 293–4, 310, 312, 317, 333, 335, 351, 354, 376, 378, 396, 398 Iggo, A. 137, 155 Ikeda, K. 148, 154 Ingram, H.A. 239, 243 Inhoff, A.W. 213, 225 Irwin, D.E. 282, 284, 627 Ishio, A. 213, 225 Ito, M. 371, 378 Ivry, R.B. 199, 201, 218, 224–5, 261, 265, 527, 536, 591, 600–1, 605–7, 673–4, 676, 681, 688–9 Iyer, V.S. 245, 264
Jackson, S.R. 70, 98–9, 115 Jagacinski, R.J. 219–20, 225 Jakobson, L.S. 69, 115–16, 122, 135, 321–2, 333, 371, 378 James, W. 161, 175, 241, 243, 408, 439, 527, 536, 628, 645, 671 Jansson, G. 410, 439 Jeannerod, M. 58, 65, 68–9, 74, 76, 84, 87, 99, 110–11, 114, 116–17, 153, 156, 292, 332, 352, 354, 376–8, 398, 426, 438–9, 527, 536 Jeeves, M.A. 69, 116, 379 Jeka, J.J. 219, 225 Jellema, T. 289–91, 294, 310, 335, 351, 356, 362, 367–8, 372–4, 377–8, 382, 393, 396 Jenkins, W.M. 147, 153, 155 Jennings, P. 673, 689 Jeo, R.M. 371, 377 Jersild, A.T. 470, 472, 551 Johansson, G. 370, 378, 409–12, 415, 439 Johnson, C. 189, 192 Johnson, P.B. 140, 154–5, 396, 399 Johnson, W.S. 228, 243 Johnston, J.C. 521, 536, 563, 565, 570–1, 576, 581, 585–6 Johnston, W.A. 529, 536 Jolicœur, P. 175, 281, 285, 522, 526, 528, 532, 535–6, 542, 549, 551, 555, 558, 560–8, 576–8, 582–6 Jolley-Rogers, G. 214, 225 Jones, E.G. 140, 155 Jones, M.R. 219, 225, 263–4 Jones, S. 673, 689
de Jong, R. 161, 175, 448, 454, 456–7, 459, 461, 466, 469, 471, 474–5, 481, 488, 490, 496, 500, 517–18, 522, 526, 528, 532, 535, 565, 567, 578, 582–3 Jonides, J. 612–13, 626 Jordan, J.S. 148, 155, 158–9, 171, 173–5 Jordan, M.I. 150, 153, 157, 240, 244 Jouen, F. 183, 192 Jusczyk, P. 411, 439
Kaas, J.H. 137, 155 Kahneman, D. 520, 532, 536, 545, 662 Kalaska, J.F. 141, 155, 688 Kandel, S. 429–31, 440 Kanizsa, G. 389, 398 Kant, I. 272, 285 Kapur, S. 370, 379 Kaseda, M. 371, 379 Kawamichi, H. 426, 440 Kawarasaki, A. 156 Kawato, M. 213, 226, 271, 285 Kay, B.A. 211, 225 Keele, S.W. 10, 47, 527, 536, 673, 687, 689 Keillor, J.M. 115 Keir, R. 296, 313 Keith, D. 146, 153 Kelso, J.A.S. 206, 209, 211–12, 216–17, 219, 224–5, 435, 440 Kenemans, J.L. 481, 492 Kennard, C. 116 Kennedy, T.M. 219, 226 Kenny, F.T. 146, 153 Kerzel, D. 158–9, 162, 164, 171–2, 175–6, 316, 331, 333 Keuss, P.J.G. 492 Keysers, C. 372, 377 Kieras, D.E. 521, 532, 536, 565–6, 585 Kikuki, Y. 426, 440 Kim, C.C. 140, 156 King, D.J. 85, 119 Kirby, N.H. 19, 20, 22, 46–7, 477, 491 Kirch, M. 65, 114, 122, 134 Kirveskari, E. 398 Kirveskari, S. 354 Klapp, S.T. 219, 225 Klatzky, R.L. 177, 180, 192 Klein, G.S. 643 Klein, R. 175 Kleine, B.-U. 476, 491 Kliegl, R. 224–5 Klinger, M.R. 574, 584 Klotz, W. 68, 116 Knoblich, G. 534, 537
Knoll, R.L. 280–1, 285 Knuf, L. 158 Koch, R. 228–9, 243–4 Koechlin, E. 114 Köhler, S. 370, 379 Köhler, W. 296, 312 Kok, A. 481, 492 Kolb, B. 138, 152, 155 Kolers, P.A. 228, 243, 413, 440 Kollias, S. 398 Komilis, E. 65, 116, 156 König, P. 161, 176, 284 Kornblum, S. 9, 10, 12–14, 16, 18, 19, 22, 42, 46–8, 51, 445, 463, 472–3, 475, 480, 488, 491–2, 495–6, 500, 518–19, 552, 587–8, 607, 629, 642–4 Kornhuber, H.H. 180, 192, 481, 491 Koroshetz, W.J. 673, 689 Korte, A. 413–14, 440 Kosslyn, S.M. 113, 116 Kourtzi, Z. 383, 398 Kovacs, I. 98, 116 Kowler, E. 611, 626 Koyama, T. 380 Kozlowski, L.T. 389, 397, 410, 412, 438, 440 Kramer, A.F. 490, 505, 518, 627 Kramer, S.J. 410, 438 Krampe, R.T. 209, 219, 224–5 Krams, M. 377 Krappmann, P. 86, 95, 97, 116 Kristofferson, A.B. 206, 208, 222, 226 Kruger, A.C. 298, 301, 314 Kruysbergen, N.W., van 228, 244 Kugler, P.N. 435, 440 Kuhl, P.K. 299, 312, 411–12, 439, 440, 527, 536 Kunde, W. 241, 243, 646, 671 Kurths, J. 224 Kusunoki, M. 118, 380 Kutas, M. 481, 491 Kuypers, H.G.J.M. 137, 156 Kymissis, E. 299, 313
LaBerge, D. 626 de Labra, C. 404, 453, 473–4, 493, 519 Lachmann, T. 522, 537 Lackner, J.R. 146, 148, 153, 155, 182, 192 Lacquaniti, F. 118, 140, 154–5, 310, 312, 416, 440 Ladavas, E. 148, 153 Laissard, G. 429, 442 Lajoie, Y. 230, 239, 242–3 Lamarre, Y. 231, 242–3 Lamberts, K. 491 Lammertyn, J. 528, 535
Langton, S.R.H. 357, 359, 379 Lappin, J.S. 628, 644 Large, E.W. 263–4, 326, 330, 339, 420 Larish, J.F. 505, 518 Lassonde, M. 74, 117 Lathan, C.E. 148, 153 Lauber, E. 474, 490, 496, 518 Lauer, E.R. 496, 519 Lavie, N. 600, 607, 649, 671 Le Bihan, D. 114 Le Clec, H.G. 114 Leavitt, J.L. 321, 333 Lederman, S.J. 98, 114, 180, 192 Lee, D.N. 181, 183, 192 Lee, J.W. 10, 47 Lefkowitz, M.M. 296, 312 Leiguarda, R.C. 294, 313 Leinonen, L. 335, 337, 348–9, 354 Leiser, D. 641, 644 Lejeune, B. 335, 353 Lepecq, J.-C. 183, 191–2 Lepore, F. 74, 117 Leutgeb, S. 119 Leuthold, H. 20, 47, 453, 466, 470, 472, 476, 479, 488, 492, 514, 518 Levelt, W.J.M. 629, 633, 643 Leventhal, A.G. 119 Levin, D.T. 609, 627 Levin, H. 428, 439 Levine, B. 473, 492 Lewis, S. 114, 122, 134 Lewis, T. 208, 224 Li, E. 568, 586 Li, L. 84, 118 Liang, C.-C. 474, 490, 496, 518 Liberman, A.M. 427, 433, 440 Libet, B. 199, 267–9, 272, 274–5, 277, 280, 282–3, 285 Lichtey, L. 387, 399, 410, 441 Lieberman, P. 432, 440 Lin, C.-S. 137, 155 Ling, L. 118 Linnankoski, I. 354 Lints, T. 299, 313 Lishman, J.R. 183, 192 Livingstone, M.S. 588, 607 Lloyd, D. 156 Lobo, D.H. 271, 285 Logan, G.D. 454, 472, 476, 491, 501, 505, 518–19, 556, 564, 572–3, 578, 584, 628, 630, 633–4, 640–3 Logothetis, N.K. 361, 379 Lombardino, A.J. 299, 313 Loomis, J.M. 177, 188, 192
Lorençeau, J. 386–7, 398 Lorincz, E.N. 371, 379 Lortie, C. 59, 61, 151, 156, 612, 627 Los, S.A. 469–70, 472, 513, 518 Lotze, R.H. 407, 440, 527, 536, 645, 671 Lu, C.-H. 444, 455–6, 466, 470, 472, 474, 476, 480, 489, 491–2, 496, 501, 518–19 Lubbe, R.H.J., van der 481, 488, 493 Luce, R.D. 18, 19, 45–7, 496–7, 518 Luck, S.J. 559, 569, 577, 584, 586 Ludwig, C. 228, 243 Lueschow, A. 371, 379 Luppino, G. 154, 335, 353–5, 399 Luppino, M. 154 Lurito, J.T. 416, 440 Lusher, D. 315 Lynch, J.C. 116, 140, 155 Lyon, I.N. 95, 97–8, 114
McCann, R.S. 521, 536, 563, 565, 570–1, 576, 585 McCarthy, G. 351, 353, 376–7, 380 Mach, E. 182, 192, 408–9, 440 McClain, L. 628, 633, 643 McClay, B. 476, 493 McClelland, J.L. 463, 471, 628–31, 643–4 McCloskey, D.I. 67–8, 119 McDaniel, C. 396, 398 McDonald, J.E. 629, 644 McGlone, F. 156 McIntyre, J. 151, 155 Mack, A. 65, 106, 119 MacKay, D.G. 685, 687–9 MacKay, W.A. 139, 155 Mackeben, M. 613, 626 MacKenzie, C.L. 116–17, 321, 333 McKeon, R. 295, 313 Macko, K.A. 610, 626 MacLeod, C.M. 12, 47, 595, 607, 628–30, 643 McMillan, A. 189, 192 McMullen, P.A. 411, 439 Maddox, W.T. 600, 606–7 Magen, H. 589, 591, 607 Magne, P. 100, 114 Magno, E. 273, 283, 285 Mahmud, S.H. 159, 176 Majeres, E. 628, 633–5, 643 Maki, W.S. 569, 578, 585 Malsburg, C., von der 538, 552 Mandler, J.M. 306, 312 Marble, J.G. 450, 453–4, 464–6, 469, 472–3 Marcel, T. 21, 47, 267, 459, 472 Marchetti, C. 267, 284 Marcus, S.M. 281, 285
Marean, G.C. 411, 440 Mari, M. 315 Marr, D. 356, 364, 371, 379 Marsden, C.D. 294, 313 Marsh, N.W.A. 446, 472, 475, 491, 500, 518 Marshall, J. 74, 119, 121, 135 Marshburn, E. 219, 225 Marteniuk, R.G. 100, 116–17, 321–2, 333 Martin, O. 60, 65, 97–8, 117, 315 Martineau, J. 335, 353 Masaki, H. 481, 488, 491 Massey, J.T. 141, 154, 416, 440 Mateeff, S. 159, 175, 440 Matelli, M. 154, 156, 290, 293, 333, 335, 338, 350, 353–5, 380, 396, 399 Mates, J. 228, 230, 234–5, 243–4, 247, 263–4 Mather, G. 410, 440 Matin, E. 67, 116 Matin, L. 67, 116, 408, 440 Mattingley, J.B. 112, 114, 116, 147, 153 Mattingly, I.G. 433, 440 Maunsell, J.H.R. 371, 380 Mauritz, K.-H. 239, 244 Mayr, U. 225 Mays, L.E. 408, 440 Mazziotta, J.C. 293, 312, 332–3, 354, 378, 398, 438 Meck, W.H. 213, 224 Meckler, C. 153 Mecklinger, A. 534, 537 Meenan, J.P. 154 Meikle, T.H. 73, 119 Meiran, N. 473, 492 Melara, R.D. 628, 633, 644 Meltzoff, A.N. 152, 155, 295, 297, 299–301, 303, 307, 311–13, 396, 399, 411, 440, 527, 536, 686, 688 Melzack, R. 152, 156 Merzenich, M.M. 137, 147, 153, 155 Mewaldt, S.P. 552 Meyer, A.S. 643 Meyer, D.E. 481, 491, 521, 532, 536, 565–6, 585, 629, 643–4 Meyer, M.M. 411, 439 Meyer, R.K. 686, 689 Miall, R.C. 243 Michael, D. 136, 189, 192, 558 Michaels, C.F. 199, 201 Michel, F. 58, 69, 72, 116–17, 376, 380 Michon, J.A. 213, 225, 252, 261, 264 Michotte, A. 266, 285, 409, 440 Midgett, J. 174, 176 Miedreich, F. 243 Mignot, J.C. 114
Mikami, A. 380 Miller, B. 687, 689 Miller, E.K. 371, 379 Miller, G. 577, 585 Miller, J. 522, 537, 589, 592, 594, 607 Miller, J.D. 411, 440 Miller, J.O. 478, 481–2, 485, 490–1 Milne, A.B. 98, 117 Milner, A.D. 57–8, 61, 63–4, 69–71, 74, 76, 86, 94–5, 97, 99, 101–3, 109–12, 114–6, 121–2, 135, 290–1, 293, 352–6, 371, 378–9, 381, 383, 398–9, 587, 607, 610, 626 Mine, S. 119, 161, 176, 380 Mineka, M.D. 296, 313 Miner, C.S. 687–8 Mischel, W. 296, 312 Mishkin, M. 58, 61, 74, 94, 119, 121, 135, 139, 155, 157, 290–1, 293, 352, 355, 370, 378, 380, 610, 626 Mistlin, A.J. 350, 354, 373, 379, 382, 399 Mitra, P.P. 299, 313 Mitrani, L. 159, 172, 175 Mittelstaedt, E. 270, 285 Mittelstaedt, H. 161, 176, 408, 439 Miyake, I. 228, 243 Moffet, A.M. 139, 155 Mohler, C.W. 73, 116 Mokler, A. 624, 626 Molen, M.W., van der 492, 496–7, 501, 517, 519 Monsell, S. 18, 35, 47–8, 470, 473, 578, 585 Mon-Williams, M. 116 Moore, C. 410, 438 Moore, G.P. 252, 264 Moore, L.P. 628, 644 Moore, M.K. 152, 155, 297, 300–1, 311, 313, 396, 399, 527, 536 Moortele, P.-F., van de 114 Mordkoff, J.T. 37, 39, 47, 453, 466, 470, 472, 594, 607 Morel, A. 74, 77, 85, 114, 116, 119, 372, 379 Morgan, J.L. 300, 313 Morgan, R. 152, 155 Morin, R.E. 458, 469, 472 Moriya, M. 380 Morrison, J.H. 372, 377 Morrison, R. 410, 438 Morton, J. 281, 285 Moscovitch, M. 147, 155, 370, 379 Mounoud, P. 422–4, 442 Mountcastle, V.B. 75, 116, 137, 139–41, 155, 182 Mounts, J.R.W. 628, 633, 644 Mouton, J.S. 296, 312 Movshon, J.A. 383, 385–6, 397, 399 Muckenhoupt, M. 562, 585
aapd01.fm Page 701 Wednesday, December 5, 2001 10:25 AM
Author index
Mueller, M. 114 Mulder, G. 481, 492 Mulder, L.J.M. 481, 492 Müller, H.J. 591, 607, 612–13, 626 Müller, K. 229, 239, 243 Müller, U. 228, 243 Münsterberg, H. 645, 671 Murata, A. 118–19, 161, 176, 371, 379–80 Murdock, B.B., Jr. 581, 585 Müri, R. 398, 613, 626 Murphy, K.J. 154, 371, 379 Murray, E.A. 139, 155 Murray, J.T. 534–5 Mushiake, H. 144, 155 Müsseler, J. 47, 158–9, 161, 164, 171, 175–6, 241, 243, 404–5, 434, 439, 520–3, 526–34, 536–7, 540–1, 550–2, 578, 582–3, 585 Mutter, S.A. 689
Naccache, L. 114 Nagell, K. 298–9, 313 Nagle, M. 114, 122, 134 Nakamura, K. 140, 156 Nakayama, K. 389, 399, 613, 626 Nalwa, V. 358, 378 Nattkemper, D. 241, 244, 270, 282–3, 285, 556, 645, 665, 672, 675, 682, 686–7, 689 Neely, J.H. 574, 585 Neisser, U. 559, 585 Nelson, J.M. 219, 225 Nelson, R. 137, 155 Neumann, O. 68, 116, 520, 532, 536, 610, 626 Newcombe, F. 138, 156 Newman, C. 273, 285, 321, 323, 360, 621–2 Newstead, S.E. 628, 635, 643 Nico, D. 58, 61 Nicoletti, R. 444, 473, 475, 492, 573, 586 Nicolle, D.A. 154 Nishihara, H.K. 364, 371, 379 Nixon, P.D. 76, 118, 139, 156 Norman, D.A. 501, 518, 628, 632, 644 Nottebohm, F. 299, 313 Nowak, L. 85, 117 Nuñez, L.N. 119 Nyman, G. 335, 337, 349, 354 Nystrom, L.E. 490
O’Boyle, D.J. 230, 235, 242–3 Ochs, M.T. 155 Oh, A. 146, 473, 492 O’Leary, M. 445, 456, 471 Oléron, G. 230, 243
Olguin, R.S. 298, 313 Olson, C.R. 114, 671 Oram, M.W. 317, 333, 361–4, 367, 370, 372, 375–80, 382, 396, 398–9, 426, 440 O’Regan, J.K. 159, 176 Oriet, C. 281, 285, 522, 536, 542, 552, 555, 558 Orliaguet, J.-P. 429, 440 Ortega, J.E. 354, 379 Osaka, N. 159, 176 Osman, A. 9, 47, 445, 472, 475, 481, 491, 495, 518, 552, 588, 607, 642–3 Ostry, D. 376–7, 382, 398 Otto-de Haart, E.G. 98, 117
Packard, M.G. 673, 689 Padden, D.M. 411, 440 Paillard, J. 63, 72, 78, 117, 121, 135, 228, 230, 232, 235, 239, 242–3 Palmer, C. 246, 264, 686, 689 Palmer, S.E. 437, 440 Pandya, D.N. 137, 156–7, 335, 338, 354–5, 372, 380 Panzeri, S. 380 Paprotta, I. 611, 623, 626 Paquet, L. 649, 671 Pareni, D. 293 Parsons, L.M. 426–7, 440–1 Pascual-Leone, A. 681, 689 Pashler, H. 21, 45, 47, 477, 491, 520–2, 532, 535–6, 563–8, 570, 573, 576–7, 580, 584–5, 588, 607, 609–10, 626 Passingham, R.E. 76, 118, 139, 156, 377 Pastore, R.E. 412, 441 Patashnik, O. 225 Patterson, T. 482, 491 Paulesu, E. 355, 380 Paulignan, Y. 87, 88, 95, 114, 116–17 Pauls, J. 361, 379 Paulson, K. 569, 585 Pavani, F. 99, 117 Pavard, B. 183, 191 Pavel, M. 386, 399 Pavesi, G. 290, 292, 309, 312, 317, 332, 335, 353 Pearl, D.K. 267, 285 Pearson, R.C.A. 137, 156 Pechmann, T. 643 Pedotti, A. 154, 353 Peery, S. 114, 124, 134 Pélisson, D. 64–6, 77, 87, 111, 115–18, 122, 135, 156 di Pellegrino, G. 148–9, 153–4, 161, 175, 289–92, 309, 312, 315, 332, 334, 350, 353, 376–7, 426, 438, 688–9 Pellizzer, G. 416, 440–1
Penel, A. 246, 262, 264
Peper, C.E. 198, 201, 205, 218, 221–5
Perani, D. 290, 292–3, 332–3, 355, 377, 380, 398, 438
Perenin, M.-T. 68–71, 74, 110, 117–18
Peronnet, F. 115
Perrett, D.I. 289–91, 294, 310, 317, 328, 333–5, 350–1, 354, 356–67, 370–80, 382, 393, 396, 398–9, 426, 440
Perrot, D.R. 419, 441
Peru, A. 147, 152
Peters, M. 234, 244
Peterson, E.W. 673, 689
Peterson, L.R. 581, 585
Peterson, M.R. 581, 585
Petrides, M. 354, 376–7, 382, 398
Phaf, R.H. 629, 642, 644
Philbeck, J.W. 188, 192
Phillips, W.A. 568, 585
Piaget, J. 295, 297, 313, 315, 333, 669, 671
Pick, H.L., Jr. 58–61, 148, 154, 174, 176–7, 185–6, 189, 192, 241, 244
Pierce, A. 245, 264
Pierce, R. 245, 264
Pierrot-Deseilligny, C. 613, 626
Pigarev, I.N. 154, 349, 353
Pillon, B. 441
Pinker, S. 559, 585
Pinto, J. 145, 156, 289, 291–2, 294, 381, 389, 393–4, 398–9, 410, 438, 441
Pisella, L. 57–62, 64, 69, 76–7, 81–4, 86, 88, 90–1, 94–8, 102, 105–6, 110–11, 114–15, 117–18, 120–2, 128, 133, 140, 156, 174, 176, 381, 587, 608
Pitts, G.S. 574, 584
Place, U.T. 64, 117
Plenacoste, P. 114
Poggio, T. 357, 361, 379–80
Poincaré, H. 206, 408–9, 441
Pollack, I. 520, 536
Polyak, S. 73, 117
Pomerantz, P. 413, 440
Pöppel, E. 74, 117, 121, 135, 228, 230, 243
Poranen, A. 75, 115
Port, R.P. 435, 441
Posnansky, C.J. 629, 644
Posner, M.I. 161, 176, 627
Posse, S. 113
Post, A.A. 225
Post, R.B. 183, 192
Potter, D.D. 379
Potter, M.C. 561–2, 585
Poulson, C.L. 299, 313
Powell, T.P.S. 137, 140, 155–6
Praamstra, P. 476, 479, 488, 491
Prablanc, C. 64–5, 97–8, 114–15, 117, 119, 122, 135, 150–1, 153, 156
Prablanc, D. 65, 116
Prablanc, M. 65, 115
Pressing, J. 198, 201, 205, 214, 223–5, 247, 263–4, 574
Price, M.C. 67, 95, 117
Prinz, W. 3, 47, 61, 161, 171, 174–6, 227–30, 234–6, 241–4, 263–4, 266, 282–3, 285, 297, 303, 308, 312–13, 315–16, 332–3, 352, 354, 381, 394, 397, 399, 434, 439, 444, 472, 496, 518, 521, 527–8, 536, 541, 545, 551–2, 646, 671, 675, 688–9
Prinzmetal, W. 456, 471, 600, 606–7
Pritchard, C.L. 116
Pritchatt, D. 628, 633, 644
Proctor, R.W. 404, 443–4, 447–8, 450, 452–74, 476–7, 480, 488–92, 496, 501, 505, 513–14, 518–19, 595, 608, 628, 634, 644
Procyk, E. 86, 118, 292, 377, 398, 438
Proffitt, D.R. 84, 108–10, 113, 174, 176, 410, 412, 438
Prud’homme, M. 141, 155
Prum, E. 266, 285
Ptito, A. 74, 117
Ptito, M. 74, 117
Puce, A. 351, 353, 376, 380
Puce, I. 376–7
Puleo, J.S. 441

Rabbitt, P.M.A. 21, 47, 526, 530, 536, 612–13, 626
Rabuffetti, M. 117
Racicot, C.I. 154
Radford, K. 440
Radil, T. 228, 230, 243
Rafal, R.D. 600, 607
Ragot, R. 481, 492
Ramachandran, V.S. 145, 148, 151, 156, 413, 437
Rapoport, S.I. 378
Ratcliff, G. 138, 156
Ratcliff, R. 492
Ratner, H.H. 301, 314
Raymond, J.E. 561, 585
Rayner, K. 629, 644
Rea, J.G. 119
Redding, G.M. 148, 156, 628, 633, 635, 644
Redolfi, M. 382, 399, 419, 442
Redondo, M. 488, 493
Reeve, K.F. 299, 313
Reeve, L. 299, 313
Regan, J. 628, 635, 644
Regard, M. 395, 398
Régnier, C. 79, 81–3, 86, 105, 117–18, 133, 135
Reinhard, M. 671
Reiss, L.A.J. 144, 154
Remington, R.J. 19, 48
Rémond, A. 481, 492
Renault, B. 481, 492
Renkin, E. 19, 47
Rensink, R. 609, 627
de Renzi, E. 138, 153
Repp, B.H. 198–200, 228, 230, 235, 244–8, 250–1, 255–6, 261–5
Requin, J. 9, 47, 491, 500, 518
Restle, F. 412, 441
Revonsuo, A. 64, 72, 112, 118
Richaud, S. 114
Ridderinkhof, K.R. 404, 494, 496–7, 501, 505, 517, 519
Riddoch, M.J. 59, 61, 72, 118
Ridley, R.M. 139, 156
Ridley-Johnson, R. 396, 399
Riesenhuber, M. 357, 361, 380
Rieser, J.J. 58–61, 174, 176–7, 181, 184–7, 189, 192–3, 241, 244
Riggio, L. 161, 176, 475, 492, 573, 585, 610, 612, 627
Rivaud, S. 613, 626
Rizzolatti, G. 76, 116, 144, 151, 153–4, 156, 161, 175–6, 289–93, 309, 312–13, 315, 317–18, 328, 332–7, 340, 349–55, 376–8, 380, 382, 393, 396–9, 426–7, 438–9, 441, 610, 612, 624, 626–7, 688
Robertson, I. 58, 61
Robertson, R.G. 380
Robinson, D.A. 163, 176
Robinson, D.L. 140, 156
Rocha-Miranda, C.E. 376, 378
Rochat, P. 152, 155–6
Rode, G. 72, 84, 114, 117–18, 135
Roelfsema, P.R. 161, 176, 539, 551
Roelofs, C. 57, 61, 81, 100, 106, 124–8, 130–5
Rogers, R.D. 35, 48, 419, 441, 470, 473, 578, 585
Roll, J.P. 146, 156
Roll, R. 146, 156
Rolls, E.T. 112, 118, 317, 333, 350, 354, 358, 371, 374, 378–80
Romanes, G.J. 315, 333
Rorden, C. 116, 147, 153
Rosenbaum, D.A. 219, 225
Rosenberg, K.E. 454, 472
Rosenblatt, F. 538, 552
Rosenblum, M. 224
Ross, N.E. 562, 585
Rossetti, P. 156
Rossetti, Y. 62, 64, 67, 69–72, 74, 76–88, 94–5, 97–100, 102, 105–22, 128, 133, 135, 140, 151, 153, 156, 174, 176, 587, 608
Rouiller, E. 119
Roux, S. 335, 353
Roy, E.A. 98, 119
Rubichi, S. 475, 492
Rubinstein, B. 260, 264
Ruch, T.C. 139, 156
Rudell, A.P. 474, 492, 573, 586
Rugg, M.D. 269, 285
Rumelhart, D.E. 628–9, 631–2, 644
Rumiati, R.I. 289
Runeson, S. 415, 419, 441
Rushworth, M.F.S. 76, 118, 139, 156
Russon, A.E. 297–8, 301–2, 311–13
Ruthruff, E. 522, 528, 532, 537, 581, 586

Saadah, E.S.M. 152, 156
Saetti, M.C. 115
Sainburg, R. 154
Saito, H.-A. 380
Saito, S. 213, 225
Sakata, H. 76, 111, 116, 118–19, 139–40, 155–6, 161, 176, 371, 376, 378–80
Salenius, S. 354, 398
Salerno, J. 378
Salin, P.-A. 114
Salthouse, T.A. 579–81, 585–6
Saltzman, E.L. 211, 225
Sanders, A.F. 448, 458, 468–9, 473, 495–6, 519
Sanders, M.D. 74, 119, 121, 135
Sanes, J.N. 239, 244
Santee, J.L. 534, 537
Saoud, M. 100, 119
Sato, S. 595, 598, 606, 608
Savage-Rumbaugh, E.S. 298, 301, 313–14
Scandolara, C. 154, 156, 349, 353
Scarpa, M. 332
Schaal, B. 311–12
Schall, J.D. 74, 77, 85, 114, 119
Schapiro, M.B. 378
Scheerer, E. 407, 441
Scheffczyk, C. 224
Schlaghecken, F. 68, 114, 501, 517–18
Schmidt, H. 600–2, 608
Schmidt, R.A. 645, 671, 685, 688–9
Schmidt, R.C. 199
Schmidtke, V. 674, 689
Schmitz, F. 243
Schmolesky, M.T. 85, 119
Schneider, G.E. 73, 119–20, 135
Schneider, R. 416, 442
Schneider, W.X. 95, 161, 176, 282, 284, 589, 608–11, 613, 623, 626–7
Schnitzler, A. 243, 476, 491
Scholz, J.P. 212, 214, 225
Schomaker, L.R. 429, 441
Schöner, G. 211–13, 219, 222, 225
Schriefers, H. 643
Schroter, H. 472
Schulkind, M.D. 564, 572–3, 578, 584
Schultz, D.W. 501, 518
Schulze, H.-H. 214, 225, 252, 255, 265
Schwartz, A.B. 76–7, 85, 119, 416, 441
Schwartz, S.P. 411, 441
Schwarz, W. 534, 537
Seal, J. 140, 156
Seashore, R.H. 235, 244
Seashore, S.H. 235, 244
Sechenov, I. 408, 441
Sedgwick, E.M. 232, 242
Seeger, C.M. 10, 47, 444, 472
Seibert, M. 361, 380
Seidenberg, M.S. 628–9, 643–4
Seitz, R.J. 113, 292, 353
Selst, M., van 565, 581, 586
Seltzer, B. 335, 338, 354–5, 372, 380
Semjen, A. 198, 200–, 214–6, 225, 231, 242, 255, 265
Sereno, A.B. 371, 380
Sergent, V. 213, 225
Servos, P. 115
Shaffer, L.H. 48, 225, 446–8, 450, 452, 454, 456, 464, 466–9, 473
Shallice, T. 411, 442, 501, 518
Shankweiler, D.P. 433, 440
Shapiro, K.L. 561, 568–9, 585–6
Shaw, A. 98–9, 115
Sheliga, B.M. 612, 627
Shelton, J.R. 411, 438
Shepard, R.N. 390, 399, 410, 413, 426–7, 432–5, 441
Shepherd, M. 611, 627
Sherrick, C.E. 419, 441
Sherrington, C.S. 271, 285
Shibutani, H. 156
Shields, C. 94, 115
Shiffrar, P. 145, 156
Shiffrar, M. 289–93, 381, 383, 386–7, 390–5, 397–9, 410, 413–15, 434, 437, 439, 441
Shiffrin, R.M. 589, 608
Shimamura, A.P. 456, 471, 501, 519
Shimojo, S. 389, 399
Shin, J.C. 261, 265
Shiu, L.-P. 48, 51, 488, 492
Shorland, B. 156
Shorter, A.D. 594, 608
Shoup, R. 588–90, 596–600, 606–7
Shove, P. 245, 265
Shyi, G.C.-W. 158, 175
Siegel, G.M. 189, 192, 300, 313
Siegel, R.M. 353
Silver, P.H. 69, 116
Silverman, G. 389, 399
Simon, J.R. 10, 48, 68, 112, 137, 153, 444, 473–5, 490, 492, 541, 552, 573, 586, 595, 607
Simons, D.J. 609, 627
Singer, M.H. 628, 635, 644
Singer, W. 161, 176, 284, 527, 537, 539, 551
Singh, J. 673, 688
Siqueland, E.R. 411, 439
Sirevaag, E.J. 490
Sirigu, A. 426, 441
Skavenski, A.A. 174–5
Small, A.M. Jr. 326, 330, 475, 492, 567
Smania, N. 147, 152
Smeets, J.B.J. 99, 111, 113, 119
Smid, H.G.O.M. 481, 492
Smith, L.B. 435, 441
Smith, M.C. 21, 48, 483, 492, 521, 537, 574, 579, 586
Smith, P.A.J. 379
Smulders, F.T.Y. 481, 492
Snyder, L.H. 140, 153, 156
Soechting, J.F. 145, 151, 156
Soetens, E. 19–21, 48, 472, 480–1, 484, 489–90, 492, 514, 519
Sommer, W. 20, 47, 472
Soury, J. 408, 441
Sparks, D.L. 86, 119, 408, 440
Sparrow, W.A. 198, 200
Speidel, C.R. 552
Spence, C. 148, 153
Spence, K.W. 296, 313
Sperling, A. 65, 114, 122, 134
Sperling, G. 390, 398, 559, 563, 586
Spetner, N.B. 410, 438
Spinnler, H. 267, 284
Sprague, J.M. 73, 119
Springer, C.J. 629, 644
Squires, K.C. 20, 48
Squires, N.K. 20, 48
Stadler, M.A. 681, 689
Stanford, T.R. 86, 119
Stanton, G.B. 140, 156
Stark, L. 64, 114, 122, 134, 145, 156
Stebbins, G.T. 673, 688
Steele, C.M. 614, 626
Steenhuis, R.E. 177, 193
Stein, D.G. 115, 350, 353
Steininger, S. 528–9, 536–7
Stelmach, G.E. 72, 117, 153
Stelt, O., van der 517, 519
Stemwedel, M.E. 675, 689
Stenneken, P. 227, 234–5, 244
Stephan, K.M. 113, 292
Sternberg, S. 280–1, 285, 564, 577, 586–7, 608
Stevanovski, B. 536, 555, 558
Stevens, G.T. 9, 10, 12, 17, 47–8, 491, 496, 500, 518
Stevens, J. 291, 293, 394–5, 399
Stevens, L.T. 203, 225
Stewart, M.I. 608
Stoerig, P. 94, 119
Stoet, G. 270, 283, 285, 405, 533, 537–41, 545, 547–8, 552, 582–3, 586
Stoffels, E.J. 447–9, 467–9, 473, 476, 488–9, 492, 496, 513, 519
Stoffregen, T.A. 199, 201
Stolz, J.A. 568–9, 575, 578, 583, 586, 628, 643
Stork, S. 158, 164, 176
Stratta, F. 155
Strayer, D.L. 505, 518
Strick, P.L. 140, 156
Stricker, W. 408, 436, 441
Stroop, J.R. 12, 48, 575, 586, 595, 608, 628, 644
Stucchi, N. 164, 174, 246, 265, 394, 399, 415–19, 427–29, 437–8, 442
Studdert-Kennedy, M. 433, 440
Stürmer, B. 308–9, 313, 316, 331, 333, 472, 476, 479, 488, 492, 514, 518
Stuss, D.T. 473, 492
Styles, E.A. 18, 46, 470–1
Subramaniam, B. 611, 626
Sugg, M.J. 629, 644
Sumi, S. 410, 441
Summers, J.J. 197–9, 201, 219, 225–6
Sur, M. 137, 155
Sussman, A.L. 113, 116
Suzuki, R. 213, 226
Sweet, J.B. 161, 175, 522, 526, 528, 532, 535, 567, 578, 582–3

Tadary, B. 332, 438
Tagliabue, M. 446, 455, 470, 473, 476, 480, 489, 492
Taira, M. 75–6, 111, 118–19, 140, 156, 161, 176, 380
Takaoka, Y. 156
Takasawa, N. 481, 491
Takeda, T. 426, 440
Talor, C. 177, 192
Tamura, H. 371, 378
Tanaka, K. 371, 375–6, 378, 380
Tanaka, Y.Z. 118, 372, 380
Tanatsugu, Y. 144, 155
Tanji, J. 144, 155
Tanné, J. 76–7, 85, 119
Tannenhaus, M.K. 628, 635, 644
Tarr, M.J. 597, 606, 608
Taublieb, A.B. 146, 155
Tavernier, G. 491
Taylor, C.S.R. 154
Taylor, J.L. 68, 119
Taylor, T.L. 67–8, 119
Tchernichovski, O. 299, 313
Teasdale, N. 229–30, 242–3
Telford, C.W. 521–2, 537
Telford, L. 183, 193
Terzuolo, C.A. 416, 440, 442
Thaut, M.H. 228, 244, 252, 265
Theeuwes, J. 613, 624, 627
Theios, J. 20, 47
Thelen, E. 396, 399, 435, 441
Theodor, L. 574, 586
Thinus-Blanc, C. 118
Thomas, A. 410, 438
Thomas, S. 317, 333, 354, 379–80
Thomassen, A.J.W.M. 429, 441
Thompson, I.D. 383, 399
Thompson, K.G. 119
Thorndike, E.L. 296, 298, 313, 315, 333
Thornton, I.M. 164, 176, 394, 399, 410, 441
Tian, B. 228, 244, 252, 265
Tilikete, C. 115, 117
Tipper, S.P. 59, 61, 148, 151, 156, 551, 612, 627
Tisseyre, F. 535
Todd, J.A. 219, 225
Tolhurst, D.J. 383, 399
Tomasello, M. 296, 298–9, 301–2, 307, 311–14
Tombu, M. 281, 285, 522, 536, 542, 555, 558, 565–6, 586
Toni, I. 115
Toth, J.P. 454, 457, 473, 476, 492
Townsend, S. 59, 61
Travis, L.L. 306, 310, 312, 314
Treisman, A.M. 34, 48, 527, 537, 540, 545, 552, 559, 586, 588, 591, 595, 598, 600–2, 606, 608, 627
Treisman, M. 213, 226
Trevarthen, C.B. 73, 119, 121, 135
Treves, A. 380
Trotter, Y. 371, 380
Tsutsui, K.-I. 380
Turatto, M. 522, 535
Turner, R.S. 114
Turvey, M.T. 435, 440
Tweed, D. 687, 689
Tysseyre, F. 19, 47
Tzelgov, J. 641, 644

Ulrich, R. 482, 491
Umiltà, C.A. 161, 176, 292, 318, 332, 444–6, 455, 473, 475–6, 492–3, 573, 585–6, 610, 612, 626–7
Ungerleider, L.G. 58, 61, 74, 76, 94, 119, 121, 135, 139, 157, 290–1, 293, 352, 355, 361, 370, 372, 377–8, 380, 538, 552, 610, 626
Urquizar, C. 114

Valle-Inclán, F. 404, 453, 466, 473–4, 476–8, 481, 488, 491–3, 513–14, 517, 519
Varoqueaux, D. 141, 153
Velay, J.-L. 146, 156
Vercher, J.L. 243
Vergiles, N.Y. 408, 442
Verleger, R. 476–7, 493, 671
Vermersch, A.I. 613, 626
Vervaeck, K.R. 20, 48
Vicario, G.B. 164, 174
Vighetto, A. 68–9, 115, 117
Vigorito, J. 411, 439
Vindras, P. 79, 119
Virzi, R.A. 628–9, 633–4, 644
Vishton, P.M. 98–9, 119
Viviani, P. 119, 174, 176, 246, 265, 382, 394–5, 397, 399, 404, 406, 415–25, 429, 437–8, 440–2
Vogel, E.K. 559, 569, 577, 584, 586
Vogt, B.A. 137, 157, 671
Vogt, S. 316, 333
de Vooght, G. 528, 535
Vorberg, D. 207–9, 214, 222, 225–6, 247, 255, 263, 265, 643
Vos, P.G. 228, 244
Vu, K.-P.L. 404, 443, 450, 453, 456–66, 472–3, 476, 488, 505, 513–14, 519, 595, 608
Vyas, S.M. 526, 536
Vygotsky, L.S. 302, 314

Wachs, J. 681–3, 689
Wachsmuth, E. 359, 380
Wade, N.J. 182, 193
Wagner, D.G. 185, 188–90, 192–3
Wallace, B. 148, 156
Wallace, R.J. 151, 157, 573, 586
Wallach, H. 384, 399
Walsh, V. 371, 380
Walters, R.H. 295, 312
Wang, H. 444, 456, 472–3, 496, 519
Wang, Y. 119
Want, S.C. 306, 314
Ward, R. 555, 568, 579, 586
Warren, W.H. 183, 193
Warrington, E.K. 74, 119, 121, 135, 411, 442
Wascher, E. 648, 671
Watson, J.S. 152–3
Watt, R.J. 357, 379
Watts, S. 687, 689
Wauschkuhn, B. 671
Waxman, A.M. 361, 380
Wegener, J. 139, 154
Wei, J.Y. 137, 153
Weiskrantz, L. 70, 73–4, 78, 115, 119, 121, 135
Welch, R.B. 148, 157
Welford, A.T. 496, 519–20, 537
Wells, L.A. 675, 689
Werner, L.A. 411, 440
Wertheim, A.H. 183, 193
Wertheimer, M. 413, 442
West, S. 410, 440
Westwood, D.A. 98–9, 119
Wetekam, B. 246, 264
Whipple, A. 9, 47, 491, 500, 518
Whishaw, I.Q. 138, 152, 155
White, J.M. 86–7, 97, 119
White, N.M. 673, 689
Whiten, A. 297, 301–2, 314
Wickelgren, W.A. 375, 380
Wickens, C.D. 20, 48, 490, 532, 537
Wicker, B. 362, 372, 374, 376–8, 380
Wieringen, P.W.C., van 198, 200, 225
Wiesel, T. 383, 398
Willats, P. 672
Williams, J. 19, 48
Willingham, D.B. 673, 675–6, 682–3, 686–9
Wing, A.M. 197–9, 202, 206–9, 214, 218, 222, 226, 247, 263, 265
Winocur, G. 370, 379, 473, 492
Wise, S.P. 396, 399, 688–9
Woestenburg, J.C. 481, 488, 493
Wohlschläger, A. 228, 244, 282, 284, 289–90, 292, 294, 303, 305, 307–9, 312, 314–16, 332–3, 396, 686–8
Wolfe, J.M. 559, 586, 588–9, 595, 598, 600, 606, 608, 625, 627
Wolff, P. 161, 176
Wolpert, D.M. 150, 157, 240–2, 244, 271, 284–5
Wong, E. 65, 106, 119
Woodin, M. 153
Woods, R.P. 293, 312, 332–3, 354, 378, 398, 438
Woodward, A.L. 292–3, 318, 333
Wright, C.E. 687, 689
Wright, E.W. 267, 285
Wühr, P. 161, 175, 243, 404–5, 520, 522–3, 526, 528–34, 536–7, 540, 552, 582–3, 585
Wundt, W. 408, 442
Wurtz, R.H. 64, 73, 114, 116

Xing, J. 140, 153

Yakimoff, N. 159, 175
Yamanishi, T. 213–14, 226
Yamazaki, K. 481, 491
Yantis, S. 594, 607, 612–13, 620, 622, 624, 627
Yap, G.S. 144, 154, 349, 354, 374, 378
Yeo, C.H. 271, 285
Yeo, G.F. 138, 154
Yonas, A. 192
Yoshizawa, S. 426, 440
Young, L.R. 183, 191
Youngquist, G. 177, 192
Yu, K.P. 608

Zaal, F.T.J.M. 199–201
Zandt, T., van 492
Zare, S. 432, 441
Zbrodoff, N.J. 454, 472, 476, 491, 556, 628, 630, 633–4, 640–3
Zeki, S.M. 559, 586
Zelinsky, G.J. 627
Zellmann, P. 646, 671
Zhang, H. 12, 48, 463, 473, 629, 642, 644
Zhang, J. 500, 519, 629, 644
Ziegler, J. 591, 607
Ziessler, M. 241, 244, 270, 282–3, 285, 556, 645, 665, 672, 682, 686–7, 689
Zilles, K. 353
Zimmer, A. 427–8, 442
Zinchenko, V.P. 408, 442
Zorman, M. 260, 264
Zorzi, M. 445–6, 455, 473, 475–6, 492–3
Subject index

accommodation 297, 407
accuracy 128, 132, 403, 411, 422
act
  communicatory 125
  instrumental 125
action
  biological 315, 340
  body 356–76
  control 67, 88, 97, 112, 158, 160, 162, 165, 171–4, 270, 308, 646, 669
  effect 158, 165, 172, 227, 282, 303, 305–6, 335, 420, 527, 529, 645–7, 655, 657, 666, 668–9, 687
  -effect blindness 541, 582
  -effect code 647
  -effect learning 556, 645, 657–8, 663, 668–70
  execution 158, 169, 171–2, 309, 311, 334–6, 350, 612, 645, 647, 657, 669
  goal 5, 72, 78, 294, 306, 309, 311, 348–9, 351, 434, 646
  goal-directed 62, 67, 71, 76, 84, 95, 97, 111, 267, 269, 282, 670 n.
  hand ~ 60, 290, 334–6, 348–50, 351–2, 376, 541, 545
  imitative 310–11, 317
  intention 106, 301, 306–7, 310, 529
  mouth ~ 334, 349, 351–2
  nonbiological grasping ~ 318
  occluded from sight 356, 358–9, 361, 373
  perception 4, 5, 288, 291, 294, 307, 317, 388, 395, 397
  perception-action interface 521, 526
  planning 5, 158–9, 161–2, 165, 167–9, 171–4, 521, 526–7, 533–4, 538–41, 545, 547, 549–50, 645–7, 657–8, 664, 668–9
  recognition 290–1, 303, 311, 351–2
  repertoire 311
  representation 334, 538, 657
  selection-for-action 610
  sequence 5, 280, 302, 306, 374
  understanding 282, 334, 336
activation
  direct 436, 447–9, 453, 494–8, 500–5, 510–11, 513, 516–17
  response ~ 403–4
  stimulus ~ 13, 16, 18, 28–9, 43–5
  -suppression hypothesis 494, 497, 510, 516–17
active intermodal mapping (AIM) 63, 200, 231, 301, 311, 318, 613, 651
adaptation 133
agnosia 58, 60, 69, 70, 72, 74, 93–5, 101, 110–12
AIP 77, 138, 140
alternation 208, 219, 391–2, 396, 414, 477, 479, 481–5, 487–9
anterior superior temporal polysensory (area STPa) 381, 396
anticipation 279, 282
  effect ~ 645, 669
  perceptual 428, 430
antiphase 210–12, 214, 216–18
aperture problem 384–5
aplastic phantom 152
apparent motion 292
apraxia 122, 294
articulation 362, 364, 366, 367–70, 411
assimilation 297
association
  long-term 445, 447, 449, 453, 458–9
  short-term 445, 449
asynchrony 198, 214–15, 227–8, 230–1, 233–9, 241 n.–2, 263, 413–14, 454, 560, 603
  negative 198, 200, 227–31, 234–9, 241–2
attention 642
  involuntary 620
  social 357
  spatial 138, 587–9, 591–2, 595, 606, 624–5
  visual 59, 148, 160, 555, 579, 587, 609–14, 620–6
  visual attention model (VAM) 609–10
attentional blink (AB) 65, 141, 240, 372, 516, 555, 561–2, 569, 578–9
attractors 205–6, 435
automatic
  processing 574–5
  response activation 403–4, 475, 488
  pilot 63, 89–90, 93–5, 98, 106, 111
automaticity 94, 325, 404, 474, 488, 630, 673

batch 562, 576
bimanual trials 221
binding
  action-relative 170
  cross-dimensional 596, 599
  effect 158, 162
  efferent 199, 266, 270–2, 276–7, 280–4
  feature 538, 540, 542, 545, 547, 549, 595
  impact 170–1, 173
  problem 5, 356, 375–6, 538–40, 571–2, 598
  within-shape 596, 598
blindsight 70–2, 74, 78–80, 84, 86, 88, 93–5, 105, 110, 121
body
  -centered frame of reference 178, 180
  movement 4, 270, 356, 361–4, 366–70, 372, 397, 410
  posture 141, 357, 359–61, 367, 410, 413
  schema 58, 61, 136, 145–9, 151–2, 393, 395, 397
bottleneck 81, 520, 522, 532–4, 558–60, 563–5, 570–1, 576, 579–80
brain injury 122, 148
Broca’s area 310, 396

calibration 120, 126, 134, 185–90, 614
capacity
  central 558, 565, 567, 576–7
  limitation 303, 522, 532, 542, 555, 558, 559–61, 563–6, 569–72, 576–7, 579–82, 610
causation 269–71, 276, 284
chaos 206
chording 558, 580–1
chunk/chunking 558, 580–1
clocks 197, 391
code(s)
  consolidation 549, 578–80, 583
  content-specific 5
  effect ~ 527, 653, 669
  event ~ 230, 235, 405, 434, 527–8, 532–4
  feature ~ 527–8, 533–4, 540–2, 546–8, 550, 571
  object ~ 568
  perceptual ~ 495, 568, 576–80, 583
  stimulus ~ 474, 527
coding
  common 161, 171, 173
  event coding account 520
  of visible and hidden actions 356
cognition
  process 549
  representation 538
  structures 549
coherence 3
  functional 3
collective variables 435
common mediation 162, 165, 167, 171–4
compatibility
  effects 12, 21, 241, 308, 316, 403, 444, 476–7, 479, 570–1, 629, 642, 648–56
  response ~ 4, 5, 443, 474, 477, 648–9, 651–5
  spatial 404, 443, 446, 464
  stimulus–response (S–R compatibility) 10, 29, 44, 403–4, 443, 477, 545, 628, 682
conditional accuracy functions 495, 497–9, 504, 516
congruency 589, 616, 618, 628, 634–8, 640–1
  congruent 104, 454, 483, 494, 628–9, 637, 640–1
  incongruent 454, 628–30, 637, 640–1
conjunction
  illusory 375, 540, 596, 600–1, 603–5
  map 597–9, 606
consciousness 112, 124, 199, 267, 269–72, 284, 357, 407, 415, 434
consolidation
  short-term 522, 549, 558, 561–3, 576–9, 581, 583
content-specific interactions 401, 403, 405
control
  action ~ 67, 88, 97, 112, 158, 160, 162, 165, 171–4, 270, 308, 646, 669
  distal-effect 174
  effector ~ 162, 165, 172–4, 444, 495
  goal-driven 613–14, 623
  movement 64, 73, 140, 144, 159, 612, 624
  parameter 211–12, 214, 419
  stimulus-driven 613–14, 623–4
  strategic 325, 404, 470, 476
coordinates
  body-part centered 144, 151
  egocentric 60, 132, 686
  hand-centered 151
coordination 5, 73, 84, 148, 199, 200, 209–12, 214, 217–19, 222–3, 241, 302, 406, 435, 517, 553, 555, 581, 628, 669, 687
corollary discharge 408
correction
  error ~ 235, 240, 247, 253, 263
  period ~ 263
  phase ~ 215, 219, 263
correspondence 308, 443, 466, 494–6, 498–501, 503–5, 509, 512, 515–17
cortex
  anterior superior temporal polysensory (area STPa) 381, 396
  area PRR 138, 140
  inferotemporal 74, 121
  parietal 138, 161, 240, 334–6, 350, 352, 370–1, 376
  posterior parietal 63, 65, 69, 71, 75, 77, 112, 121, 290, 334–8, 349–52, 372, 610
  prefrontal 77, 85, 335, 501
  premotor 77, 85, 109, 137, 140, 143–4, 149, 151, 161, 290, 292, 309, 350–2, 374, 382, 396
  primary motor 77, 138, 140–1, 395–6, 416, 496, 501
  superior parietal 137, 139–40, 152
  supplementary motor 140
  temporal 63, 69, 77, 356–7, 367, 370, 372, 374–6, 610
  V6A 138, 140
  ventral premotor (Area F4) 85, 334–6, 349, 396
  ventral premotor (Area F5) 85, 136–43, 146–7, 152, 290, 309–10, 317, 328, 334–6, 340, 350–1, 396
coupling
  function 211, 217, 219, 221
  input-output ~ 173
  perception-action ~ 158, 197, 200, 495, 609–14, 618–24
  synergistic 158, 173–4
crosstalk 468, 534, 558, 570, 572–3, 576
cumulative density functions 495, 497–8, 503, 516

deafferentation
  deafferented patient 238
delta plots 494–5, 497–500, 502–5, 508–16
dimension
  visual 587–9, 591, 595–7, 606
dimensional
  action-model 588–9
  cross ~ 589, 591, 596–606 n.
dimensional overlap 4, 9–11, 13, 16, 41, 445, 516, 642
  model 4, 9, 10, 12, 13, 17, 18, 20–1, 26, 36–7, 42, 44–6 n., 445, 642
dimensional overlap taxonomy
  type 1 task 10, 13, 15, 16, 22–8, 30–3, 36–7, 40–1; see also neutral task
  type 3 task 10, 16, 36–7, 39, 41, 44–5; see also Simon task
  type 4 task 10, 17, 21, 36–7, 39, 41–2, 44; see also Stroop task
  type 8 task 12, 21; see also Stroop task
dimensionality 206, 435
distractor 570, 601–3, 612, 615
  similarity 635
distributed representations 538–9
distributional analysis 404, 494–7, 500, 503, 511, 513, 516–17
dorsal
  stream 62–3, 78, 84–6, 93–5, 98–9, 104, 106, 109, 139, 290–1, 352, 370–1, 625
double
  dissociation 65, 70, 111, 122–3
  interaction 63
dual task 81, 213, 534–5 n.
dynamical systems 197, 202, 205–6, 213, 221–4, 435
  theory 206, 222, 435

Ebbinghaus illusion 123
effect
  anticipation 645, 669
  action ~ 158, 165, 172, 227, 282, 303, 305–6, 335, 420, 527, 529, 645–7, 655, 657, 666, 668–9, 687
  code 527, 653, 669
  compatibility 648–56
  distal 161, 171, 173–4, 527
efference copy 141, 161, 240
egocentric 60, 126, 132
electroencephalogram (EEG) 268, 481–2
emulation 296–9, 311
encoding
  stimulus 520
  visual 405
enhancement
  stimulus ~ 296–8, 300
environmental consequences 269–70, 673, 676
event code 230, 235, 405, 434, 527–8, 532–4
event-related brain potentials (ERP) 20, 481, 496, 517, 569
expectancy 281, 480, 490
exteroception 181
extinction
  tactile 147–8
eye movements 128, 130, 271, 415, 422, 436
  saccadic 122, 611, 613–14
  smooth-pursuit ~ 159

feature
  binding 538, 540, 542, 545, 547, 549, 595
  code 527–8, 532–4, 540–2, 546–8, 550, 571
  integration 34, 533, 538, 540, 544–5, 547–8, 587, 600
  overlap 405, 520, 524, 527–30, 532–4, 541–50
feedback
  auditory 198, 227–8, 230–1, 233, 235–9, 299
  correction 207
  kinesthetic 198, 221, 223, 230, 234
  proprioceptive 150, 237, 240–1, 301
  tactile 230–1
feedforward 241, 582
flanker task 496, 501, 573, 578, 589, 687
fMRI 294, 310, 317, 351, 517
forward
  model 198, 227, 240, 271, 669–70
  modeling 240
frame of reference
  body-centered 178, 180
  environment-centered 177, 179, 185, 191
free will 267, 269, 271

gestures 297, 300–1, 303–5, 316, 331, 336, 340, 396, 416, 432–4, 632–3
goal(s)
  action ~ 5, 72, 78, 294, 306, 309, 311, 348–9, 351, 434, 646
  -directed action 62, 67, 71, 76, 84, 95, 97, 111, 267, 269, 282, 670
  -driven control 613–14, 623
  end-state ~ 306
  hierarchy of 298–9, 302–3, 305, 309
  mental 295, 302, 306, 310–11
  physical 310–11
grasp
  components 321–3, 326–7, 330–2
  reach-to-grasp 319–22
grasping
  nonbiological grasping action 318

hand
  action 60, 290, 334–6, 348–52, 376, 541, 545
  movement 64, 66, 69, 141, 216, 261, 307–9, 316–17, 321, 334, 336, 416, 423, 481, 539, 541, 611
handedness 427
handwriting 202, 428–30
hierarchical processing 356, 361

iconic memory 559
ideomotor principle 303, 310–11
illusion(s)
  motion-induced 99
  Müller-Lyer ~ 67, 98, 106, 123
  rubber hand ~ 149
  Titchener ~ 99
  visual 65, 67–8, 93, 98–9, 101, 106–7, 113 n.
imagery 199, 245, 247–8, 251, 253–63, 426–9, 544
  mental 84, 426
imitation 4, 5, 257, 261–2, 282, 288–92, 294–303, 305–11, 315–18, 325, 331, 396, 424, 444, 686
implicit
  knowledge 145, 410, 419, 427, 437, 687
  learning 681
information reduction hypothesis 9, 25–7, 42–3, 45
instrumental conditioning 295
integration 3–5
  feature ~ 34, 533, 538, 540, 544–5, 547–8, 587, 600
  sensorimotor 138
intentions 685
interference 538, 545, 549
  noise ~ 652, 654
  nonspecific 520–2, 524, 529–30, 533
  specific 520–1, 526, 528, 533, 535
intermanual transfer 682, 685
intermodal mapping 301

jerk
  minimum jerk movement 416

kinematic analysis 65
kinematics 67, 69, 315, 318–20, 322, 325, 329–32

lateralized readiness potential (LRP) 68, 474, 478, 481–2, 484–9, 496, 501, 517
learning
  action-effect ~ 556, 645, 657–8, 663, 668–70
  effector-based motor ~ 687
  explicit vs. implicit ~ 677
  goal-based motor ~ 673, 676–7, 681–2, 685–7
  goal-based vs. stimulus-based ~ 681
  intermanual transfer of 682, 685
  location-based motor ~ 681–3, 685
  implicit vs. explicit ~ 677, 681
  motor ~ 673, 677, 685, 687–8
  motor sequence ~ 673, 677
  response-effect ~ 658, 660–5, 667–9
  sequence ~ 241, 673–4, 677, 680, 682, 685–6
  stimulus-based motor ~ 677
  stimulus-stimulus ~ 665
  transfer of 557
lesions 68–70, 73–6, 93–4, 112, 120–1, 138–9, 141, 152, 613, 673
limit cycles 222–3, 435–6
LIP 77, 140, 613
localization
  error 158–60, 162, 164–7, 170–3
  spatial ~ 158, 171, 173–4
locomotion 58, 60, 177–91, 197, 241 n., 381, 389, 393–4
  as a class of actions 177
  observation of 381, 389, 393–4
  self-locomotion 180–3, 191
long-term memory (LTM) 446, 558, 563
Lyapunov exponent 206

Macaque monkey 138, 290–1, 317, 334, 338, 356–7
mapping
  active intermodal mapping (AIM) 301, 311
  mixed 447–8, 458, 469
  S–R mapping and motor learning 678, 681–2, 685
  stimulus–response ~ 10, 12, 21, 23–5, 28, 32–3, 36, 283, 403, 445–6, 449, 468, 475–6, 480, 488–9, 496, 570, 678, 681–2, 685, 688 n.
matching systems 334–6, 351
memory
  iconic 559
  long-term memory (LTM) 13, 446, 558, 563–4, 573, 577
  short-term memory (STM) 13, 446, 522, 555, 558–61, 563–4, 569, 576–7, 582
  visual short-term memory (VSTM) 532, 559–62, 576
mental image 84, 426–7
MIP 77, 138, 140, 143
mirror
  movements 681–3, 685
  neurons/neuron system 161, 290, 292, 309–11, 317–18, 328, 334–5, 340–4, 346, 348–52, 382, 396, 426
mixing costs 468–70
modality
  response ~ 443, 633
model
  anticipative 646, 669–70
  dimensional-action ~ 589
  dual-process ~ 495–6, 501
  forward ~ 198, 227, 240, 271, 669–70
  representational 229, 231, 238–9, 242, 410
  sensory accumulator model (SAM) 198, 200, 228–9
  visual attention model (VAM) 556, 609–13, 623–5
modules 3, 12, 13, 15, 45, 587–90, 592, 595, 598–9
momentum
  representational 158–9, 172–3, 409–10
motion
  apparent 65, 292, 382–3, 390–3, 408–10, 413–15, 419, 432
  biological 370, 376, 410–11, 413, 415, 417, 426, 433
  bodily 381
  -induced illusions 99
  natural 419
  object ~ 183, 381–2, 387, 411
  visual ~ perception 410
motor
  competence 406, 410, 412, 420, 428–9, 431, 433–4
  implementation delays 208
  -perception interaction 246, 406, 408, 419, 436
  space 351, 409
  theory of perception 407
  system 394–5, 397
    selective activation during observation 396
  transfer 686–7
motor learning 178–9, 181, 185, 190, 270, 557, 673, 677, 685, 687–8
  effector-based 687
  goal-based 673, 676–7, 681–2, 685–7
  location-based 681–3, 685
  motor sequence ~ 673, 677
  stimulus-based 677
mouth
  action 334, 349, 351–2
  -hand synergies 351
movement(s)
  bimanual 197
  eye ~ 5, 64–5, 73, 86, 106, 128, 130, 140, 148, 159, 162–3, 165, 167, 172, 271, 415, 422, 436, 482, 609–11, 613–14, 623–4
  hand ~ 64, 66, 69, 141, 216, 261, 307–9, 316–17, 321, 334, 336, 416, 423, 481, 539, 541, 611
  minimum-jerk ~ 416
  path 289, 303–4, 391
  selection 316, 609–14, 618, 623–5
moving average 204–5, 223
Müller-Lyer illusion 67, 98, 106, 123
multimodal
  input 136
  neurons 144, 151
  receptive fields 145
  representation 137, 145
muscle sense 407–8
music perception 199
musician practice 686
musical
  performance 581
  structure 200, 245–7, 256–7, 261–3

negative asynchrony 198, 200, 227–31, 234–9, 241–2
nerve conduction hypothesis 198, 228
neutral task 10, 16, 17, 41
noise interference 652, 654
numbsense 71–2, 78–80, 84, 86, 88, 93, 95, 97, 105, 110

object recognition 58–9, 72, 111, 139, 352, 371, 389, 587–8, 595–7, 600–1, 605–6 n.
optic ataxia 58, 68–70, 72, 74, 91, 93, 102, 104, 111–12
order parameter 210–12
orientation
  spatial 58, 73, 145, 177–8, 185, 188–91, 389, 394
oscillator 198, 200, 205–6, 211, 213–14, 221–3
overlap
  feature 405, 520, 524, 527–30, 532–4, 541–50

pacemaker 213
parietal
  cortex 138, 161, 240, 334–6, 350, 352, 370–1, 376
  lobe 68, 74, 76, 137–40, 147, 152, 310, 350, 352
pathway
  cognitive 121–2, 128, 132–4
  sensorimotor 121–3, 125, 127–8, 131–2, 134
  subcortical visual ~s 62
  two visual ~s 85, 123, 132
pattern recognition 120–1
perception
  action ~ 4, 5, 288, 291, 294, 307, 317, 388, 395, 397
  -action coupling 158, 197, 200, 495
  -action interface 521, 526
  -action sequences 5
  and action 3–5, 57–63, 67, 69–70, 78, 100–1, 110, 122, 158, 161, 167, 171, 173, 178, 195, 199–200, 245, 248, 266, 289, 294, 303, 331, 401, 403–5, 427, 434, 470, 521, 528, 533–4, 538, 541, 549–50, 553, 555–6, 558–9, 568, 582, 587, 595, 622–4, 628
  categorical 412, 436
  motor theory of 407
  music ~ 199
  selection for 609–10, 613, 618, 620
  space ~ 4, 55, 188, 352, 408
  visual motion ~ 410
perceptual
  anticipation 428, 430
  code 495, 568, 576–80
  code consolidation 583 n.
  motor-perception interaction 246, 406, 408, 419, 436
  space 4, 59, 161–2, 165, 170–4
  priming 383
perturbation 66, 88, 92–4, 96, 104, 106, 108–9, 212
phantom limb 145, 148, 151
phase
  relative 199, 200, 207, 210–12, 218–19, 221
  space 205–6, 435, 437
  transition curve 214
pipelining 558, 579–81
pointing 609–12
point-light walker 393–4
polyrhythm 197, 219–21
posterior parietal cortex (Area PF) 63, 65, 69, 71, 75, 77, 112, 121, 290, 334–8, 349–52, 372, 610
posture
  body 141, 357, 359–61, 367, 410, 413
power spectrum 205, 221
two-thirds power law 416–17, 419, 422–3, 425, 429, 431
precue 448, 454, 458, 468, 501
precuing
  benefits 448–9, 468
  category 459
  effects 459, 466, 469
  trial types 466
  mapping 459, 466
  task 454
prediction of sensory consequences 227, 229–30, 235, 239
prefrontal cortex 77, 85, 335, 501
prehension 69, 98, 336, 344
premotor
  cortex 77, 85, 109, 137, 140, 143–4, 149, 151, 161, 290, 292, 309, 350–2, 374, 382, 396
  theory 612, 623–5
primary motor cortex 77, 138, 140–1, 395–6, 416, 496, 501
prime-probe pair 22–4, 30–3, 38–9, 45, 50, 54
primer
  human 315, 318, 321–3, 325, 327–32
  robotic 330, 332
priming
  effect 68, 315, 318, 322, 325, 327–32
  motor 292, 315, 328, 332
  negative 37, 51, 595
  semantic 68, 568, 574–6, 578–9, 642
  visuomotor 318
prism 110, 148–9, 151
  adaptation 110, 149, 151
processing
  automatic 574–5
  controlled 471
  response ~ 5, 44–5, 549, 556
  stages 20, 435, 587
  stimulus ~ 5, 16, 20, 44, 527, 549, 588, 625, 647
proprioception 5, 72, 80, 137, 141, 143, 145–9, 180–2, 227, 239, 645
psychological refractory period (PRP) 520–3, 526, 528–32, 534, 555, 558, 563–6, 570–2, 576–8, 580
pursuit tracking 425

rate parameter 209
reach to grasp 319–22
reaching 17, 58, 60, 121, 140–1, 148–51, 267, 290, 538, 612, 665, 670
  component 321–2, 326–7, 329–30
reaction time (RT) 474–84, 488–9
  serial ~ (SRT) 673
readiness potential 68, 268–9, 474, 478, 481, 486–7
recalibration 148–9, 185–6, 188–91
receptive fields
  bimodal 349
  tactile 144, 335, 338
  visual 58, 144, 151, 349
recognition
  object ~ 58–9, 72, 111, 139, 352, 371, 389, 587–8, 595–7, 600–1, 605–6 n.
  pattern ~ 120–1
relaxation 212, 214
repetition
  effect 9, 19–21, 26–9, 33–6, 39, 41, 43–6, 51, 449, 453, 459–60, 466–7, 470–1, 476–7, 479–80, 483–5, 488–9
  response ~ 9, 18, 21, 28, 30–1, 33–5, 42, 53, 477, 487, 489
  stimulus ~ 21, 30–1, 33–5, 44, 477, 479
representation(s)
  diversified spatial 188, 191
  sequence ~ 675–6, 680–3, 685–7
  spatial 60, 147, 272, 432
  unified spatial 178, 180, 188, 191
representational momentum 158–9, 172–3, 409–10
residual activation hypothesis 25–6, 46
resonance 223, 317, 434–5, 437
response
  activation 403–4, 445, 474, 481, 484, 486, 488–9, 496–7, 501, 517, 556
  arbitrary keypress ~ 633–4
  code 5, 475, 480, 495–6, 501, 527, 555–6, 558, 561, 569–73, 575–81, 598, 674, 686
  code consolidation 583 n.
  compatibility 4, 5, 443, 474, 477, 648–9, 651–5
  competition 570, 578
  dimension 403, 445
  -effect learning 658, 660–5, 667–9
  execution 481, 485, 520, 528, 534, 555, 579, 587–9, 657, 668–9
  keypress ~ 13, 273, 277, 447, 456–7, 460–3, 465, 469, 471
  planning 520–1, 526, 532–3, 546, 645–8, 653, 657–8, 661, 663–6, 668–9
  preparation 481, 485, 488, 646, 653, 656, 660–1, 664–5, 670 n.
  processing 5, 44–5, 549, 556
  repetition 9, 18, 21, 28, 30–1, 33–5, 42, 53, 477, 487, 489
  selection 3, 16, 28, 43, 308, 403–4, 448, 460, 465–70, 474–5, 477, 481, 488, 516, 520–2, 549, 555–7, 565, 568, 570, 577, 582, 587–91, 595, 597–8, 606 n., 647–8, 688
  stimulus–response correspondence 494
  stimulus–response translation 3, 571, 576–7
  suppression 501, 517
  to stimulus interval (RSI) 19–22, 24, 32, 37, 41, 477, 483
  typewritten 628, 633, 639–41
  vocal 88, 456–63, 465, 469, 571, 628, 633–4, 636–7, 639, 641–2, 674
rhythm 202, 208–9, 219
rhythmic grouping 246, 261
robot 292, 315, 318–23, 325, 327–32, 424
robotic hand 315, 318–19, 329
Roelofs effect 57, 81, 100, 106, 124–8, 130–4

saccade/saccadic 64–5, 86, 95, 129–30, 282, 422, 609, 611–25
  control 161
  eye movements 122, 611, 613–14
  target 160
saliency 274
scotoma 73–4
search
  visual 540, 559, 591, 598, 681
selection
  -for-action 610
  -for-perception 609–10, 613, 618, 620
  movement ~ 316, 609–14, 618, 623–5
  response ~ 3, 16, 28, 43, 308, 403–4, 448, 460, 465, 467–70, 474–5, 477, 481, 488, 516, 520–2, 549, 555–7, 565, 568, 570, 577, 582, 587–91, 595, 597–8, 606, 647–8, 688
sensorimotor synchronization 227, 238, 241, 245, 534
sensory
  accumulator model 198, 200, 228–9
  consequences 198–200, 227–8, 240–1, 271, 308, 645
  consequences of action 240
sequence
  action ~ 5, 280, 302, 306
  integration 675
  knowledge 673–4, 676, 680, 685, 687
  representation 675–6, 680–3, 685–7
sequential effects 9, 10, 18–22, 36, 44–6 n., 469–70, 476–7, 479, 481, 484, 488–90, 495, 513, 516
serial reaction time task (SRT) 136, 272–6, 279–82, 407, 673–7, 679, 681–3, 685–7
set switching 35–6
shared mechanisms 3, 4
short-term consolidation 522, 549, 558, 561–3, 576–9, 581, 583 n.
short-term memory (STM) 446, 522, 555, 558–9, 564, 576–7, 582
  visual short-term memory (VSTM) 532, 559
SI 137–9, 141, 143, 337–8
SII 137–9, 335
similarity 21, 246, 250, 309, 403, 422, 425, 427, 549–50
Simon task/effect 10, 14, 17, 21, 36, 68, 112, 404, 444–6, 449, 453–7, 460–7, 469–71, 474–81, 483–5, 488–90, 494–500, 503–5, 508, 510–14, 516–17, 541–2, 558, 573–4, 576, 578, 595, 628
single-task performance 528, 535 n.
social enculturated theory 301
somatosensory submodalities 139
space
  behavioral ~ 174
  motor ~ 351, 409
  perception 4, 55, 188, 352, 408
  perceptual ~ 4, 59, 161–2, 165, 170–4
  phase ~ 205–6, 435, 437
  representation 84, 110
spatial
  attention 138, 587–9, 591–2, 595, 606, 624–5
  coding 291
  compatibility 404, 443, 446, 464, 474, 476, 480, 489
  dynamic ~ orientation 177, 188
  localization 158, 171, 173–4
  locations 372
  orientation 58, 73, 145, 177–8, 185, 188–91, 389, 394
  representations 60, 147, 272, 432
  response codes 682
spatial coding 134, 151, 371–3
  allocentric 371
  egocentric 371
spatially directed action 158
stimulus
  activation 13, 16, 18, 28–9, 43–5
  code 474, 527
  -driven control 613–14, 623–4
  -driven response activation 474
  encoding 520
  enhancement 296–8, 300
  information 3, 165, 549, 558, 571, 674, 676, 681
  irrelevant 9, 12, 14–18, 36–7, 39, 41–5, 54, 308, 444, 475, 480, 489, 494–5, 504
  processing 5, 16, 20, 44, 527, 549, 588, 625, 647
  relevant ~ 12–18, 28, 36–7, 39, 41–2, 45, 54, 444–5, 494, 574
  repetition 21, 30–1, 33–5, 44, 477, 479
  -response compatibility 4, 5, 10, 16, 22, 445, 474, 477, 534, 541
  -response correspondence 494
  -response mapping 283, 403
  -response translation 3, 571, 576–7
  -stimulus learning 665
Stroop task/effect 10, 12, 14, 21, 36, 476, 558, 575–6, 578–9, 595, 628–30, 632–42
structural diversity 3
structure
  event ~ 197
subcortical visual pathways 62
superior colliculus 71, 73–4, 77, 120, 613
superior parietal lobe 137, 139–40, 152
superior temporal sulcus (STS) 77, 289, 291, 317, 328, 334–5, 338, 350–1, 356–7, 371–2, 376
supplementary motor cortex 140
suppression
  activation-suppression hypothesis 494, 497, 510, 516–17
  response 501, 517
  selective 494–5, 497, 500–6, 510–11, 513–14, 516–17
synchronization 198–200, 202–3, 214–18, 227–9, 231, 234–5, 238
  sensorimotor ~ 227, 238, 241, 245, 534
systems
  distal-effect system 173–4
  effector control ~ 173–4
  nested control ~ 174

tapping
  finger 213, 230, 242, 396
  foot 230, 233–8
task
  conceptualization ~ 685
  conflict ~ 476, 494–6, 500–1, 504, 516–17
  cross-dimensional 591, 599, 600, 604
  Eriksen ~ 494, 501, 505, 517
  flanker ~ 496, 501, 573, 578, 589, 687
  mixing ~ 446
  neutral ~ 10, 16, 17, 41
  serial reaction time ~ 673
  set 404, 444, 447, 449, 452, 464, 467, 470–1, 558, 572–6, 578
  Simon ~ 10, 14, 17, 21, 36, 68, 112, 404, 444–6, 449, 453–7, 460–7, 469–71, 474–81, 483–5, 488–90, 494–500, 503–5, 508, 510–14, 516–17, 541–2, 558, 573–4, 576, 578, 595, 628
  single-task performance 528
  Stroop ~ 10, 12, 14, 21, 36, 476, 558, 575–6, 578–9, 595, 628–30, 632–42
  switching 18, 34, 36, 455, 470
  type 1 task 10, 13, 15, 16, 22–8, 30–3, 36–7, 40–1; see also neutral task
  type 2 task 10, 13, 16, 22–4, 27–4, 41
  type 3 task 10, 37, 39, 41, 44–5; see also Simon task
  type 4 task 10, 17, 21, 36–7, 39, 41–2, 44; see also Stroop task
  type 8 task 12, 21; see also Stroop task
temporal cortex 63, 69, 77, 356–7, 367, 370, 372, 374–6, 610
time
  perception 197
  production 197
  series 203–6, 219, 223
  constraints 63, 104
timekeeper 207–9, 214, 216, 218, 222, 263
timing 4
  expressive 198–9, 245–6, 247–9, 251, 260–2
  interceptive 198
  mechanisms 197, 199, 200, 227
  movement 197, 199, 200, 202, 223, 263, 394
  variability 203–4, 207, 218, 223
timing patterns 245–53, 256–64 n.
  complexity of 250, 252–3, 256–8, 260–1
  expressive 198–9, 245–9, 251, 257, 260–3
  learning of 250, 254, 256–8, 260–3
  phase-shifted 245, 247, 257–62
  principal components 248–50
  typicality of 247, 250, 257–8, 262
Titchener illusion 99
trajectory 60, 65, 141, 150, 159, 162–3, 165, 168–9, 198, 205–6, 211, 221, 223, 290, 343, 345–6, 368, 416–25, 430, 437, 673
transfer
  intermanual 682, 685
  of learning 557
  of sequence knowledge 674, 676–7, 681–3, 685
translation efficiency 447, 467
transcranial magnetic stimulation (TMS) 317
two-step response selection 447–8, 467
two-thirds power law 416–17, 419, 422–3, 425, 429, 431

vection
  circular 182–3
  linear 183
ventral
  premotor cortex (Area F4, F5) 85, 334–6, 396
  stream 60, 62–3, 72, 76–7, 84–6, 93–5, 98–9, 102, 106, 109, 139, 290–1, 352, 370–2
verbalization 63, 79–82, 105, 109–10, 134
vestibular system 180
visual
  analysis 358, 372, 381–2, 392–7
  attention 59, 148, 160, 555, 579, 587, 609–14, 620–6
  attention model (VAM) 556, 609–13, 623–5
  dimension 587–9, 591, 595–7, 606
  encoding 405, 520–3, 526, 528–9, 531–2, 534, 560, 567–8, 582, 585
  occlusion 389
  pathways 63, 73, 370
  perception 59, 63, 67, 73, 100, 112, 214, 381–2, 385, 388, 390, 393–7, 406, 412, 539, 609–12, 614, 619, 623–4
  search 540, 559, 591, 598, 681
  short-term memory (VSTM) 532, 559–62, 576
  two visual systems 62–3, 73, 93–4, 98, 110
  systems 587, 595
visual motion
  perception 410
  human movement 381–3, 387, 389–97
  illusions 65, 67–8, 93, 98–9, 101, 106–7, 113 n.
  integration 381–3, 385–90, 392–4, 397
  locomotion 389
  object 381–3, 385–95, 397
  segmentation 383, 387
visually guided behavior 120–1
visuomotor network 63–4, 68–9, 73, 75–6, 78, 84–5, 92, 94, 99, 102, 106, 108–9, 112
volition 140, 266, 282
voluntary
  actions 161, 268, 281, 284, 521, 646
  behavior 302
  control 90, 94, 109–10, 470, 646
  effort 426
  gating 467, 470
  movements 68, 416, 423, 425, 669
  saccades 282

walking 363–5, 370